Conversation
Codecov Report
@@ Coverage Diff @@
## master #951 +/- ##
=======================================
Coverage 79.04% 79.04%
=======================================
Files 34 34
Lines 6242 6242
=======================================
Hits 4934 4934
Misses 1308 1308
Continue to review full report at Codecov.
1 similar comment
@@ -467,10 +467,13 @@ def main():
if (step + 1) % args.gradient_accumulation_steps == 0:
    if args.fp16:
We don't need this separate adjustment of the learning rate for fp16 anymore with the schedulers.
warmup=args.warmup_proportion,
t_total=num_train_optimization_steps)
optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate)
scheduler = WarmupLinearSchedule(optimizer,
The scheduler should also be created if we are in fp16.
The fp16 optimizer should now be created as in the run_glue example, where there is no distinction between fp16 and normal operation.
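The pattern being asked for can be sketched roughly as follows. This is a minimal stand-in, not the actual run_glue code: the toy model, the step counts, and the `warmup_linear` factory (emulating the library's `WarmupLinearSchedule` via a plain `LambdaLR`) are all illustrative assumptions.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def warmup_linear(warmup_steps, t_total):
    """Linear warmup to the base lr, then linear decay to 0.
    A hand-rolled stand-in for the library's WarmupLinearSchedule."""
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (t_total - step) / max(1, t_total - warmup_steps))
    return lr_lambda

model = torch.nn.Linear(4, 2)  # toy model standing in for BERT
# Same optimizer/scheduler construction whether or not fp16 is used
optimizer = AdamW(model.parameters(), lr=1e-3)
scheduler = LambdaLR(optimizer, lr_lambda=warmup_linear(warmup_steps=10, t_total=100))

for step in range(100):
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # the scheduler owns warmup; no manual lr adjustment
    optimizer.zero_grad()
```

The point of the design is that the schedule, not the training loop, is responsible for the learning-rate shape, so the fp16 and fp32 paths stay identical.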
Added a few comments. If you take a look at the
run_swag.py doesn't run currently; BertAdam has been removed (per the README).