
run_swag.py should use AdamW #951

Closed
jeff-da wants to merge 1 commit into huggingface:master from octouw:master

Conversation

@jeff-da

@jeff-da jeff-da commented Aug 2, 2019

run_swag.py doesn't run currently; BertAdam has been removed (per the README).

@codecov-io

Codecov Report

Merging #951 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #951   +/-   ##
=======================================
  Coverage   79.04%   79.04%           
=======================================
  Files          34       34           
  Lines        6242     6242           
=======================================
  Hits         4934     4934           
  Misses       1308     1308

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44dd941...a5e7d11. Read the comment docs.

1 similar comment
@codecov-io

codecov-io commented Aug 2, 2019


@@ -467,10 +467,13 @@ def main():
if (step + 1) % args.gradient_accumulation_steps == 0:
if args.fp16:
Member


We don't need this separate adjustment of the learning rate for fp16 anymore with the schedulers

warmup=args.warmup_proportion,
t_total=num_train_optimization_steps)
optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate)
scheduler = WarmupLinearSchedule(optimizer,
Member


The scheduler should also be created if we are in fp16.
The fp16 optimizer should now be created as in the run_glue example, where there is no distinction between fp16 and normal operation.
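For reference, the warmup-then-linear-decay policy that WarmupLinearSchedule applies can be sketched as a plain learning-rate multiplier function (a minimal sketch of the policy only, not the library's code; the step counts below are made-up examples):

```python
def warmup_linear_multiplier(step, warmup_steps, t_total):
    """LR multiplier: ramps 0 -> 1 over warmup_steps, then decays linearly to 0 at t_total."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (t_total - step) / max(1.0, t_total - warmup_steps))

print(warmup_linear_multiplier(50, 100, 1000))    # mid-warmup -> 0.5
print(warmup_linear_multiplier(100, 100, 1000))   # warmup done -> 1.0
print(warmup_linear_multiplier(1000, 100, 1000))  # end of training -> 0.0
```

Because the multiplier already encodes warmup and decay, the same schedule works for fp16 and fp32 runs alike, which is why no separate fp16 learning-rate adjustment is needed.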

@thomwolf
Member

thomwolf commented Aug 7, 2019

Added a few comments. If you take a look at the run_glue and run_squad examples, you'll see they are much simpler now in terms of optimizer setup. This example could take advantage of the same refactoring if you want to give it a look!
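For context, the run_glue-style setup referred to here can be sketched roughly as follows. This is a hedged illustration using plain PyTorch's `torch.optim.AdamW` and `LambdaLR` to stand in for the library's `AdamW` and `WarmupLinearSchedule`; the model, hyperparameters, and loop are illustrative stand-ins, not the example's actual code:

```python
import torch

# Hypothetical stand-ins for the script's model and schedule lengths.
model = torch.nn.Linear(4, 2)
learning_rate, warmup_steps, t_total = 5e-5, 10, 100

# run_glue-style setup: one optimizer + one scheduler, no fp16 special case.
no_decay = ["bias", "LayerNorm.weight"]
grouped_parameters = [
    {"params": [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},
    {"params": [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(grouped_parameters, lr=learning_rate)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    # Linear warmup to step warmup_steps, then linear decay to 0 at t_total.
    lambda step: step / warmup_steps if step < warmup_steps
    else max(0.0, (t_total - step) / (t_total - warmup_steps)),
)

for step in range(t_total):
    loss = model(torch.randn(3, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the LR schedule once per optimization step
    optimizer.zero_grad()
```

The key point of the refactoring is that the warmup logic lives entirely in the scheduler, so the training loop is identical whether or not fp16 is enabled.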

@thomwolf
Member

Thanks for this @jeff-da, we'll close this PR in favor of #1004 for now.
Feel free to re-open if there are other things you would like to change.

@thomwolf thomwolf closed this Aug 30, 2019
