Refactoring old run_swag.py #1004
Conversation
…uad in pytorch_transformers
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1004      +/-   ##
==========================================
- Coverage   81.16%   80.77%     -0.4%
==========================================
  Files          57       57
  Lines        8039     8092       +53
==========================================
+ Hits         6525     6536       +11
- Misses       1514     1556       +42
```
Continue to review full report at Codecov.
merge huggingface/master to update
roberta, xlnet for multiple choice
run_multiple_choice.py and utils_multiple_choice.py with roberta and xlnet have been tested on RACE, SWAG, ARC Challenge.
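For context, a multiple-choice head of the kind these models use typically encodes each (context, choice) pair separately and then reshapes the per-pair scores into `(batch, num_choices)` before picking an answer. A minimal sketch of that reshaping idea (the shapes and variable names are illustrative, not the actual `utils_multiple_choice.py` code):

```python
import torch

batch, num_choices, hidden = 2, 4, 8

# One pooled encoder output per (context, choice) pair, flattened to
# (batch * num_choices, hidden) for a single encoder forward pass.
pooled = torch.randn(batch * num_choices, hidden)

# A linear classifier scores each pair with a single logit...
classifier = torch.nn.Linear(hidden, 1)
logits = classifier(pooled)           # (batch * num_choices, 1)

# ...and the logits are reshaped so each row holds one question's choices.
reshaped = logits.view(batch, num_choices)
prediction = reshaped.argmax(dim=-1)  # index of the predicted answer
```

The same pattern works for RACE (4 choices), SWAG (4 endings), and ARC, differing only in `num_choices`.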
This looks really great. Thanks for updating and testing this script, @erenup. A few questions and remarks:
@thomwolf Thank you!
# Conflicts: # pytorch_transformers/__init__.py
Run multiple choice add doc
Hi @thomwolf, docstrings for the multiple-choice models have been added, and an example for run_multiple_choice.py has been added to the examples README. Thank you.
```python
tr_loss += loss.item()
if (step + 1) % args.gradient_accumulation_steps == 0:
    scheduler.step()  # Update learning rate schedule
```
PyTorch scheduler.step() should be called after optimizer.step() (see pytorch/pytorch#20124)
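A minimal sketch of the recommended call order (the model, optimizer, and schedule here are illustrative, not the PR's actual code): since PyTorch 1.1, `optimizer.step()` must run before `scheduler.step()`, otherwise the first value of the learning-rate schedule is skipped.

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
# Toy schedule: multiply the base LR by 0.95 each scheduler step.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: 0.95 ** s)

gradient_accumulation_steps = 2
for step in range(4):
    loss = model(torch.randn(8, 4)).pow(2).mean()
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()       # update the weights first...
        scheduler.step()       # ...then advance the LR schedule
        optimizer.zero_grad()
```

With the reversed order, the first optimizer update would already use the decayed learning rate instead of the configured initial value.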
Ok, this looks clean and almost ready to merge; I just added a quick comment on a fix to make in the code (the order of the calls to step). A few things for the merge, as we have re-organized the examples folder. Can you:
…hoice_merge # Conflicts: # examples/contrib/run_swag.py
Run multiple choice merge
Hi @thomwolf. I have moved run_multiple_choice.py and utils_multiple_choice.py to examples, moved run_swag.py to examples/contrib, and moved scheduler.step() after optimizer.step(). I have also tested examples/contrib/run_swag.py on the current pytorch-transformers: run_swag.py gets a normal dev accuracy of 0.809 with the bert-base-uncased model. Thank you.
Awesome, thanks a lot for this contribution @erenup 🔥
Could you share your run configuration for the RACE and ARC datasets? I get an error at line 638, in create_examples: KeyError: 'para'
Hi @PantherYan, for ARC you need to ask AI2 for the retrieved text named
Thanks a lot for your prompt reply, much appreciated! For ARC, I have written an email to AI2 for help. Thank you!
Thank you for sharing your training configuration to guide us. I used the PyTorch backend and strictly followed your settings, except that I used roberta-base and batch_size = 2 (per_gpu_train_batch_size) * 4 (gpu_num), while you set train_batch_size = 8. In other words, your batch_size setting was 8 and mine was 2.
```
data/nlp/MCQA/RACE/cached_test_roberta-base_384_race
11/01/2019 00:31:22 - INFO - transformers.configuration_utils - Configuration saved in models_race/roberta-base/checkpoint-12000/config.json
```
@erenup Could I ask what your training loss and test loss were after 5 epochs?
Hi @PantherYan, I did not run the RACE dataset with roberta-base. In my experience, the RACE results with roberta-base make sense, since even BERT-large can only reach about 71~72. You can check the leaderboard for reference.
@erenup
I also met the problem of the missing item "para". Have you found a method for converting the raw corpus?
Please see @PantherYan's comments and mine above.
Pytorch-transformers! Nice work!
Refactoring old run_swag.py.
Motivation:
I have seen the SWAG PR #951 and the related issue #931.
According to @thomwolf's comments on that PR, I think it is necessary to adopt the code style of run_squad.py in run_swag.py so that we can easily take advantage of the new, powerful pytorch_transformers.
Changes:
I refactored the old run_swag.py following run_squad.py and tested it with the bert-base-uncased pretrained model on a Tesla P100.
Tests:
```shell
export SWAG_DIR=/path/to/SWAG
python -m torch.distributed.launch --nproc_per_node 1 run_swag.py \
    --train_file $SWAG_DIR/train.csv \
    --predict_file $SWAG_DIR/val.csv \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --max_seq_length 80 \
    --do_train \
    --do_eval \
    --do_lower_case \
    --output_dir ../models/swag_output \
    --per_gpu_train_batch_size 32 \
    --per_gpu_eval_batch_size 32 \
    --learning_rate 2e-5 \
    --gradient_accumulation_steps 2 \
    --num_train_epochs 3.0 \
    --logging_steps 200 \
    --save_steps 200
```
Results:
I have also tested `--fp16`, and the accuracy is 0.801. Other args that have been tested: `--evaluate_during_training`, `--eval_all_checkpoints`, `--overwrite_output_dir`, `--overwrite_cache`. Things that have not been tested: multi-GPU and distributed training, since I only have one GPU and one computer.
Questions:
It seems the performance is worse than the pytorch-pretrained-bert result. Is this gap in results normal (0.82 vs. 0.86)?
Future work:
I think it would be good to add a multiple-choice model for XLNet, since there are many multiple-choice datasets such as RACE.
Thank you all!