Fail to run finetune_on_pregenerated.py #780

@allisonyw

Hi,

I am fine-tuning BERT on my own dataset. Pregenerating the training data went smoothly, but when I run finetune_on_pregenerated.py I get the following KeyError:

2019-07-11 22:53:04,151: ***** Running training *****
2019-07-11 22:53:04,151: Num examples = 35832
2019-07-11 22:53:04,151: Batch size = 32
2019-07-11 22:53:04,152: Num steps = 1119
2019-07-11 22:53:04,156: Loading training examples for epoch 0
Training examples: 0%| | 0/12078 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "finetune-hugging.py", line 348, in <module>
    main()
  File "finetune-hugging.py", line 297, in main
    num_data_epochs=num_data_epochs, reduce_memory=args.reduce_memory)
  File "finetune-hugging.py", line 105, in __init__
    features = convert_example_to_features(example, tokenizer, seq_len)
  File "finetune-hugging.py", line 43, in convert_example_to_features
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
  File "/anaconda3/lib/python3.7/site-packages/pytorch_pretrained_bert/tokenization.py", line 121, in convert_tokens_to_ids
    ids.append(self.vocab[token])
KeyError: 'Ad'
Out[21]: 256

I could really use some help with this. Many thanks!
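
For anyone hitting the same error: the traceback shows that `convert_tokens_to_ids` looks each token up directly in `self.vocab`, so any token missing from the loaded vocabulary raises a KeyError. One plausible cause, which this issue does not confirm, is that the pregenerated data was tokenized with a different vocabulary or casing setting than the tokenizer loaded at fine-tuning time (for example, cased tokens such as 'Ad' looked up in an uncased vocabulary). Below is a minimal diagnostic sketch, not a fix to the script itself. It uses a toy vocabulary so it runs on its own; `find_missing_tokens` and `convert_tokens_to_ids_safe` are hypothetical helper names, not part of pytorch_pretrained_bert.

```python
from typing import Dict, List

UNK_TOKEN = "[UNK]"  # BERT's conventional unknown-token symbol


def find_missing_tokens(tokens: List[str], vocab: Dict[str, int]) -> List[str]:
    """Return the distinct tokens absent from vocab, in first-seen order."""
    seen = set()
    missing = []
    for token in tokens:
        if token not in vocab and token not in seen:
            seen.add(token)
            missing.append(token)
    return missing


def convert_tokens_to_ids_safe(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, substituting the [UNK] id for out-of-vocabulary tokens."""
    unk_id = vocab[UNK_TOKEN]
    return [vocab.get(token, unk_id) for token in tokens]


# Toy stand-in for an uncased vocabulary (an assumption for illustration only).
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "ad": 4, "##s": 5}
# Tokens as they might appear in pregenerated data produced with casing preserved.
toy_tokens = ["[CLS]", "Ad", "##s", "[SEP]"]

missing = find_missing_tokens(toy_tokens, toy_vocab)
ids = convert_tokens_to_ids_safe(toy_tokens, toy_vocab)
print("missing:", missing)
print("ids:", ids)
```

With a real tokenizer, the same check could be run by passing `tokenizer.vocab` in place of `toy_vocab`. If many cased tokens show up as missing, regenerating the data with the same model name and casing setting used for fine-tuning is probably a better remedy than falling back to [UNK].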
