Hi,
I am fine-tuning BERT on my own dataset. Pregenerating the training data went smoothly, but when I run finetune_on_pregenerated.py I get the following KeyError:
2019-07-11 22:53:04,151: ***** Running training *****
2019-07-11 22:53:04,151: Num examples = 35832
2019-07-11 22:53:04,151: Batch size = 32
2019-07-11 22:53:04,152: Num steps = 1119
2019-07-11 22:53:04,156: Loading training examples for epoch 0
Training examples: 0%| | 0/12078 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "finetune-hugging.py", line 348, in <module>
    main()
  File "finetune-hugging.py", line 297, in main
    num_data_epochs=num_data_epochs, reduce_memory=args.reduce_memory)
  File "finetune-hugging.py", line 105, in __init__
    features = convert_example_to_features(example, tokenizer, seq_len)
  File "finetune-hugging.py", line 43, in convert_example_to_features
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
  File "/anaconda3/lib/python3.7/site-packages/pytorch_pretrained_bert/tokenization.py", line 121, in convert_tokens_to_ids
    ids.append(self.vocab[token])
KeyError: 'Ad'
I could really use some help from you guys. Many Thanks!
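To illustrate what I think is happening, here is a minimal sketch that reproduces the failure mode without the library. The toy vocab dict below is a hypothetical stand-in for BERT's real vocab.txt (roughly 30k WordPiece entries), but the lookup mirrors the line in pytorch_pretrained_bert's tokenization.py that raises: a plain `vocab[token]` with no fallback, so any token not present in the vocabulary (e.g. a cased token like 'Ad' looked up against an uncased vocab) raises KeyError:

```python
# Toy stand-in for BERT's WordPiece vocabulary (hypothetical entries).
toy_vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "ad": 3, "##vert": 4}

def convert_tokens_to_ids(vocab, tokens):
    # Mirrors pytorch_pretrained_bert: a direct vocab[token] lookup,
    # so a token missing from vocab.txt raises KeyError.
    return [vocab[t] for t in tokens]

def safe_convert(vocab, tokens):
    # Defensive variant: map unknown tokens to [UNK] instead of crashing.
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]

print(safe_convert(toy_vocab, ["[CLS]", "Ad", "[SEP]"]))  # [1, 0, 2]

try:
    convert_tokens_to_ids(toy_vocab, ["Ad"])
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'Ad'
```

If this diagnosis is right, the fix on my side would be to make sure the tokens in the pregenerated data come from the same tokenizer/vocab (cased vs. uncased) used at fine-tuning time, rather than patching the lookup itself.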