The Tokenizer takes the token ‘pricerange’ as ‘[UNK]’?

For example:
tokens:
['i', 'am', 'looking', 'for', 'a', 'restaurant', 'in', 'the', '[restaurant_area]', '.', 'postcode', 'type', 'phone', 'food', 'pricerange', 'address', 'area', 'name', 'id', 'reference']

input_ids:
 [8, 35, 51, 15, 12, 45, 18, 9, 67, 6, 89, 117, 68, 88, 3, 82, 70, 346, 281, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

tokenizer.convert_id_to_tokens(input_ids):
i am looking for a restaurant in the [restaurant_area] . postcode type phone food [UNK] address area name id reference.


The Tokenizer maps the token ‘pricerange’ to ‘[UNK]’, so the training code might not work as intended.
Is this expected, or is there a bug in the source code?
I tried to investigate the issue with:
tokenizer = Tokenizer(vocab, ivocab, False)
print(tokenizer.vocab_len) # 3130
print(tokenizer.get_word_id('pricerange')) # 3
print(tokenizer.get_word(3)) # [UNK]
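
To narrow this down further, here is a minimal sketch of how I would check which tokens are missing from the vocabulary. It does not use the repository's Tokenizer class: `toy_vocab`, `find_oov_tokens`, and `lookup` are hypothetical names made up for illustration, and the sketch assumes the vocabulary behaves like a plain dict from word to id that falls back to the id of ‘[UNK]’ (3 in the output above) for unknown words. The ids for ‘food’ and ‘address’ are taken from the input_ids above; treating id 0 as padding is a guess.

```python
# Hypothetical stand-in for the real vocab, which is not shown in this issue.
UNK_TOKEN = "[UNK]"


def find_oov_tokens(tokens, vocab):
    """Return the tokens that are absent from vocab, in order, without duplicates."""
    seen = set()
    missing = []
    for tok in tokens:
        if tok not in vocab and tok not in seen:
            seen.add(tok)
            missing.append(tok)
    return missing


def lookup(word, vocab, unk_token=UNK_TOKEN):
    """Assumed lookup behaviour: unknown words fall back to the [UNK] id."""
    return vocab.get(word, vocab[unk_token])


# Toy vocabulary: only a few entries, with ids copied from the example above.
toy_vocab = {"[PAD]": 0, "[UNK]": 3, "food": 88, "address": 82}
tokens = ["food", "pricerange", "address"]

oov = find_oov_tokens(tokens, toy_vocab)
ids = [lookup(t, toy_vocab) for t in tokens]
print(oov)
print(ids)
```

If running the same kind of check against the real vocab lists ‘pricerange’ as missing, the word was probably never added when the vocabulary was built, rather than the Tokenizer mishandling it.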


