[ignore_merges] Fix offsets #1640

Merged
ArthurZucker merged 3 commits into main from fix-offsets
Oct 1, 2024
@ArthurZucker (Collaborator) commented Sep 30, 2024

>>> import tokenizers
>>> from tokenizers import Tokenizer
>>> t = Tokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
>>> t.encode("are you ok").offsets
[(0, 0), (0, 3), (3, 7), (7, 10)]
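
To read the output above: each offset is a `(start, end)` character span into the input string, so slicing the text with it should give back the piece each token covers. A minimal sketch of that check, using the offsets copied from the session above; the helper name `spans` is hypothetical and not part of `tokenizers`, and the empty `(0, 0)` span is presumably the special token added at the start of the sequence:

```python
def spans(text, offsets):
    """Return the substring of `text` covered by each (start, end) offset pair."""
    return [text[start:end] for start, end in offsets]


text = "are you ok"
# Offsets copied verbatim from the REPL output above (not recomputed here).
offsets = [(0, 0), (0, 3), (3, 7), (7, 10)]

pieces = spans(text, offsets)
print(pieces)  # ['', 'are', ' you', ' ok']

# With correct offsets the spans tile the input: joined, they give the text back.
assert "".join(pieces) == text
```

The same slicing works on any `encode(...).offsets` result, which makes it a quick sanity check when offsets look wrong.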

fixes #1553, fixes #1517, fixes huggingface/transformers#33675, fixes #1620

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@thepowerfuldeez

Hi @ArthurZucker, can you push a hotfix to pip as version 0.20.1? Installing tokenizers from source seems complicated :)

@ArthurZucker
Collaborator Author

On it!
