
deprecating all occurrences of clean_up_tokenization_spaces #31232

Closed
itazap wants to merge 2 commits into main from
31187_depreciate_clean_up_tokenization_spaced

Conversation

@itazap
Collaborator

itazap commented Jun 4, 2024

fixes #31187
deprecating all occurrences of clean_up_tokenization_spaces
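
For context (not part of this PR's diff), a minimal sketch of what the flag controls, assuming the standard `decode()` kwargs and using an illustrative checkpoint name: with `clean_up_tokenization_spaces=True`, decoding strips the extra spaces some tokenizers emit around punctuation and contractions.

```python
from transformers import AutoTokenizer

# Illustrative checkpoint only; any wordpiece-style tokenizer shows the effect.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ids = tokenizer("Don't stop, it's fine.")["input_ids"]

# Cleanup enabled: spaces inserted before punctuation/contractions are collapsed,
# e.g. roughly "don't stop, it's fine."
print(tokenizer.decode(ids, skip_special_tokens=True,
                       clean_up_tokenization_spaces=True))

# Cleanup disabled (the direction this deprecation moves towards):
# the joined tokens keep those spaces, e.g. roughly "don ' t stop , it ' s fine ."
print(tokenizer.decode(ids, skip_special_tokens=True,
                       clean_up_tokenization_spaces=False))
```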

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

github-actions Bot commented Jul 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions Bot closed this Jul 14, 2024
@seastar105

@itazap Any update on this PR?

@itazap
Collaborator Author

itazap commented Nov 22, 2024

@seastar105 yes, we began the deprecation by setting clean_up_tokenization_spaces=False by default (#31938). For now, a significant number of older models still depend on clean_up_tokenization_spaces=True, but by default your model won't be using it anymore 😊
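
For anyone whose older checkpoint still relies on the cleanup, a minimal sketch of opting back in explicitly (assuming the standard tokenizer kwargs; the checkpoint name is illustrative):

```python
from transformers import AutoTokenizer

# Opt back in at load time so every decode() call applies the cleanup...
tok = AutoTokenizer.from_pretrained("gpt2", clean_up_tokenization_spaces=True)

# ...or request it per call instead.
text = tok.decode(tok("hello , world !")["input_ids"],
                  clean_up_tokenization_spaces=True)
print(text)  # roughly "hello, world!" with the cleanup applied
```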

@itazap itazap deleted the 31187_depreciate_clean_up_tokenization_spaced branch April 24, 2025 14:01


Development

Successfully merging this pull request may close these issues.

Original Llama-3 tokenizer behaves differently from transformers version
