Feature/integrations docs fix #44369
Open
leaderofARS wants to merge 7 commits into huggingface:main
Remove confusing references to TikToken in MistralConverter documentation. Tekken is Mistral's proprietary tokenizer format, not related to TikToken.
- Remove contradictory deprecation notice from docstring
- Simplify documentation to focus on actual functionality (block masking for causal and non-causal patterns)
- Add note about the planned rename, to align with the existing TODO
The function is actively used and not deprecated; only the name needs clarification.
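For readers unfamiliar with the term, the "block masking for causal and non-causal patterns" mentioned in the commit message can be illustrated with a minimal sketch. This is not the implementation in `flex_attention.py`; `sketch_block_mask` and its parameters are hypothetical names, and the sketch assumes equal-sized blocks with no padding or document boundaries. It marks a (query block, key block) pair as needed when at least one query position in the block may attend to at least one key position.

```python
def sketch_block_mask(seq_len: int, block_size: int, is_causal: bool = True) -> list[list[bool]]:
    """Hypothetical helper: entry [qb][kb] is True if any query position in
    block qb may attend to any key position in block kb."""
    num_blocks = -(-seq_len // block_size)  # ceiling division
    mask = []
    for qb in range(num_blocks):
        # Last query position covered by this block (the final block may be short).
        last_q = min((qb + 1) * block_size, seq_len) - 1
        row = []
        for kb in range(num_blocks):
            first_k = kb * block_size
            # Causal: a key is visible only if key_index <= query_index, so the
            # block is needed when its first key is not after the last query.
            # Non-causal: every block pair is needed.
            row.append(first_k <= last_q if is_causal else True)
        mask.append(row)
    return mask


causal = sketch_block_mask(seq_len=5, block_size=2, is_causal=True)
full = sketch_block_mask(seq_len=5, block_size=2, is_causal=False)
```

With these inputs the causal result is lower-triangular at block level, while the non-causal result keeps every block pair.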
Member
cc @stevhliu - this feels like a code agent PR, but I think some of the doc inconsistencies might be legitimate.
stevhliu (Member) reviewed on Mar 2, 2026 and left a comment:
i would only update the mismatching tokenizer docstrings
    is_causal: bool | None = True,
) -> "BlockMask":
    """
    IMPORTANT NOTICE: This function is deprecated in favor of using the mask primitives in `masking_utils.py`,
Member
since this is kept for backward compatibility, i don't think we need to remove the notice here because users shouldn't be using it
Member
this still needs to be addressed :)
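For context on the thread above: a common way to keep a function for backward compatibility while steering users away from it is a docstring notice paired with a runtime `DeprecationWarning`. The sketch below is generic Python, not the code in `flex_attention.py`; `legacy_make_mask` and `new_mask_primitive` are hypothetical names, and whether the real function emits a warning is not stated in this PR.

```python
import warnings


def new_mask_primitive(size: int) -> list[list[bool]]:
    """Hypothetical replacement: a lower-triangular (causal) mask."""
    return [[k <= q for k in range(size)] for q in range(size)]


def legacy_make_mask(size: int) -> list[list[bool]]:
    """Deprecated: kept for backward compatibility only.

    New code should call `new_mask_primitive` instead.
    """
    # stacklevel=2 attributes the warning to the caller, not to this wrapper.
    warnings.warn(
        "legacy_make_mask is deprecated; use new_mask_primitive instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_mask_primitive(size)


# Existing callers keep working, but the warning tells them where to migrate.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_make_mask(2)
```

Under this pattern the notice in the docstring stays accurate for as long as the old entry point exists, which matches the reviewer's reasoning for keeping it.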
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Contributor
Author
Yeah, I used AI to detect inconsistencies in the docstrings, made changes according to the advice I received, and checked the other files to confirm the issues were legitimate.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This was referenced Apr 29, 2026
Fix documentation inconsistencies in integrations folder
Description
This PR addresses documentation errors and inconsistencies across the integrations module, specifically clarifying terminology and deprecation status in two key integration files.
Changes
1. mistral.py - Clarify Mistral Tekken vs TikToken terminology
Issue: The `MistralConverter` class and `convert_tekken_tokenizer` function had misleading docstrings referring to "tiktoken converter", which conflates Mistral's proprietary Tekken tokenizer format with OpenAI's TikToken. This creates confusion for users and contributors.
Fix:
- `MistralConverter` class docstring: now correctly identifies it as a converter for "Mistral's Tekken tokenizer format"
- `convert_tekken_tokenizer` function docstring and inline comments: now clarify that Tekken is Mistral's proprietary format
2. flex_attention.py - Resolve conflicting deprecation notice
Issue: The `make_flex_block_causal_mask` function had a verbose "IMPORTANT NOTICE" claiming deprecation in favor of `masking_utils.py`, yet the function is actively used throughout the codebase, with a TODO noting the need to rename it (not remove it). This created contradictory documentation.
Fix:
- Add a note about the planned rename to `make_flex_block_mask` to align with the existing TODO comment at line 108
Type of Change
Testing
Manual documentation review performed across all 42 integration files. Code style checks passed with `make style` and `make check-repo`.
Related Issues
Addresses documentation clarity in the integrations module as part of ongoing code quality improvements.