
Feature/integrations docs fix #44369

Open
leaderofARS wants to merge 7 commits into huggingface:main from leaderofARS:feature/integrations-docs-fix

Conversation

@leaderofARS (Contributor) commented Mar 1, 2026

Fix documentation inconsistencies in integrations folder

Description

This PR addresses documentation errors and inconsistencies across the integrations module, specifically clarifying terminology and deprecation status in two key integration files.

Changes

1. mistral.py - Clarify Mistral Tekken vs TikToken terminology

Issue: The MistralConverter class and convert_tekken_tokenizer function had misleading docstrings referring to "tiktoken converter", which conflates Mistral's proprietary Tekken tokenizer format with OpenAI's TikToken. This creates confusion for users and contributors.

Fix:

  • Updated MistralConverter class docstring to correctly identify it as a converter for "Mistral's Tekken tokenizer format"
  • Updated convert_tekken_tokenizer function docstring and inline comments to clarify that Tekken is Mistral's proprietary format
  • Removed all references to TikToken from these docstrings
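One way to double-check the "removed all references to TikToken" step without relying only on a manual read is to scan docstrings programmatically. The sketch below is a minimal illustration using Python's standard `ast` module. `find_docstring_mentions` and `SAMPLE_SOURCE` are hypothetical names, and the sample class and function are stand-ins with invented docstrings, not the actual contents of mistral.py.

```python
import ast

# Hypothetical stand-in source: NOT the real transformers mistral.py.
# The function docstring deliberately still contains the stale term.
SAMPLE_SOURCE = '''
class MistralConverter:
    """Converter for Mistral's Tekken tokenizer format."""

def convert_tekken_tokenizer(path):
    """Convert a tiktoken converter file."""
    return path
'''


def find_docstring_mentions(source: str, term: str) -> list:
    """Return names of classes/functions whose docstring contains `term` (case-insensitive)."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # Only classes and (async) functions carry named docstrings we care about here.
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc and term.lower() in doc.lower():
                hits.append(node.name)
    return hits


if __name__ == "__main__":
    # Flags the stand-in function whose docstring still says "tiktoken".
    print(find_docstring_mentions(SAMPLE_SOURCE, "tiktoken"))
```

Running the same check over the real files (reading each with `open(...).read()`) would give a quick list of any remaining stale mentions; it does not replace reviewing whether the new wording is accurate.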

2. flex_attention.py - Resolve conflicting deprecation notice

Issue: The make_flex_block_causal_mask function had a verbose "IMPORTANT NOTICE" claiming deprecation in favor of masking_utils.py, yet the function is actively used throughout the codebase with a TODO noting the need to rename it (not remove it). This created contradictory documentation.

Fix:

  • Removed the misleading deprecation notice from the docstring
  • Simplified the docstring to focus on actual functionality (block masking for both causal and non-causal patterns)
  • Added a concise note about the planned rename to make_flex_block_mask to align with the existing TODO comment at line 108
  • The function remains active and supported; only the name requires future clarification
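The block masking described above (causal and non-causal patterns over packed sequences) can be illustrated with a small pure-Python predicate. This is a sketch under stated assumptions, not the transformers implementation: it borrows FlexAttention's `mask_mod(b, h, q_idx, kv_idx)` calling convention, uses hypothetical helpers `make_block_mask_mod` and `dense_mask`, and represents block (document) membership as a plain list of ids instead of tensors. Padding handling is omitted.

```python
from typing import Callable, List


def make_block_mask_mod(
    document_ids: List[int], is_causal: bool = True
) -> Callable[[int, int, int, int], bool]:
    """Return a hypothetical mask_mod(b, h, q_idx, kv_idx) -> bool, where True means "may attend".

    Illustrative only: plain ints for a single packed sequence, not tensors.
    """

    def mask_mod(b: int, h: int, q_idx: int, kv_idx: int) -> bool:
        # Tokens may only attend within their own block (document).
        same_block = document_ids[q_idx] == document_ids[kv_idx]
        if is_causal:
            # Causal pattern: additionally forbid attending to later positions.
            return same_block and kv_idx <= q_idx
        # Non-causal pattern: full attention inside each block.
        return same_block

    return mask_mod


def dense_mask(document_ids: List[int], is_causal: bool = True) -> List[List[int]]:
    """Materialize the predicate as a dense 0/1 matrix (rows = queries, cols = keys)."""
    n = len(document_ids)
    mod = make_block_mask_mod(document_ids, is_causal)
    return [[int(mod(0, 0, q, kv)) for kv in range(n)] for q in range(n)]


if __name__ == "__main__":
    # Two packed sequences: tokens 0-2 belong to document 0, tokens 3-4 to document 1.
    for row in dense_mask([0, 0, 0, 1, 1], is_causal=True):
        print(row)
```

In both patterns, tokens in document 1 never attend to document 0. The only difference is whether attention inside a block is lower-triangular (causal) or full (non-causal).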

Type of Change

  • Bug fix (non-breaking change which fixes documentation)
  • New feature
  • Breaking change
  • Documentation

Testing

Performed a manual documentation review across all 42 integration files. Code style checks passed with make style and make check-repo.

Related Issues

Addresses documentation clarity in the integrations module as part of ongoing code quality improvements.

Remove confusing references to TikToken in MistralConverter documentation.
Tekken is Mistral's proprietary tokenizer format, not related to TikToken.
- Remove contradictory deprecation notice from  docstring
- Simplify documentation to focus on actual functionality (block masking for causal and non-causal patterns)
- Add note about planned rename to  to align with existing TODO

The function is actively used and not deprecated; only the name needs clarification.
@Rocketknight1 (Member)

cc @stevhliu - this feels like a code agent PR, but I think some of the doc inconsistencies might be legitimate.

@stevhliu (Member) left a comment


i would only update the mismatching tokenizer docstrings

Comment threads (3, outdated) on src/transformers/integrations/mistral.py
is_causal: bool | None = True,
) -> "BlockMask":
"""
IMPORTANT NOTICE: This function is deprecated in favor of using the mask primitives in `masking_utils.py`,
Member


since this is kept for backward compatibility, i don't think we need to remove the notice here because users shouldn't be using it

Member


should undo the changes here

Member


this still needs to be addressed :)

leaderofARS and others added 2 commits March 3, 2026 11:00
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@leaderofARS (Contributor, Author)

Quoting @Rocketknight1: "cc @stevhliu - this feels like a code agent PR, but I think some of the doc inconsistencies might be legitimate."

Yeah, I used AI to detect inconsistencies in the docstrings, made changes according to the advice I received, and checked the other files to verify they were legitimate.

leaderofARS and others added 2 commits March 4, 2026 15:04
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

Labels

None yet

Projects

None yet


3 participants