Add edge case tests for out-of-range token id decoding in Qwen2 tokenizer #45191

Closed
saslifat-gif wants to merge 2 commits into huggingface:main from saslifat-gif:test/qwen2-decode-out-of-range-ids

@saslifat-gif
Contributor

The Qwen2 tokenizer test file had no custom test methods — only integration
constants inherited from TokenizerTesterMixin.

This PR adds a test documenting two untested edge cases in decode():

Before (no test, behavior undocumented):

tok.decode([999999])  # silently returns ''
tok.decode([-1])      # raises: out of range integral type conversion attempted

After (behavior documented and regression-protected):

  • Token ids beyond len(tokenizer) (e.g. 999999) silently return an empty
    string — tested with assertEqual(decoded, "")
  • Negative token ids raise an exception — tested with assertRaises(Exception)

Both behaviors were verified to be consistent between Qwen2Tokenizer (slow) and
the fast tokenizer on Qwen/Qwen2.5-VL-7B-Instruct.
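
For illustration, the two assertions described above could be sketched roughly as follows. This is not the test code from the PR. To keep the sketch runnable offline it uses a hypothetical StubTokenizer that only mimics the reported decode() behavior; the stub, its vocabulary, the test class name, and the choice of OverflowError are all assumptions. Against the real model, the stub would presumably be swapped for a tokenizer loaded with AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct").

```python
import unittest


class StubTokenizer:
    """Hypothetical stand-in that mimics the decode() behavior reported in this PR."""

    def __init__(self, vocab):
        # Map token id -> token string, ids assigned in list order.
        self._id_to_token = dict(enumerate(vocab))

    def __len__(self):
        return len(self._id_to_token)

    def decode(self, token_ids):
        pieces = []
        for token_id in token_ids:
            if token_id < 0:
                # Assumption: mirrors the error message quoted in the PR description.
                raise OverflowError("out of range integral type conversion attempted")
            # Ids with no vocabulary entry contribute nothing to the output.
            pieces.append(self._id_to_token.get(token_id, ""))
        return "".join(pieces)


class DecodeOutOfRangeTest(unittest.TestCase):
    def setUp(self):
        # Replace with a real tokenizer to check the actual library behavior.
        self.tokenizer = StubTokenizer(["Hello", ",", " world"])

    def test_id_beyond_vocab_decodes_to_empty_string(self):
        out_of_range_id = 999999
        self.assertGreaterEqual(out_of_range_id, len(self.tokenizer))
        self.assertEqual(self.tokenizer.decode([out_of_range_id]), "")

    def test_negative_id_raises(self):
        with self.assertRaises(Exception):
            self.tokenizer.decode([-1])
```

Asserting on the broad Exception type follows the PR description; a narrower type would make the regression check stricter, but only if the slow and fast tokenizers raise the same one.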

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45191&sha=81e61e

@Rocketknight1
Member

Hi @saslifat-gif, we'd prefer no small PRs right now that aren't significant bugfixes or feature additions! We're trying to cut down on the overall volume because of agent spam.


2 participants