Fix too many requests in TestMistralCommonTokenizer #40623
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
94f6cc4 to f8fce12
vasqu left a comment
LGTM overall. Just to be sure, if I understand correctly, this is for local caching before starting the tests themselves?
```python
    cls.repo_id,
    tokenizer_type="mistral",
    local_files_only=cls.local_files_only,
    # This is a hack as `list_local_hf_repo_files` from `mistral_common` has a bug
```
Maybe even a TODO? Imo not a good state to need this workaround 😓
Added:
TODO: Discuss with `mistral-common` maintainers: after a fix is done there, remove this `revision` hack
```python
if is_mistral_common_available():
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
```
yeah, I was tortured by some mistral-common issues, and by the end my brain 😵💫😵💫😵💫
```python
# For `tests/test_tokenization_mistral_common.py:TestMistralCommonTokenizer`, which eventually calls
# `mistral_common.tokens.tokenizers.utils.download_tokenizer_from_hf_hub` which (probably) doesn't have the cache.
if is_mistral_common_available():
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

    from transformers import AutoTokenizer
    from transformers.tokenization_mistral_common import MistralCommonTokenizer

    repo_id = "hf-internal-testing/namespace-mistralai-repo_name-Mistral-Small-3.1-24B-Instruct-2503"
    AutoTokenizer.from_pretrained(repo_id, tokenizer_type="mistral")
    MistralCommonTokenizer.from_pretrained(repo_id)
    MistralTokenizer.from_hf_hub(repo_id)

    repo_id = "mistralai/Voxtral-Mini-3B-2507"
    AutoTokenizer.from_pretrained(repo_id)
    MistralTokenizer.from_hf_hub(repo_id)
```
Not sure how bloated this file will become but might be nice to split into different functions already?
let's do something later. For now, the most important thing is just to get rid of these annoying connection errors!
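For the record, a later split into functions could route each repo through one small helper. This is a hypothetical sketch of that refactor, not code from this PR (`cache_repo` and the loader names are made up); it also collects failures per loader instead of aborting, so every repo still gets a caching attempt:

```python
def cache_repo(repo_id, loaders):
    """Run each tokenizer `loader(repo_id)` once to warm the local cache.

    Collects (repo_id, loader name, exception) tuples instead of raising
    on the first failure, so the remaining loaders still run.
    """
    failures = []
    for load in loaders:
        try:
            load(repo_id)
        except Exception as exc:  # broad on purpose: any loader may fail transiently
            failures.append((repo_id, getattr(load, "__name__", repr(load)), exc))
    return failures
```

Each `from_pretrained`-style call from the script above would then be passed in as a loader, e.g. `cache_repo(repo_id, [lambda r: AutoTokenizer.from_pretrained(r, tokenizer_type="mistral")])`.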
What does this PR do?
We have the following (flaky) error for this test:
https://app.circleci.com/pipelines/github/huggingface/transformers/144453/workflows/8e42765f-71d4-4733-9574-feb160ff8eda/jobs/1910963/parallel-runs/6/steps/6-113

This PR just tries to cache the tokenizers in a step before the tests are run.

`mistral_common` has a bug that makes things a bit hard to handle in a clean way, so I had to do a hack with `revision=None`.
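For reference, the general pattern behind this fix (warm the cache once, and absorb transient "too many requests" / connection errors with backoff) can be sketched as follows. This is a minimal illustration with a hypothetical `download` callable, not the actual CI setup:

```python
import time


def fetch_with_retry(download, retries=3, base_delay=1.0):
    """Call `download()` and retry with exponential backoff on transient errors.

    `download` is any zero-argument callable that may raise on rate limiting
    (e.g. HTTP 429 "too many requests") or flaky connections.
    """
    for attempt in range(retries):
        try:
            return download()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))


# Usage sketch: warm the local HF cache once, before any test runs, so the
# test suite itself can use `local_files_only=True` and never hit the Hub:
# fetch_with_retry(lambda: AutoTokenizer.from_pretrained(repo_id, tokenizer_type="mistral"))
```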