Fix: Enable prefill phase key value caching of nemotron/minitron models #34742
zucchini-nlp merged 9 commits into huggingface:main
Conversation
Signed-off-by: jeongin601 <0200angela@gmail.com>
zucchini-nlp left a comment
Thanks for adding this! Let's remove the deprecation warning, otherwise LGTM!
```python
logger.warning_once(
    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
    "will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class "
    "(https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)"
)
```
I don't think we need to add a deprecation message for newly added models; we can support only the new `Cache` objects.
Thanks for reviewing my code! I removed the deprecation warning. :)
Sorry if I wasn't clear, I meant totally removing support for the tuple format, and thus the `from_legacy_cache` call.
I'll ping the core maintainer after that for the final review :)
Oh, sorry, I got it wrong. Now I removed support for tuple-shaped `past_key_values`. Is this what you meant?
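For readers following the thread, here is a hypothetical sketch of the difference under discussion; the helper names are made up for illustration, and this is not the PR's actual diff. The legacy-tolerant path converts tuples via `DynamicCache.from_legacy_cache`, while the Cache-only path requested in review simply rejects them:

```python
from typing import Optional

from transformers.cache_utils import Cache, DynamicCache


def legacy_tolerant(past_key_values):
    # Legacy-tolerant handling (the kind of path being removed): tuple
    # caches are silently converted into a Cache object.
    if past_key_values is not None and not isinstance(past_key_values, Cache):
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
    return past_key_values


def cache_only(past_key_values: Optional[Cache]) -> Optional[Cache]:
    # Cache-only handling (what the review asks for): anything that is
    # not already a Cache instance is rejected outright.
    if past_key_values is not None and not isinstance(past_key_values, Cache):
        raise ValueError("`past_key_values` must be a `Cache` instance.")
    return past_key_values
```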
ArthurZucker left a comment
One suggestion and we can merge!
BTW, this seems to be related to #34274.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
I updated it! :) Thanks
Hi @jeongin601, thank you for this PR ❤️! It seems this PR has some regression: below are the 3 failing tests. Would you be up to take a look? You can find the job run log here. In any case, thank you in advance.

```json
"nemotron": {
    "single-gpu": [
        {
            "test": "tests/models/nemotron/test_modeling_nemotron.py::NemotronModelTest::test_torchscript_output_attentions",
            "commit": "318fe25f22a99ce1226f8d2aadc268b40f7e55af",
            "pr_number": 34742,
            "author": "jeongin601",
            "merged_by": "zucchini-nlp"
        },
        {
            "test": "tests/models/nemotron/test_modeling_nemotron.py::NemotronModelTest::test_torchscript_output_hidden_state",
            "commit": "318fe25f22a99ce1226f8d2aadc268b40f7e55af",
            "pr_number": 34742,
            "author": "jeongin601",
            "merged_by": "zucchini-nlp"
        },
        {
            "test": "tests/models/nemotron/test_modeling_nemotron.py::NemotronModelTest::test_torchscript_simple",
            "commit": "318fe25f22a99ce1226f8d2aadc268b40f7e55af",
            "pr_number": 34742,
            "author": "jeongin601",
            "merged_by": "zucchini-nlp"
        }
    ]
}
```
@ArthurZucker @zucchini-nlp A kind reminder: don't hesitate to ask for slow CI 🙂 - let's use the tools we have to make our life easier 🙏
Ah, it makes sense, torchscript does not support `Cache` objects (AFAIR)
Ah ok, I will check then. But if we eventually move forward to `Cache` classes only, torchscript will be broken?
Yeah 👀 unless used with optimum!
Fix: Enable prefill phase key value caching of nemotron/minitron models (huggingface#34742)

* modeling nemotron kv caching bugfix
* test file deleted
* code refinement
* remove unused variables
* import block sorted
* removed deprecation warning
* removed support for tuple shape past_key_values
* Update conditional statement for cache initialization

Signed-off-by: jeongin601 <0200angela@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
Fixes #34739
Problem
The current implementation does not enable key-value caching for nemotron and minitron models.
This problem can be checked with a quick example that generates key and value caches, as sketched below.
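A minimal sketch of such a check; the checkpoint name and the exact fields inspected are illustrative assumptions, not the author's original script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint for illustration; any nemotron/minitron model applies.
model_id = "nvidia/Minitron-4B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, my name is", return_tensors="pt")

# Run a single prefill forward pass with caching enabled.
with torch.no_grad():
    outputs = model(**inputs, use_cache=True)

cache = outputs.past_key_values
# Before the fix, the cache came back empty after prefill; after the fix, it
# holds the prompt's key/value states for every decoder layer.
print(type(cache))
if cache is not None:
    print("cached sequence length:", cache.get_seq_length())
```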
Modification
I modified the code to enable key-value caching during the prefill phase, following the cache handling in the `modeling_llama.py` file.

Key value caching example code: see the reproduction sketch under Problem above.
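For reference, a condensed sketch of the llama-style cache initialization the fix follows; this is an assumed simplification, not the exact code in `modeling_llama.py` or in this PR's diff:

```python
from typing import Optional

from transformers.cache_utils import Cache, DynamicCache


def init_cache(past_key_values: Optional[Cache], use_cache: bool) -> Optional[Cache]:
    # Create a Cache object when the caller enables caching but does not
    # supply one, so the prefill pass has somewhere to write its states.
    if use_cache and past_key_values is None:
        past_key_values = DynamicCache()
    return past_key_values
```

Each attention layer then populates the cache during prefill via `past_key_values.update(key_states, value_states, layer_idx)`.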
As-Is Result
Key-value caching is not performed.
To-be Result
Key-value caching is enabled.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ArthurZucker
Can you please check my modification? :)