Cache: revert DynamicCache init for BC #33861
Conversation
Classes that expand dynamic cache don't support skipping layers, at least for now.
This PR did not add this limitation, but adds the exception :)
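For context, the guard being referred to could look roughly like the sketch below. The function name and exact condition are illustrative, not the code added by this PR: the idea is just that a cache class which cannot skip layers raises as soon as it sees a layer entry with no cached states.

```python
# Hypothetical sketch (not the actual transformers code) of the kind of guard
# described above: raise if any layer entry is empty, i.e. the model skipped it.
def raise_if_layers_were_skipped(key_cache: list) -> None:
    if any(isinstance(layer, list) and len(layer) == 0 for layer in key_cache):
        raise ValueError(
            "This cache class does not support models that skip layers (e.g. mllama): "
            "some layers have no cached key/value states."
        )
```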
On main, test_new_cache_format_2 doesn't fail every time; it is flaky.
In fact, all variants of test_new_cache_format are flaky. I suspect it is because the random model may be generating image tokens (we had a similar issue in other models).
Oh, I didn't know almost all mllama generation tests are currently being skipped. I'll come back to mllama soon and try to find out why they are failing; no good to have them skipped.
new quick test to ensure we don't regress :)
(there is a slow test, but we don't check the slow tests often enough)
Hey! 🤗 Thanks for your contribution to the transformers library! Before merging this pull request, slow tests CI should be triggered. To enable this:
(For maintainers) The documentation for slow tests CI on PRs is here.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
zucchini-nlp
left a comment
Great, thanks for finding a less breaking way to support the mllama cache :)
Left one comment on the quantized cache; I guess the tests are not catching it because it requires quanto or hqq to be installed in the env.
Quantized cache can support anything that dynamic cache supports, and this check should break any generation with quantized cache, because we are no longer able to fill in the cache from zero.
If mllama fails with quantized cache, maybe we can do `_supports_quantized_cache=False` and add a TODO for us? Skipping layers here should also be straightforward, I hope.
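A minimal sketch of that suggestion, assuming the usual class-attribute pattern; the class name below is illustrative, not the actual mllama class:

```python
# Illustrative only: opting a model out of quantized cache support via a class
# attribute, with a TODO to revisit once skipped layers are handled.
class MllamaForConditionalGenerationSketch:
    # TODO (hypothetical): flip back to True once the quantized cache handles skipped layers
    _supports_quantized_cache = False
```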
Good catch!
- in the commit where you reviewed this line, tests should be failing
- pushed a commit that fixed it (`py.test tests/models/llama/ -k test_generate_with_quant_cache -vv` now passes)
- this probably means, as you wrote, that the quanto tests are not being run. Will investigate.
(rebased to include a flaky test fix)
ArthurZucker
left a comment
Let's be extra extra careful here, as people changed their code (cf. TGI) for us; we can't break again.
| """ | ||
|
|
||
| def __init__(self, num_hidden_layers: Optional[int] = None) -> None: | ||
| def __init__(self) -> None: |
we ought to keep this for BC
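A minimal sketch of the BC-friendly direction the PR later takes (see the `@deprecate_kwarg("num_hidden_layers", version="4.47.0")` commit below): keep accepting the kwarg, but only warn on it. The code below is a decorator-free stand-in under those assumptions, not the transformers helper itself.

```python
# Sketch only: keep `num_hidden_layers` in the signature for backward compatibility,
# warn that it is deprecated, and initialize the cache as before the regression.
import warnings
from typing import List, Optional

import torch


class DynamicCacheSketch:
    def __init__(self, num_hidden_layers: Optional[int] = None) -> None:
        if num_hidden_layers is not None:
            warnings.warn(
                "`num_hidden_layers` is deprecated and will be removed in a future "
                "version; DynamicCache no longer needs the layer count up front.",
                FutureWarning,
            )
        # BC: always start from empty per-layer lists
        self.key_cache: List[torch.Tensor] = []
        self.value_cache: List[torch.Tensor] = []
```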
-        if num_hidden_layers is None:
-            self.key_cache: List[torch.Tensor] = []
-            self.value_cache: List[torch.Tensor] = []
-        else:
-            self.key_cache: List[torch.Tensor] = [[] for _ in range(num_hidden_layers)]
-            self.value_cache: List[torch.Tensor] = [[] for _ in range(num_hidden_layers)]
I am wondering if we are not gonna break other stuff, do you have links to issues where we broke something?!
Broken things as a result of these changes:
- FIX: Change check if past_key_values is empty peft#2106 -- PEFT was relying on `__bool__` (which maps to `__iter__`, which in turn maps to `__len__`, i.e. `len(self.key_cache)`), which got changed when `num_hidden_layers` was being passed (see the sketch below)
- End-to-end generation compile stopped working #33794 -- retrieving `num_hidden_layers` to initialize the cache needed to call `config.get_text_config()`, which uses `getattr` under the hood, and `getattr` is not compileable by torch.compile
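To make the first breakage concrete, here is a small self-contained illustration (toy classes, not the transformers ones) of how pre-filling one empty list per layer changes what `len()` and `bool()` report for a freshly created cache:

```python
# Illustration of the PEFT breakage: pre-filling one empty list per layer makes a
# brand-new cache look non-empty to len()/bool() based "is the cache empty?" checks.
from typing import List


class EmptyInitCache:
    def __init__(self) -> None:
        self.key_cache: List[list] = []  # len == 0, so bool(cache) is False

    def __len__(self) -> int:
        return len(self.key_cache)


class PerLayerInitCache:
    def __init__(self, num_hidden_layers: int) -> None:
        # one placeholder per layer: len == num_hidden_layers even before any update
        self.key_cache: List[list] = [[] for _ in range(num_hidden_layers)]

    def __len__(self) -> int:
        return len(self.key_cache)


assert not bool(EmptyInitCache())   # an "is the cache empty?" check holds here
assert bool(PerLayerInitCache(32))  # ...but flips once layers are pre-filled
```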
ArthurZucker
left a comment
Let's run slow tests!
* tmp commit
* tmp commit
* make fixup
* missing removal
* fix condition
* fix end-to-end compilation
* if -> elif
* BC
* BC
* use @deprecate_kwarg("num_hidden_layers", version="4.47.0")
* wups the import
* 🥴
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
What does this PR do?
Reverts the optional argument in `DynamicCache.__init__`, which broke BC in a few edge cases.

The optional argument was needed in mllama because the model skips layers. The original solution was to initialize empty lists on all layers, with the number of layers being an input argument.

This PR removes the optional argument, but adds the ability to handle skipped layers on `DynamicCache`.

EDIT: fixes #33794
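A hedged sketch of what "handle skipped layers" can look like in practice (simplified, not a verbatim copy of the code in this PR): when `update()` is called for a layer index beyond the current cache length, the skipped layers in between are padded with empty placeholders, and a previously skipped layer is filled in if it later produces states.

```python
# Sketch only: a DynamicCache-like structure that tolerates skipped layers without
# needing the layer count at construction time.
from typing import List, Tuple

import torch


class DynamicCacheSketch:
    def __init__(self) -> None:
        self.key_cache: List[torch.Tensor] = []
        self.value_cache: List[torch.Tensor] = []

    def update(
        self, key_states: torch.Tensor, value_states: torch.Tensor, layer_idx: int
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        if len(self.key_cache) <= layer_idx:
            # pad any skipped layers with empty placeholders, then append this layer
            for _ in range(len(self.key_cache), layer_idx):
                self.key_cache.append([])
                self.value_cache.append([])
            self.key_cache.append(key_states)
            self.value_cache.append(value_states)
        elif len(self.key_cache[layer_idx]) == 0:
            # a previously skipped layer now produces states: fill it in
            self.key_cache[layer_idx] = key_states
            self.value_cache[layer_idx] = value_states
        else:
            # the usual case: concatenate along the sequence-length dimension
            self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
            self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_states], dim=-2)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]
```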
✅ same tests are passing as in `main` (there are a couple of slow tests failing!)