Cache: revert DynamicCache init for BC #33861

Merged
ArthurZucker merged 13 commits into huggingface:main from gante:dynamic_init
Oct 4, 2024

Conversation

@gante
Contributor

gante commented Oct 1, 2024

What does this PR do?

Reverts the optional num_hidden_layers argument in DynamicCache.__init__, which broke BC in a few edge cases.

The optional argument was needed in mllama because the model skips layers. The original solution was to initialize empty lists on all layers, with the number of layers being an input argument.

This PR removes the optional argument and instead adds the ability to handle skipped layers to DynamicCache itself.
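
For reference, a rough sketch of how a lazily growing DynamicCache can tolerate skipped layers inside update(); the class name and details below are illustrative, not necessarily the exact merged code:

```python
from typing import Any, Dict, List, Optional, Tuple

import torch


class LazyDynamicCacheSketch:
    """Illustrative only: per-layer lists start empty and grow as layers call update(),
    so models that skip some layers (e.g. mllama's cross-attention layers) no longer
    need to pass num_hidden_layers at construction time."""

    def __init__(self) -> None:
        self.key_cache: List[torch.Tensor] = []
        self.value_cache: List[torch.Tensor] = []

    def update(
        self,
        key_states: torch.Tensor,
        value_states: torch.Tensor,
        layer_idx: int,
        cache_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        if len(self.key_cache) <= layer_idx:
            # Pad any skipped layers with empty placeholders, then append this layer.
            for _ in range(len(self.key_cache), layer_idx):
                self.key_cache.append([])
                self.value_cache.append([])
            self.key_cache.append(key_states)
            self.value_cache.append(value_states)
        elif len(self.key_cache[layer_idx]) == 0:
            # A layer that was skipped earlier is now receiving states for the first time.
            self.key_cache[layer_idx] = key_states
            self.value_cache[layer_idx] = value_states
        else:
            # Usual decoding step: concatenate along the sequence-length dimension.
            self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
            self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_states], dim=-2)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]
```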

EDIT: fixes #33794


✅ The same tests pass as on main (there are a couple of slow tests failing!)

Comment thread src/transformers/cache_utils.py Outdated
Contributor Author

Classes that expand dynamic cache don't support skipping layers, at least for now.

This PR did not add this limitation, but it adds an explicit exception for it :)
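
Roughly the kind of guard being described, sketched on a stand-in class; the placement, condition, and error message are assumptions rather than the exact code in this PR:

```python
from typing import List

import torch


class ExpandingCacheSketch:
    """Hypothetical stand-in for cache classes that expand DynamicCache (e.g. offloaded
    or quantized variants) and assume layers are written strictly in order."""

    def __init__(self) -> None:
        self.key_cache: List[torch.Tensor] = []
        self.value_cache: List[torch.Tensor] = []

    def update(self, key_states: torch.Tensor, value_states: torch.Tensor, layer_idx: int):
        if layer_idx > len(self.key_cache):
            # A layer was skipped: this class cannot represent the gap, so fail loudly
            # instead of silently corrupting the cache layout.
            raise RuntimeError(
                "This cache class does not support models that skip layers; use DynamicCache instead."
            )
        if layer_idx == len(self.key_cache):
            # Prefill: layers arrive in order and are appended one by one.
            self.key_cache.append(key_states)
            self.value_cache.append(value_states)
        else:
            # Subsequent decoding steps: concatenate along the sequence-length dimension.
            self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
            self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_states], dim=-2)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]
```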

Contributor Author

On main, test_new_cache_format_2 doesn't fail every time; rather, it is flaky.

In fact, all variants of test_new_cache_format are flaky. I suspect that it is because the random model may be generating image tokens (we had a similar issue in other models).

Member

Oh, I didn't know almost all mllama generation tests are currently being skipped. I'll come back to mllama soon and will try to find out why they are failing. It's no good to have them skipped.

Contributor Author

gante Oct 1, 2024

new quick test to ensure we don't regress :)

(there is a slow test, but we don't check the slow tests often enough)

gante changed the title from "Cache: revert DynamicCache init" to "Cache: revert DynamicCache init for BC" on Oct 1, 2024
@HuggingFaceDocBuilderDev

Hey! 🤗 Thanks for your contribution to the transformers library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

  • Add the run-slow label to the PR
  • When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma-separated list of all the models to be tested, e.g. [run_slow] model_to_test_1, model_to_test_2
    • If the pull request affects a lot of models, put at most 10 models in the commit message
  • A transformers maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

zucchini-nlp left a comment

Great, thanks for finding a less breaking way to support mllama cache :)

I left one comment on the quantized cache; I guess the tests are not catching it because it requires quanto or hqq to be installed in the env.

Comment thread src/transformers/cache_utils.py Outdated
Comment on lines 678 to 679
Member

Quantized cache can support anything that dynamic cache supports, and this check would break any generation with quantized cache, because we are no longer able to fill in the cache from zero.

If mllama fails with quantized cache, maybe we can do _supports_quantized_cache=False and add a TODO for us? Skipping layers here should also be straightforward, I hope.
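
A minimal sketch of this suggestion; the attribute name comes from the comment above, but the stand-in class and the way generation code would consult the flag are assumptions (and, per the reply below, the PR ultimately fixed the quantized-cache path instead):

```python
class MllamaSketch:
    # Hypothetical opt-out: tell generation code that this architecture cannot use
    # the quantized cache until skipped layers are supported there.
    # TODO: flip back to True once the quantized cache handles skipped layers.
    _supports_quantized_cache = False


def can_use_quantized_cache(model) -> bool:
    # Sketch of the kind of check generation code could perform before building a quantized cache.
    return getattr(model, "_supports_quantized_cache", True)


print(can_use_quantized_cache(MllamaSketch()))  # False
```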

Contributor Author

Good catch!

  • in the commit where you reviewed this line, tests should be failing
  • pushed a commit that fixed it (py.test tests/models/llama/ -k test_generate_with_quant_cache -vv now passes)
  • this probably means, as you wrote, that the quanto tests are not being run; I will investigate


@gante
Contributor Author

gante commented Oct 3, 2024

(rebased to include a flaky test fix)

Collaborator

ArthurZucker left a comment

Let's be extra extra careful here, as people changed their code for us (cf. TGI); we can't break it again.

Comment thread src/transformers/cache_utils.py Outdated
"""

def __init__(self, num_hidden_layers: Optional[int] = None) -> None:
def __init__(self) -> None:
Collaborator

we ought to keep this for BC
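
One way to keep the argument accepted without reintroducing the old behaviour, along the lines of the deprecation decorator the follow-up commits mention; the import path and the exact placement here are assumptions:

```python
from typing import List, Optional

import torch
from transformers.utils.deprecation import deprecate_kwarg  # assumed import path


class BCFriendlyDynamicCacheSketch:
    """Sketch only: keep accepting num_hidden_layers so existing callers don't break,
    but warn that it is unused and scheduled for removal."""

    @deprecate_kwarg("num_hidden_layers", version="4.47.0")
    def __init__(self, num_hidden_layers: Optional[int] = None) -> None:
        # The value is ignored: the cache always starts empty and grows lazily.
        self.key_cache: List[torch.Tensor] = []
        self.value_cache: List[torch.Tensor] = []
```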

Comment on lines -366 to -371

-        if num_hidden_layers is None:
-            self.key_cache: List[torch.Tensor] = []
-            self.value_cache: List[torch.Tensor] = []
-        else:
-            self.key_cache: List[torch.Tensor] = [[] for _ in range(num_hidden_layers)]
-            self.value_cache: List[torch.Tensor] = [[] for _ in range(num_hidden_layers)]
Collaborator

I am wondering if we are not gonna break other stuff; do you have links to issues where we broke something?!

Contributor Author

broken things as a result of these changes:

Collaborator

ArthurZucker left a comment

Let's run slow tests!

ArthurZucker merged commit 38f9f10 into huggingface:main on Oct 4, 2024
gante deleted the dynamic_init branch on October 7, 2024 at 09:21
ArthurZucker added a commit that referenced this pull request Oct 7, 2024
* tmp commit

* tmp commit

* make fixup

* missing removal

* fix condition

* fix end-to-end compilation

* if -> elif

* BC

* BC

* use @deprecate_kwarg("num_hidden_layers", version="4.47.0")

* wups the import

* 🥴

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024


Development

Successfully merging this pull request may close these issues.

End-to-end generation compile stopped working

4 participants