Cache: revert DynamicCache init for BC #33861
Conversation
Classes that expand dynamic cache don't support skipping layers, at least for now.
This PR did not add this limitation, but adds the exception :)
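For context, the guard being referred to could look roughly like the sketch below. The function name and exact condition are illustrative, not the code added by this PR: the idea is just that a cache class which cannot skip layers raises as soon as it sees a layer entry with no cached states.

```python
# Hypothetical sketch (not the actual transformers code) of the kind of guard
# described above: raise if any layer entry is empty, i.e. the model skipped it.
def raise_if_layers_were_skipped(key_cache: list) -> None:
    if any(isinstance(layer, list) and len(layer) == 0 for layer in key_cache):
        raise ValueError(
            "This cache class does not support models that skip layers (e.g. mllama): "
            "some layers have no cached key/value states."
        )
```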
On main, test_new_cache_format_2 doesn't fail every time; it is flaky.
In fact, all variants of test_new_cache_format are flaky. I suspect it is because the random model may be generating image tokens (we had a similar issue in other models).
Oh, I didn't know almost all mllama generation tests are currently being skipped. I'll come back to mllama soon and try to find out why they are failing; no good to have them skipped.
new quick test to ensure we don't regress :)
(there is a slow test, but we don't check the slow tests often enough)
Hey! 🤗 Thanks for your contribution to the transformers library! Before merging this pull request, slow tests CI should be triggered. To enable this:
(For maintainers) The documentation for slow tests CI on PRs is here.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
zucchini-nlp
left a comment
Great, thanks for finding a less breaking way to support the mllama cache :)
Left one comment on the quantized cache; I guess the tests are not catching it because it requires quanto or hqq to be installed in the env.
Quantized cache can support anything that dynamic cache supports, and this check should break any generation with quantized cache, because we are no longer able to fill in the cache from zero.
If mllama fails with quantized cache, maybe we can do `_supports_quantized_cache=False` and add a TODO for us? Skipping layers here should also be straightforward, I hope.
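A minimal sketch of that suggestion, assuming the usual class-attribute pattern; the class name below is illustrative, not the actual mllama class:

```python
# Illustrative only: opting a model out of quantized cache support via a class
# attribute, with a TODO to revisit once skipped layers are handled.
class MllamaForConditionalGenerationSketch:
    # TODO (hypothetical): flip back to True once the quantized cache handles skipped layers
    _supports_quantized_cache = False
```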
Good catch!
- in the commit where you reviewed this line, tests should be failing
- pushed a commit that fixed it (`py.test tests/models/llama/ -k test_generate_with_quant_cache -vv` now passes)
- this probably means, as you wrote, that the quanto tests are not being run. Will investigate.
(rebased to include a flaky test fix)
ArthurZucker
left a comment
Let's be extra extra careful here, as people changed their code (cf. TGI) for us; we can't break again.
| """ | ||
|
|
||
| def __init__(self, num_hidden_layers: Optional[int] = None) -> None: | ||
| def __init__(self) -> None: |
we ought to keep this for BC
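A minimal sketch of the BC-friendly direction the PR later takes (see the `@deprecate_kwarg("num_hidden_layers", version="4.47.0")` commit below): keep accepting the kwarg, but only warn on it. The code below is a decorator-free stand-in under those assumptions, not the transformers helper itself.

```python
# Sketch only: keep `num_hidden_layers` in the signature for backward compatibility,
# warn that it is deprecated, and initialize the cache as before the regression.
import warnings
from typing import List, Optional

import torch


class DynamicCacheSketch:
    def __init__(self, num_hidden_layers: Optional[int] = None) -> None:
        if num_hidden_layers is not None:
            warnings.warn(
                "`num_hidden_layers` is deprecated and will be removed in a future "
                "version; DynamicCache no longer needs the layer count up front.",
                FutureWarning,
            )
        # BC: always start from empty per-layer lists
        self.key_cache: List[torch.Tensor] = []
        self.value_cache: List[torch.Tensor] = []
```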
-        if num_hidden_layers is None:
-            self.key_cache: List[torch.Tensor] = []
-            self.value_cache: List[torch.Tensor] = []
-        else:
-            self.key_cache: List[torch.Tensor] = [[] for _ in range(num_hidden_layers)]
-            self.value_cache: List[torch.Tensor] = [[] for _ in range(num_hidden_layers)]
I am wondering if we are not gonna break other stuff, do you have links to issues where we broke something?!
Broken things as a result of these changes:
- FIX: Change check if past_key_values is empty peft#2106 -- PEFT was relying on `__bool__` (which maps to `__iter__`, which in turn maps to `__len__`, i.e. `len(self.key_cache)`), which got changed when `num_hidden_layers` was being passed (see the sketch below)
- End-to-end generation compile stopped working #33794 -- retrieving `num_hidden_layers` to initialize the cache needed to call `config.get_text_config()`, which uses `getattr` under the hood, and `getattr` is not compileable by torch.compile
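To make the first breakage concrete, here is a small self-contained illustration (toy classes, not the transformers ones) of how pre-filling one empty list per layer changes what `len()` and `bool()` report for a freshly created cache:

```python
# Illustration of the PEFT breakage: pre-filling one empty list per layer makes a
# brand-new cache look non-empty to len()/bool() based "is the cache empty?" checks.
from typing import List


class EmptyInitCache:
    def __init__(self) -> None:
        self.key_cache: List[list] = []  # len == 0, so bool(cache) is False

    def __len__(self) -> int:
        return len(self.key_cache)


class PerLayerInitCache:
    def __init__(self, num_hidden_layers: int) -> None:
        # one placeholder per layer: len == num_hidden_layers even before any update
        self.key_cache: List[list] = [[] for _ in range(num_hidden_layers)]

    def __len__(self) -> int:
        return len(self.key_cache)


assert not bool(EmptyInitCache())   # an "is the cache empty?" check holds here
assert bool(PerLayerInitCache(32))  # ...but flips once layers are pre-filled
```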
ArthurZucker
left a comment
Let's run slow tests!
* tmp commit
* tmp commit
* make fixup
* missing removal
* fix condition
* fix end-to-end compilation
* if -> elif
* BC
* BC
* use @deprecate_kwarg("num_hidden_layers", version="4.47.0")
* wups the import
* 🥴
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
What does this PR do?
Reverts the optional argument in `DynamicCache.__init__`, which broke BC in a few edge cases.

The optional argument was needed in mllama because the model skips layers. The original solution was to initialize empty lists on all layers, with the number of layers being an input argument.

This PR removes the optional argument, but adds the ability to handle skipped layers on `DynamicCache`.

EDIT: fixes #33794
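A hedged sketch of what "handle skipped layers" can look like in practice (simplified, not a verbatim copy of the code in this PR): when `update()` is called for a layer index beyond the current cache length, the skipped layers in between are padded with empty placeholders, and a previously skipped layer is filled in if it later produces states.

```python
# Sketch only: a DynamicCache-like structure that tolerates skipped layers without
# needing the layer count at construction time.
from typing import List, Tuple

import torch


class DynamicCacheSketch:
    def __init__(self) -> None:
        self.key_cache: List[torch.Tensor] = []
        self.value_cache: List[torch.Tensor] = []

    def update(
        self, key_states: torch.Tensor, value_states: torch.Tensor, layer_idx: int
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        if len(self.key_cache) <= layer_idx:
            # pad any skipped layers with empty placeholders, then append this layer
            for _ in range(len(self.key_cache), layer_idx):
                self.key_cache.append([])
                self.value_cache.append([])
            self.key_cache.append(key_states)
            self.value_cache.append(value_states)
        elif len(self.key_cache[layer_idx]) == 0:
            # a previously skipped layer now produces states: fill it in
            self.key_cache[layer_idx] = key_states
            self.value_cache[layer_idx] = value_states
        else:
            # the usual case: concatenate along the sequence-length dimension
            self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
            self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_states], dim=-2)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]
```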
✅ same tests are passing as in `main` (there are a couple of slow tests failing!)