Cache: init empty cache when use_cache#34274

Merged
zucchini-nlp merged 15 commits into huggingface:main from zucchini-nlp:cache-empty-init
Nov 25, 2024

Conversation

Member

@zucchini-nlp zucchini-nlp commented Oct 21, 2024

What does this PR do?

Fixes #34206. As per the title, we have to initialize an empty cache whenever use_cache=True. Additionally, MllamaForCausalLM was not loading correctly for me, so I modified the base model prefix.
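
To make the change concrete, a minimal sketch of the behaviour the linked issue expects (the checkpoint id is a placeholder; the official Mllama weights are gated):

import torch
from transformers import AutoTokenizer, MllamaForCausalLM

model_id = "<any-mllama-text-checkpoint>"  # placeholder, not a real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MllamaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, use_cache=True)  # no past_key_values passed in

# With this PR the model initializes an empty DynamicCache itself,
# so the returned past_key_values is no longer None.
print(out.past_key_values)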

@zucchini-nlp zucchini-nlp requested a review from gante October 21, 2024 08:15
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@gante gante left a comment

Two items to check :D

 class MllamaForCausalLM(MllamaPreTrainedModel, GenerationMixin):
     config_class = MllamaTextConfig
-    base_model_prefix = "model"
+    base_model_prefix = "language_model"
Contributor

From the docstring of PreTrainedModel, regarding base_model_prefix:

A string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.

What is the hierarchy in the model weights? I'm assuming it's:

model -|- language_model --------------|- model - (...)
       |                               |- lm_head
       |- vision_model-(...)
       |- multi_modal_projector-(...)
           

If that's the case, then I agree with the change, assuming we also change self.model to self.language_model in this class

(make sure all slow tests pass!)

Member Author

Yes, that is how the checkpoint looks, and when I loaded the model it didn't load correctly unless the base prefix was fixed. Unfortunately, the slow tests can't be run because the model is read-token protected and the EU has no access to it 🥲 But I tested with openly mirrored weights.

Contributor

Fair :)

Can we change self.model to self.language_model? Some parts of the codebase call variants of inner_model = getattr(model, model.base_model_prefix)
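
For context, a rough self-contained sketch of that pattern (class and attribute names here are illustrative, not the library code):

class TinyBackbone:
    pass

class TinyForCausalLM:
    base_model_prefix = "language_model"

    def __init__(self):
        self.language_model = TinyBackbone()

model = TinyForCausalLM()
# Helpers in the codebase walk from the head model down to the backbone like this,
# so the prefix must name an attribute that actually exists on the instance.
inner_model = getattr(model, model.base_model_prefix)
assert isinstance(inner_model, TinyBackbone)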

Member Author

That would mean we change the checkpoint state dict keys, right? 🤔 Anyway, lemme verify this and report whether it is possible without touching the checkpoint.

Contributor

@gante gante Nov 4, 2024

That would mean we change the checkpoint state dict keys, right?

Uhmmm, possibly? Not sure 😅 The tests that save and reload would break if it weren't BC, I think.

Member Author

@zucchini-nlp zucchini-nlp Nov 15, 2024

Indeed, I found why it was changed: it was to get the self.base_model attribute for preparing the causal mask in #33677 hehe

And yes, without changing the state dict keys we cannot call it "model". IMO, even if we change the official state dict, there are many mirrors/finetunes which would break BC compared to the first model release. So I think the better way is to bring back the base_model_prefix as it was.

I'm thinking maybe we can have a default method for update_causal_mask and prepare_attention_mask which would be the fallback if the base model has no such method defined? 🤔

EDIT: but wait, Arthur might disagree, as he wanted to have attention mask preparation in all model files instead of having one copy in the general modeling file. In that case, we might need something better than getattr(self, base_model_prefix), as it doesn't work when the same checkpoint is loaded as CausalLM and as ConditionalLM :(

Contributor

🤔

Regarding the generate-specific problem: if base_model = getattr(self, self.base_model_prefix, None) in the generalist prepare_inputs_for_generation (or its downstream usage) is the issue, then my recommendation would be to override prepare_inputs_for_generation in mllama -- more specifically, in the classes where it doesn't work.

Alternatively, we could define _prepare_4d_causal_attention_mask_with_cache_position in all model classes -- write it once in the innermost class, then have the child classes define this function as a parent call.

Either of these solutions would work well for me :) (with a preference for the second: whenever we override prepare_inputs_for_generation, we know for sure we'll have extra maintenance in the future)
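
A minimal sketch of the second option (class names follow the Mllama naming, but the bodies are placeholders; only the delegation pattern is the point):

class MllamaTextModel:
    @staticmethod
    def _prepare_4d_causal_attention_mask_with_cache_position(*args, **kwargs):
        ...  # the actual mask construction is written once, here

class MllamaForCausalLM:
    @staticmethod
    def _prepare_4d_causal_attention_mask_with_cache_position(*args, **kwargs):
        # wrapper classes only forward to the innermost implementation,
        # so generic code can find the method on any of them
        return MllamaTextModel._prepare_4d_causal_attention_mask_with_cache_position(*args, **kwargs)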

Member Author

Done, and also enabled the compile tests for the CausalLM class to check that it works.

Collaborator

It should indeed be language_model, but probably in a different PR as it is unrelated!

Member Author

Okay, will make a new PR.

Comment thread: tests/generation/test_utils.py
Comment thread: tests/models/qwen2_vl/test_modeling_qwen2_vl.py (Outdated)
Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks LGTM but let's separate unrelated changes!

Comment on lines +1744 to +1746
if use_cache and past_key_values is None:
    past_key_values = DynamicCache()

Collaborator

This does make sense, as it's helping users and it's an old API, but let's promote initializing a cache and passing it! 🤗
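
For reference, the explicit style being promoted looks roughly like this (the checkpoint id is a placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "<any-causal-lm-checkpoint>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

cache = DynamicCache()  # the user constructs and owns the cache object
inputs = tokenizer("Hello", return_tensors="pt")
out = model(**inputs, past_key_values=cache, use_cache=True)
# `cache` is now populated and can be passed to the next forward pass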

Member Author

We could, but I still think it is a lot easier for users who want a forward pass with cache and do not want extra lines of code for importing and passing the cache object. So I think we'd better keep the default cache for now.

Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks 🤗

    )
    use_cache = False

if use_cache and past_key_values is None:
Collaborator

Missing a torch.jit tracing escape here, no?
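
For context, that kind of guard would look roughly like the sketch below (a hedged illustration, not the exact diff; the helper name is made up):

import torch
from transformers import DynamicCache

def maybe_init_cache(use_cache, past_key_values):
    # Hypothetical helper: skip the implicit cache init while torch.jit is tracing,
    # since a freshly created cache object generally doesn't play well with tracing.
    if use_cache and past_key_values is None and not torch.jit.is_tracing():
        past_key_values = DynamicCache()
    return past_key_values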

@zucchini-nlp zucchini-nlp merged commit c1a8520 into huggingface:main Nov 25, 2024
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* fix

* fix tests

* fix copies

* add docs

* Revert "add docs"

This reverts commit 32d3563.

* qwen move deltas

* mllama can potentially fullgraph compile

* enable mllama compile and fix tests

* remove mllama fixes

Development

Successfully merging this pull request may close these issues.

MllamaForCausalLM not returning past_key_values even with use_cache=True
