[generate] shape checks in tests compatible with fixed-length caches (+ some minor fixes) #35993
gante merged 26 commits into huggingface:main
Conversation
    # - different models have a different cache name expected by the model (default = "past_key_values")
    # - `max_length`, prepared above, is used to determine the maximum cache length
-   max_cache_length = generation_config.max_length
+   max_cache_length = generation_config.max_length - 1
We were creating caches larger than what our max length flags specify: the last generated token is never written to the cache.
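To make the off-by-one concrete, here is a toy decoding loop (a sketch, not the real `generate` code): the final sampled token is appended to the output but never fed back through the model, so it never gets a cache entry.

```python
# Toy sketch of why a cache with `max_length - 1` slots is enough.
def prefill(prompt, cache):
    cache.extend(prompt)           # one cache entry per prompt token
    return prompt[-1] + 1          # stand-in for sampling the first new token

def decode_step(token, cache):
    cache.append(token)            # the consumed token is written to the cache
    return token + 1               # stand-in for sampling the next token

prompt = [0, 1, 2]
max_length = 6
cache, sequence = [], list(prompt)

next_token = prefill(prompt, cache)
while len(sequence) < max_length:
    sequence.append(next_token)
    if len(sequence) < max_length:                  # no forward pass after the last token
        next_token = decode_step(next_token, cache)

assert len(sequence) == max_length
assert len(cache) == max_length - 1  # the last token never enters the cache
```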
Before diving into more details, I think it's best to split the fix-copies changes into a separate PR. We can revert the changes (in this PR) to those files once the above PR is merged.
@ydshieh should be ready for a re-review. Two notes:
ydshieh
left a comment
Left 2 nit comments. I don't know the removal of `_update_model_kwargs_for_generation` in the 2 modeling files well enough, so I will leave that part to another reviewer 🙏
Otherwise all good to me 💯 thank you
zucchini-nlp
left a comment
Thanks for cleaning up! Left one suggestion for static cache tests :)
    inputs_embeds = model.get_input_embeddings()(input_ids)
-   max_cache_len += inputs_embeds.shape[1]
+   max_cache_len += inputs_embeds.shape[1] - 1  # the last generated token has no cache
Maybe not super related, but imo using `max_new_tokens` instead of `max_cache_len` is better: we can't know whether the input length is already longer than the preset `max_cache_len`.
Already had it fixed somewhere, but it was a huge PR so it's lost hehe
(leaving this one for a separate PR, moving conversation to slack)
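A hypothetical sketch of what the suggested sizing could look like (the helper and its arguments are illustrative, not the test's actual code):

```python
# Size the cache from the prompt length plus `max_new_tokens`, instead of adding
# the prompt length to a preset `max_cache_len` the prompt may already exceed.
def required_cache_len(prompt_length: int, max_new_tokens: int) -> int:
    # One slot per prompt token plus one per generated token, minus the final
    # generated token, which is returned without ever being written to the cache.
    return prompt_length + max_new_tokens - 1

assert required_cache_len(prompt_length=8, max_new_tokens=4) == 11
```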
            batch_size=internal_batch_size,
            attentions=output.decoder_attentions,
            prompt_length=1,  # the BOS token
            output_length=output.sequences.shape[-1],
        generated_length = (
            output.sequences.shape[-1] - 1 if config.is_encoder_decoder else output.sequences.shape[-1] - prompt_length
        )
        decoder_past_key_values = getattr(output, "past_key_values", None)
Is it possible that the output has no `past_key_values` attribute?
When `use_cache=False` :P (or models with different cache names, like RWKV)
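Putting the two pieces above together, a self-contained sketch of the bookkeeping (the helper name `generated_length` comes from the diff; the rest is illustrative):

```python
# Encoder-decoder: the decoder prompt is a single BOS token, so all but one
# output token were generated. Decoder-only: subtract the full prompt.
def generated_length(output_length: int, prompt_length: int, is_encoder_decoder: bool) -> int:
    return output_length - 1 if is_encoder_decoder else output_length - prompt_length

assert generated_length(output_length=10, prompt_length=1, is_encoder_decoder=True) == 9
assert generated_length(output_length=10, prompt_length=4, is_encoder_decoder=False) == 6

# The guard discussed above: `getattr` with a default covers `use_cache=False`
# and models whose cache lives under a different name (e.g. RWKV).
class _Output:  # hypothetical stand-in for a generate output without a cache
    pass

decoder_past_key_values = getattr(_Output(), "past_key_values", None)
assert decoder_past_key_values is None  # cache shape checks would be skipped
```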
        self.assertEqual(len(attentions), (output_length - prompt_length))
        use_cache = decoder_past_key_values is not None
        has_static_cache = isinstance(decoder_past_key_values, (StaticCache, HybridCache))
Possible edge case: `HybridCache` has a non-uniform max length across layers, and the sliding layers can have lengths that differ from the non-sliding ones. Do you think we need to account for that?
I'll add a note here, so we can easily know what to do when the test breaks.
(But for now I'll leave it as is, to contain test complexity.)
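For future reference, a hedged sketch of what accounting for this could look like. It assumes the cache exposes per-layer key tensors via `key_cache` with the sequence dimension at index 2; treat both as assumptions about cache internals, not a stable API:

```python
# Hypothetical sketch for the HybridCache edge case flagged above: sliding-window
# layers can have a smaller fixed length than global layers, so a single expected
# max length is not enough. Assumes `key_cache` holds per-layer key tensors with
# shape (batch, num_heads, max_len, head_dim); an internal detail, not a stable API.
def distinct_layer_lengths(cache) -> set[int]:
    return {k.shape[2] for k in cache.key_cache}

# A HybridCache may legitimately yield two values: the sliding-window size for
# sliding layers, and `max_cache_len` for the non-sliding ones.
# assert len(distinct_layer_lengths(decoder_past_key_values)) <= 2
```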
…(+ some minor fixes) (huggingface#35993)

* shape checks compatible with static cache
* add test
* tmp
* manually turn on eager attn when we want to output attn
* typo
* generalize to encoder-decoder models
* force compilation on cpu
* tmp commit
* fix static cache shape checks
* models with odd caches
* fix copies
* shorter cache search loop
* use decoder_past_key_values everywhere
* better test variable names and comments
* signature
* rename _check_outputs into _check_generate_outputs
* add comments
* HybridCache future test note
What does this PR do?
Fixes a few test cases exposed by #33212 [e.g. some models fail the cache shape checks if their default cache is not a dynamic cache]. The fix is added in a separate PR to avoid making #33212 a huge PR :)
This PR:
`output_attentions=True` now explicitly uses `eager` attention (`sdpa` doesn't return the attentions; `eager` was being used implicitly, with warnings);
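For illustration, a minimal sketch using the public transformers API (the model name and prompt are arbitrary): requesting attentions goes hand in hand with the `eager` attention implementation, since `sdpa` cannot return attention weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Arbitrary small model for illustration; any causal LM works the same way.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="eager")

inputs = tok("Hello", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=4,
    output_attentions=True,        # attention weights require the eager path
    return_dict_in_generate=True,
)
print(len(out.attentions))  # one entry per generation step
```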