fix(musicgen_melody): use DynamicCache instead of EncoderDecoderCache by adityachoksi2512 · Pull Request #45738 · huggingface/transformers

adityachoksi2512 · 2026-05-01T14:43:29Z

What does this PR do?

MusicgenMelody fuses encoder_hidden_states directly into inputs_embeds
and uses pure self-attention — it does not use cross-attention. Using
EncoderDecoderCache caused audio conditioning to be silently ignored
during generation, producing byte-identical output regardless of the audio input.
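
As a rough illustration of the conditioning style described above, here is a toy sketch in plain Python. It is not the model's code: `self_attention` and `decode_with_prefix` are hypothetical names, the projections are identity matrices, and the fusion is assumed to be a concatenation along the sequence axis. It shows only that the conditioning prefix reaches the token positions through self-attention alone, with no cross-attention involved.

```python
import math


def self_attention(seq):
    """Causal single-head self-attention with identity projections (toy only)."""
    dim = len(seq[0])
    out = []
    for i, query in enumerate(seq):
        keys = seq[: i + 1]  # causal mask: attend to current and earlier positions
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in keys
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w / total * v[d] for w, v in zip(weights, keys)) for d in range(dim)]
        )
    return out


def decode_with_prefix(conditioning, token_embeds):
    """Prepend the conditioning to the token embeddings, then self-attend."""
    fused = conditioning + token_embeds  # list concatenation = sequence-axis concat
    attended = self_attention(fused)
    return attended[len(conditioning):]  # keep only the token positions


# Hypothetical stand-ins for two audio conditionings and a shared token prompt.
tokens = [[1.0, 0.0], [0.0, 1.0]]
audio_a = [[0.5, 0.5]]
audio_b = [[-2.0, 3.0]]

out_a = decode_with_prefix(audio_a, tokens)
out_b = decode_with_prefix(audio_b, tokens)
```

With a prefix-fused design like this, `out_a` and `out_b` differ whenever the prefixes differ, which is the behavior the bug report says was lost.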

Root cause

Identified via git bisect — the regression was introduced in #38635, which
refactored the cache system. The decoder was incorrectly initialized with
EncoderDecoderCache when it only needs a plain DynamicCache.

Fix

One-line change in MusicgenMelodyDecoder.forward():

# Before
past_key_values = EncoderDecoderCache(DynamicCache(config=self.config), DynamicCache(config=self.config))

# After
past_key_values = DynamicCache(config=self.config)

Testing

Existing test suite passes (133 passed, 62 skipped).
Regression test for this specific bug is in #45737.
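
For readers who want to reproduce the symptom, the check can be sketched as below. This is not the test from #45737. `conditioning_changes_output`, `broken_generate`, and `working_generate` are hypothetical names, and the two stub generators stand in for a real model call so that the snippet runs on its own.

```python
from typing import Callable, Sequence


def conditioning_changes_output(
    generate: Callable[[Sequence[float]], Sequence[int]],
    audio_a: Sequence[float],
    audio_b: Sequence[float],
) -> bool:
    """Return True if two different audio inputs yield different outputs."""
    out_a = list(generate(audio_a))
    out_b = list(generate(audio_b))
    return out_a != out_b


def broken_generate(audio: Sequence[float]) -> list:
    """Stub that ignores its input, mimicking the reported regression."""
    return [1, 2, 3]


def working_generate(audio: Sequence[float]) -> list:
    """Stub whose output depends on its input."""
    return [round(x * 100) for x in audio]


# Hypothetical audio inputs; a real check would use two distinct waveforms.
audio_a = [0.1, 0.2]
audio_b = [0.9, 0.8]

broken_ok = conditioning_changes_output(broken_generate, audio_a, audio_b)
working_ok = conditioning_changes_output(working_generate, audio_a, audio_b)
```

In a real test, `generate` would presumably wrap the model's generation call with deterministic decoding and a fixed seed, so that any difference in output can be attributed to the audio input alone.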

…Fixes huggingface#45647

github-actions · 2026-05-01T16:14:15Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: musicgen_melody

adityachoksi2512 · 2026-05-01T16:25:33Z

I noticed @voodoovampire removed the regression test with a note about deeper investigation. Happy to revisit the fix if the root cause turns out to be more complex - let me know if additional changes are needed. cc @ebezzam @eustlb

voodoovampire · 2026-05-01T16:40:08Z

Hey @adityachoksi2512 - just to clarify the situation:

I independently created PR #45737 which includes:

1. Your cache fix (cherry-picked with full co-author credit to you)

2. A regression test that proves the audio conditioning bug still exists even with the cache fix

I temporarily removed the test but have now restored it as @pytest.mark.xfail to document the expected behavior for future work.

Your cache fix prevents crashes, which is valuable. But the regression test shows audio conditioning is still broken - two different audio inputs produce identical outputs. The root cause needs deeper investigation beyond just the cache type change.

Both PRs address the same issue - maintainers will decide which to merge. Just wanted to make sure the full context is clear.

adityachoksi2512 · 2026-05-01T16:50:07Z

Thanks for the clarification @voodoovampire. Good point on the deeper investigation - I'll defer to the maintainers on next steps. Happy to contribute further in whatever direction they suggest. cc @ebezzam @eustlb

adityachoksi2512 mentioned this pull request May 1, 2026

MusicgenMelody ignores audio conditioning (regression between 4.48 and 4.57) #45647

Open

adityachoksi2512 force-pushed the fix/musicgen-melody-cache branch from 9b910bc to cc5566d on May 1, 2026 15:07

fix(musicgen_melody): use DynamicCache instead of EncoderDecoderCache. …

efa9db1

…Fixes huggingface#45647

adityachoksi2512 force-pushed the fix/musicgen-melody-cache branch from cc5566d to efa9db1 on May 1, 2026 16:12

adityachoksi2512 wants to merge 1 commit into huggingface:main from adityachoksi2512:fix/musicgen-melody-cache

2 participants
