fix(musicgen_melody): use DynamicCache instead of EncoderDecoderCache #45738

Open
adityachoksi2512 wants to merge 1 commit into huggingface:main from adityachoksi2512:fix/musicgen-melody-cache

@adityachoksi2512

What does this PR do?

Fixes #45647

MusicgenMelody fuses encoder_hidden_states directly into inputs_embeds
and uses pure self-attention — it does not use cross-attention. Using
EncoderDecoderCache caused audio conditioning to be silently ignored
during generation, producing byte-identical output regardless of the audio input.
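The dependence on the audio input can be illustrated with a toy, pure-Python sketch. It is not the model's code: `decode_step` and `self_attention_last` are hypothetical names, the "embeddings" are plain scalars, and prepending the conditioning along the sequence axis is an assumption about what "fuses" means here. It only shows that when conditioning reaches the decoder through the fused input sequence, self-attention alone makes the output depend on it.

```python
import math


def self_attention_last(sequence):
    """Toy single-head self-attention output for the last position.

    Each scalar acts as query, key, and value at the same time.
    """
    query = sequence[-1]
    scores = [query * key for key in sequence]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(score - max_score) for score in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, sequence)) / total


def decode_step(token_embeds, conditioning=None):
    """Fuse conditioning into the inputs, then apply self-attention only."""
    # Assumed fusion: prepend the conditioning along the sequence axis.
    fused = list(conditioning or []) + list(token_embeds)
    return self_attention_last(fused)


tokens = [0.5, 1.0]
melody_a = [0.1, 0.9, 0.3]   # stand-in for one audio prompt
melody_b = [2.0, -1.0, 0.7]  # stand-in for a different audio prompt

out_a = decode_step(tokens, melody_a)
out_b = decode_step(tokens, melody_b)
out_none = decode_step(tokens)  # conditioning dropped

print(out_a != out_b)     # different audio, different output
print(out_a != out_none)  # dropping the conditioning changes the output
```

If the fused conditioning never reaches the attention computation, every call collapses to `decode_step(tokens)`, which matches the symptom described above: identical output for any audio input.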

Root cause

Identified via git bisect — the regression was introduced in #38635, which
refactored the cache system. The decoder was incorrectly initialized with
EncoderDecoderCache when it only needs a plain DynamicCache.

Fix

One-line change in MusicgenMelodyDecoder.forward():

# Before
past_key_values = EncoderDecoderCache(DynamicCache(config=self.config), DynamicCache(config=self.config))

# After  
past_key_values = DynamicCache(config=self.config)

Testing

Existing test suite passes (133 passed, 62 skipped).
Regression test for this specific bug is in #45737.

@adityachoksi2512 adityachoksi2512 force-pushed the fix/musicgen-melody-cache branch from cc5566d to efa9db1 on May 1, 2026 16:12
@github-actions
Contributor

github-actions Bot commented May 1, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: musicgen_melody

@adityachoksi2512
Author

I noticed @voodoovampire removed the regression test with a note about deeper investigation. Happy to revisit the fix if the root cause turns out to be more complex - let me know if additional changes are needed. cc @ebezzam @eustlb

@voodoovampire

Hey @adityachoksi2512 - just to clarify the situation:

I independently created PR #45737 which includes:

1. Your cache fix (cherry-picked with full co-author credit to you)

2. A regression test that proves the audio conditioning bug still exists even with the cache fix

I temporarily removed the test but have now restored it as @pytest.mark.xfail to document the expected behavior for future work.

Your cache fix prevents crashes, which is valuable. But the regression test shows audio conditioning is still broken - two different audio inputs produce identical outputs. The root cause needs deeper investigation beyond just the cache type change.
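For readers following along, the property the regression test checks can be sketched roughly as below. This is not the test from #45737: `conditioning_is_respected`, `broken_generate`, and `working_generate` are hypothetical names, and the generators are stand-ins for a real call to the model, which would presumably need deterministic decoding for the comparison to be meaningful.

```python
def conditioning_is_respected(generate, prompt, audio_a, audio_b):
    """Return True when two different audio inputs yield different outputs."""
    if audio_a == audio_b:
        raise ValueError("audio inputs must differ for the check to be meaningful")
    return generate(prompt, audio_a) != generate(prompt, audio_b)


# Stand-in generators (hypothetical; they do not call the real model).
def broken_generate(prompt, audio):
    # Ignores the audio entirely, mimicking the reported symptom.
    return [len(prompt)] * 4


def working_generate(prompt, audio):
    # Output depends on the audio samples.
    return [len(prompt) + round(sum(audio) * 10)] * 4


audio_a = [0.0, 0.1, 0.2]
audio_b = [0.5, -0.3, 0.9]

print(conditioning_is_respected(broken_generate, "lo-fi beat", audio_a, audio_b))   # False
print(conditioning_is_respected(working_generate, "lo-fi beat", audio_a, audio_b))  # True
```

A test built on this check fails for as long as the model returns the same tokens for both inputs, which is the behavior the xfail marker is documenting.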

Both PRs address the same issue - maintainers will decide which to merge. Just wanted to make sure the full context is clear.

@adityachoksi2512
Author

Thanks for the clarification @voodoovampire. Good point on the deeper investigation - I'll defer to the maintainers on next steps. Happy to contribute further in whatever direction they suggest. cc @ebezzam @eustlb

Development

Successfully merging this pull request may close these issues.

MusicgenMelody ignores audio conditioning (regression between 4.48 and 4.57)
