Add Music Flamingo #43538
Note: "duplicate" of #43458
# classes
class MusicFlamingoRotaryEmbedding(Module):
Can you double-check that LlamaRotaryEmbedding can't be used here?
If it can't, please try to imitate its style and structure, namely taking the config and (optionally) the device as input.
LlamaRotaryEmbedding isn't a drop-in here (MusicFlamingo uses axial/time RoTE + extra time modulation). I still refactored our rotary class to Llama-like structure (config + init helper), while preserving the exact original math so outputs stay unchanged.
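For readers unfamiliar with the "config + init helper" layout mentioned above, here is a minimal pure-Python sketch of that structure. It is not the code from this PR and not the Transformers implementation: `ToyRotaryConfig`, `compute_inv_freq`, `ToyRotaryEmbedding`, and the field names are hypothetical, the optional device argument is omitted because no tensor library is used, and the math is plain RoPE without MusicFlamingo's time modulation. The one idea it borrows from the discussion is that positions need not be integer token indices; they can be fractional values such as timestamps.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyRotaryConfig:
    # Hypothetical fields, chosen for illustration only.
    head_dim: int = 8
    rope_theta: float = 10000.0


def compute_inv_freq(config):
    """Init helper: inverse frequencies theta^(-2i/d) for i in [0, d/2)."""
    d = config.head_dim
    return [config.rope_theta ** (-(2 * i) / d) for i in range(d // 2)]


class ToyRotaryEmbedding:
    """Rotary embedding built from a config; positions may be fractional."""

    def __init__(self, config):
        self.config = config
        self.inv_freq = compute_inv_freq(config)

    def __call__(self, positions):
        cos, sin = [], []
        for p in positions:
            # One angle per frequency, duplicated so it spans head_dim.
            angles = [p * f for f in self.inv_freq]
            angles = angles + angles
            cos.append([math.cos(a) for a in angles])
            sin.append([math.sin(a) for a in angles])
        return cos, sin


def rotate_half(x):
    """Map [x1, x2] to [-x2, x1], where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary(x, cos, sin):
    """Rotate one head vector x using the cos/sin rows for its position."""
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rotate_half(x), cos, sin)]


rope = ToyRotaryEmbedding(ToyRotaryConfig(head_dim=4))
cos, sin = rope([0.0, 1.5])  # e.g. timestamps in seconds (an assumption)
rotated = apply_rotary([1.0, 2.0, 3.0, 4.0], cos[1], sin[1])
```

Keeping the frequency computation in a standalone helper means the class constructor only needs the config, which is the property the reviewer asked for.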
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
run-slow: musicflamingo
This comment contains models: ["models/musicflamingo"]
@ArthurZucker thanks for the review! @lashahub FYI, I've recomputed the fixtures. As discussed with @Sreyan88, the expected outputs now come from the checkpoint + code at merge time, since there isn't a public reference checkpoint.
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43538&sha=cd693f
[For maintainers] Suggested jobs to run (before merge) run-slow: audioflamingo3, auto, glmasr, musicflamingo
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43538&sha=d0d18f
* Music flamingo
* Fix pos embeddings
* Method arg docstrings
* Add tests & docs
* Fix AF3 dtype bug
* Fix the MF performance issue
* Fix pos embeddings
* Fix embeddings & format
* Remove external deps
* Update processor token names
* Cleanup
* Simplify RotaryEmbedding to lang-only
* Reuse AF3 config classes
* Trim+rename rotary embedding
* Call parent _init_weights first and drop rotary einsum
* Precompute rotary cache at init
* Use modular processor pattern for MusicFlamingo
* Remove audio-only inference example
* Refactor Audio Feature Casting Path
* Clarify private source repo
* Clean up modular
* Move config to modular
* Formatting
* Remove dummy
* Derive musicflamingo timing and rotary config
* Llama style rotary embeddings
* Added reproducer comments
* Expose _init_weights for modular.
* Satisfy repo checks
* Align MusicFlamingo rotary with Llama style
* Move MusicFlamingo _init_weights to encoder
* Keep old behavior
* Move MusicFlamingo rotary settings into encoder rope_parameters
* Use AutoConfig in AF3/MF
* Align MusicFlamingo RoTE with Llama RoPE conventions
* Update outdated fixtures
* init_weights without changing others
* FIx import
* Remove backward compat
* Regenerate modeling for MF
* Fix AF3 batch inference bug
* Simplify config and nit.
* Conform more to transformers convention, e.g. removing unused code paths.
* Add another possible AF3 prefix.
* Use auto_docstring and update docstrings.
* Nits
* Nit for review
* Shift RoTE to main model so that encoder can be directly used from AF3.
* Refactoring nit.
* Fix init
* Fix some failing tests
* Fix AF3 & MF and add batching tests
* Fix audio embedding masking (bad post length)
* Nits and remove since same as GLM was bug in post length computation
* Simplify MF as AF3, and style checks.
* New config after merge and modular update.
* Address music flamingo tests, and some cleanup.
* style check
* Regenerate config.
* Update fixtures.
* Nits
* Nit
* Improve RoTE config
* Refine MusicFlamingo rotary time handling
* Simplification, and update AF3 processor for better modular
* Fix torch export
* Simplify modular, including upstreaming input_ids input to get_audio_features
* Remove upstreaming of input_ids to get_audio_features, and remove audio_rotary_dim.
* Switch to MoonshineRotaryEmbedding, and cleanup.
* Remove hardcoded MusicFlamingo partial_rotary_factor
* Update fixtures
* Compile re.sub
* Update src/transformers/models/musicflamingo/modular_musicflamingo.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/musicflamingo/modular_musicflamingo.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Style
* Update fixtures.
* Conditional torch import for processor.
---------
Co-authored-by: Eric B <ebezzam@gmail.com>
Co-authored-by: Eric Bezzam <4757445+ebezzam@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This PR adds support for Music Flamingo, NVIDIA's open large audio-language model designed for deep music understanding and reasoning.
Built on Audio Flamingo 3, Music Flamingo specializes in music analysis and long-form audio reasoning, extending maximum audio support to 20 minutes (vs. 10 minutes in AF3) via Rotary Time Embeddings (RoTE). It also adds a more comprehensive music-focused system prompt and introduces audio boundary tokens (<|sound_bos|>, <|sound_eos|>) for better audio sequence modeling.
It introduces:
* MusicFlamingoForConditionalGeneration model class
* MusicFlamingoProcessor for preprocessing text + audio
* MusicFlamingoEncoder with RoTE for extended temporal modeling
Music Flamingo can be loaded directly from the Hugging Face Hub: