
Add Music Flamingo#43538

Merged
ebezzam merged 92 commits into huggingface:main from lashahub:modular-mf on Mar 30, 2026

@lashahub
Contributor

This PR adds support for Music Flamingo, NVIDIA's open large audio-language model designed for deep music understanding and reasoning.

Built on Audio Flamingo 3, Music Flamingo specializes in music analysis and long-form audio reasoning, extending the maximum supported audio length to 20 minutes (vs. 10 minutes in AF3) via Rotary Time Embeddings (RoTE). It also adds a more comprehensive music-focused system prompt and introduces audio boundary tokens (<|sound_bos|>, <|sound_eos|>) for better audio sequence modeling.
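To make the RoTE idea a bit more concrete, here is a minimal pure-Python sketch of a rotary embedding whose rotation angle is driven by a timestamp in seconds instead of an integer token index. It's an illustration of the general technique under our own assumptions (interleaved dimension pairs, base 10000, a hypothetical helper `rotate_by_time`), not the Music Flamingo implementation. It checks the property that makes rotary embeddings attractive for long audio: the query/key score depends only on the time offset between them.

```python
import math
from typing import List


def rotate_by_time(x: List[float], t_seconds: float, base: float = 10000.0) -> List[float]:
    """Rotate each (even, odd) pair of dims by t_seconds * inv_freq (RoPE with time as position)."""
    dim = len(x)
    if dim % 2 != 0:
        raise ValueError("rotary dimension must be even")
    out = [0.0] * dim
    for i in range(dim // 2):
        inv_freq = base ** (-2 * i / dim)  # same frequency schedule as standard RoPE
        angle = t_seconds * inv_freq       # continuous time replaces the integer index
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.2, 0.9]

# The same 2.5 s gap at very different absolute times gives the same score.
early = dot(rotate_by_time(q, 10.0), rotate_by_time(k, 7.5))
late = dot(rotate_by_time(q, 612.0), rotate_by_time(k, 609.5))
print(math.isclose(early, late, abs_tol=1e-9))  # True
```

Note that the Llama-style code in transformers rotates the two halves of the vector (`rotate_half`) rather than interleaved pairs; the relative-time property holds for either layout.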

It introduces:

  • MusicFlamingoForConditionalGeneration model class
  • MusicFlamingoProcessor for preprocessing text + audio
  • MusicFlamingoEncoder with RoTE for extended temporal modeling
  • Configuration, modeling, and processing utilities
  • Tests and docs
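The boundary tokens mentioned above wrap each audio span in the prompt. Here's a hedged sketch of what that expansion step could look like, assuming the prompt holds one placeholder per audio clip and that the placeholder string is `<sound>`. Both that string and the helper name `expand_audio_placeholders` are hypothetical, chosen for illustration only.

```python
import re
from typing import List

# Boundary tokens are from the PR description; "<sound>" as the per-frame
# placeholder is an assumption made for this sketch.
AUDIO_TOKEN = "<sound>"
SOUND_BOS = "<|sound_bos|>"
SOUND_EOS = "<|sound_eos|>"

# Compiled once and reused; re.escape makes the pattern safe for any token string.
_PLACEHOLDER = re.compile(re.escape(AUDIO_TOKEN))


def expand_audio_placeholders(text: str, num_tokens_per_audio: List[int]) -> str:
    """Replace the i-th placeholder with BOS + N_i audio tokens + EOS (hypothetical helper)."""
    found = len(_PLACEHOLDER.findall(text))
    if found != len(num_tokens_per_audio):
        raise ValueError(f"{found} placeholders but {len(num_tokens_per_audio)} audio lengths")
    counts = iter(num_tokens_per_audio)

    def _expand(_match) -> str:
        # Assumed: each AUDIO_TOKEN slot is later replaced by one audio embedding.
        return SOUND_BOS + AUDIO_TOKEN * next(counts) + SOUND_EOS

    return _PLACEHOLDER.sub(_expand, text)


print(expand_audio_placeholders("Describe <sound> in detail.", [3]))
# Describe <|sound_bos|><sound><sound><sound><|sound_eos|> in detail.
```

Precompiling with `re.compile` avoids re-parsing the pattern on every call, and `re.escape` keeps the `|` in tokens such as `<|sound_bos|>` from being read as regex alternation if you ever match on those instead.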

Music Flamingo can be loaded directly from the Hugging Face Hub:

```python
from transformers import MusicFlamingoForConditionalGeneration, AutoProcessor

processor = AutoProcessor.from_pretrained("nvidia/music-flamingo-2601-hf")
model = MusicFlamingoForConditionalGeneration.from_pretrained("nvidia/music-flamingo-2601-hf", device_map="auto")

conversation = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this track in detail - genre, tempo, key, instruments, and mood."},
        {"type": "audio", "path": "song.mp3"}
    ]
}]

inputs = processor.apply_chat_template(conversation, tokenize=True, add_generation_prompt=True, return_dict=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
# Drop the prompt tokens so only the newly generated text is decoded.
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Example: "This energetic Eurodance track at 150 BPM in E major features bright synth arpeggios..."
```
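Since the maximum audio length doubles relative to AF3, it can help to estimate how long a track gets once it becomes audio tokens. The back-of-the-envelope calculator below rests on assumptions we haven't verified against this checkpoint: Whisper-style 30 s windows at 100 mel frames per second, a stride-2 convolution followed by stride-2 pooling (so a full window gives 750 tokens), and per-window token counts that follow each window's unpadded length. `audio_token_budget` and `encoder_output_length` are hypothetical helpers.

```python
import math
from typing import Tuple

# Assumptions (not verified for this checkpoint): Whisper-style 30 s windows,
# 100 mel frames per second, and per-window token counts from the unpadded length.
WINDOW_SECONDS = 30
MEL_FRAMES_PER_SECOND = 100


def encoder_output_length(mel_frames: int) -> int:
    """Length after a stride-2 conv, then a stride-2 average pool (assumed layout)."""
    after_conv = (mel_frames - 1) // 2 + 1
    return (after_conv - 2) // 2 + 1


def audio_token_budget(duration_s: float, max_duration_s: float = 20 * 60) -> Tuple[int, int]:
    """Return (num_windows, num_audio_tokens) for a clip truncated to max_duration_s."""
    duration_s = min(duration_s, max_duration_s)
    num_windows = math.ceil(duration_s / WINDOW_SECONDS)
    tokens, remaining = 0, duration_s
    for _ in range(num_windows):
        chunk = min(remaining, WINDOW_SECONDS)
        tokens += encoder_output_length(round(chunk * MEL_FRAMES_PER_SECOND))
        remaining -= chunk
    return num_windows, tokens


print(audio_token_budget(45))       # (2, 1125)
print(audio_token_budget(25 * 60))  # (40, 30000): capped at 20 minutes
print(audio_token_budget(25 * 60, max_duration_s=10 * 60))  # (20, 15000): 10-minute cap as in AF3
```

Under these assumptions, a full 20-minute track is 40 windows, roughly 30,000 audio tokens before any text, so the positional scheme has to cover a long span.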

@ebezzam
Contributor

ebezzam commented Jan 28, 2026

Note: this is a "duplicate" of #43458, but we're keeping both open while we decide which approach is best: updating Audio Flamingo 3 or adding a new model.

cc @Rocketknight1

Contributor

@ebezzam left a comment

@lashahub and @Sreyan88, thanks for your patience! Here's an initial review to start our iteration 🤗

Main points:

  • Can you use modular for generating the configuration and processing files?
  • Can you trim the unused code paths for your new rotary embedding object?

Thanks!

Comment thread docs/source/en/model_doc/musicflamingo.md Outdated
Comment thread docs/source/en/model_doc/musicflamingo.md Outdated
Comment thread docs/source/en/model_doc/musicflamingo.md Outdated
Comment thread src/transformers/models/audioflamingo3/modular_audioflamingo3.py Outdated
Comment thread src/transformers/models/musicflamingo/convert_musicflamingo_to_hf.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/processing_musicflamingo.py
Contributor

@ebezzam left a comment

@lashahub thanks for updating the modular file! Here are some more comments to iterate on 🤗

Comment thread src/transformers/models/musicflamingo/convert_musicflamingo_to_hf.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated


```python
# classes
class MusicFlamingoRotaryEmbedding(Module):
```
Contributor

Can you double-check that LlamaRotaryEmbedding can't be used here?

```python
class LlamaRotaryEmbedding(nn.Module):
```

If it can't, could you try to imitate its style/structure, namely taking the config and (optionally) the device as input?

Contributor Author

LlamaRotaryEmbedding isn't a drop-in replacement here (MusicFlamingo uses axial/time RoTE plus an extra time modulation). I still refactored our rotary class into a Llama-like structure (config + init helper) while preserving the exact original math, so outputs stay unchanged.
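For readers less familiar with the Llama-style layout requested in the review, here is a minimal pure-Python sketch of that structure. The module takes a config (and optionally a device), precomputes `inv_freq` in an init helper, and returns `cos`/`sin` for the given positions, which here are timestamps. Only the first `head_dim * partial_rotary_factor` dimensions are rotated, as in partial-rotary models. The config class and its field names are hypothetical, not the real MusicFlamingo config.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HypotheticalRoTEConfig:
    # Illustrative fields only; not the real MusicFlamingo config.
    head_dim: int = 8
    rope_theta: float = 10000.0
    partial_rotary_factor: float = 0.5


class TimeRotaryEmbedding:
    """Llama-like structure: config (+ optional device) in, inv_freq computed once at init."""

    def __init__(self, config: HypotheticalRoTEConfig, device: Optional[str] = None):
        self.device = device  # kept for signature parity; unused in pure Python
        self.rotary_dim = int(config.head_dim * config.partial_rotary_factor)
        self.inv_freq = self._compute_inv_freq(config, self.rotary_dim)

    @staticmethod
    def _compute_inv_freq(config: HypotheticalRoTEConfig, rotary_dim: int) -> List[float]:
        return [config.rope_theta ** (-2 * i / rotary_dim) for i in range(rotary_dim // 2)]

    def forward(self, times: List[float]) -> Tuple[List[List[float]], List[List[float]]]:
        """cos/sin of shape [len(times)][rotary_dim] for timestamps in seconds."""
        cos, sin = [], []
        for t in times:
            freqs = [t * f for f in self.inv_freq]
            emb = freqs + freqs  # like torch.cat((freqs, freqs), dim=-1) in Llama
            cos.append([math.cos(a) for a in emb])
            sin.append([math.sin(a) for a in emb])
        return cos, sin


def rotate_half(x: List[float]) -> List[float]:
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_partial_rotary(x: List[float], cos_row: List[float], sin_row: List[float], rotary_dim: int) -> List[float]:
    """Rotate the first rotary_dim dims and pass the rest through unchanged."""
    rot, passthrough = x[:rotary_dim], x[rotary_dim:]
    rotated = [r * c + h * s for r, h, c, s in zip(rot, rotate_half(rot), cos_row, sin_row)]
    return rotated + passthrough


rope = TimeRotaryEmbedding(HypotheticalRoTEConfig())
cos, sin = rope.forward([0.0, 1.5])
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(apply_partial_rotary(x, cos[1], sin[1], rope.rotary_dim)[4:])  # [5.0, 6.0, 7.0, 8.0]
```

Duplicating `freqs` into `emb` and pairing it with `rotate_half` follows how transformers' Llama RoPE lays out cos/sin, so swapping integer positions for timestamps leaves the rest of the attention code untouched.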

Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
Comment thread src/transformers/models/musicflamingo/modular_musicflamingo.py Outdated
@ebezzam
Contributor

ebezzam commented Mar 30, 2026

run-slow: musicflamingo

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/musicflamingo"]
quantizations: []

@ebezzam
Contributor

ebezzam commented Mar 30, 2026

@ArthurZucker thanks for the review!

@lashahub FYI, I've recomputed the fixtures. As we discussed with @Sreyan88, the expected outputs now come from the checkpoint + code at merge time, since there isn't a public reference checkpoint.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 3b23941c | workflow commit (merge commit) |
| PR | cd693fc5 | branch commit (from PR) |
| main | 62a8e128 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43538&sha=cd693f

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, musicflamingo

@ebezzam enabled auto-merge March 30, 2026 16:53
@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43538&sha=d0d18f

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, musicflamingo

@ebezzam added this pull request to the merge queue Mar 30, 2026
Merged via the queue into huggingface:main with commit a9c6700 Mar 30, 2026
29 checks passed
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Mar 31, 2026
* Music flamingo

* Fix pos embeddings

* Method arg docstrings

* Add tests & docs

* Fix AF3 dtype bug

* Fix the MF performance issue

* Fix pos embeddings

* Fix embeddings & format

* Remove external deps

* Update processor token names

* Cleanup

* Simplify RotaryEmbedding to lang-only

* Reuse AF3 config classes

* Trim+rename rotary embedding

* Call parent _init_weights first and drop rotary einsum

* Precompute rotary cache at init

* Use modular processor pattern for MusicFlamingo

* Remove audio-only inference example

* Refactor Audio Feature Casting Path

* Clarify private source repo

* Clean up modular

* Move config to modular

* Formatting

* Remove dummy

* Derive musicflamingo timing and rotary config

* Llama style rotary embeddings

* Added reproducer comments

* Expose _init_weights for modular.

* Satisfy repo checks

* Align MusicFlamingo rotary with Llama style

* Move MusicFlamingo _init_weights to encoder

* Keep old behavior

* Move MusicFlamingo rotary settings into encoder rope_parameters

* Use AutoConfig in AF3/MF

* Align MusicFlamingo RoTE with Llama RoPE conventions

* Update outdated fixtures

* init_weights without changing others

* Fix import

* Remove backward compat

* Regenerate modeling for MF

* Fix AF3 batch inference bug

* Simplify config and nit.

* Conform more to transformers convention, e.g. removing unused code paths.

* Add another possible AF3 prefix.

* Use auto_docstring and update docstrings.

* Nits

* Nit for review

* Shift RoTE to main model so that encoder can be directly used from AF3.

* Refactoring nit.

* Fix init

* Fix some failing tests

* Fix AF3 & MF and add batching tests

* Fix audio embedding masking (bad post length)

* Nits and remove since same as GLM was bug in post length computation

* Simplify MF as AF3, and style checks.

* New config after merge and modular update.

* Address music flamingo tests, and some cleanup.

* style check

* Regenerate config.

* Update fixtures.

* Nits

* Nit

* Improve RoTE config

* Refine MusicFlamingo rotary time handling

* Simplification, and update AF3 processor for better modular

* Fix torch export

* Simplify modular, including upstreaming input_ids input to get_audio_features

* Remove upstreaming of input_ids to get_audio_features, and remove audio_rotary_dim.

* Switch to MoonshineRotaryEmbedding, and cleanup.

* Remove hardcoded MusicFlamingo partial_rotary_factor

* Update fixtures

* Compile re.sub

* Update src/transformers/models/musicflamingo/modular_musicflamingo.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/musicflamingo/modular_musicflamingo.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Style

* Update fixtures.

* Conditional torch import for processor.

---------

Co-authored-by: Eric B <ebezzam@gmail.com>
Co-authored-by: Eric Bezzam <4757445+ebezzam@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
SangbumChoi added a commit to SangbumChoi/transformers that referenced this pull request Apr 4, 2026
@lashahub deleted the modular-mf branch April 4, 2026 20:28
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026

6 participants