Add Music Flamingo by lashahub · Pull Request #43538 · huggingface/transformers

lashahub · 2026-01-27T17:37:34Z

This PR adds support for Music Flamingo, NVIDIA's open large audio-language model designed for deep music understanding and reasoning.

Paper: Music Flamingo: Scaling Music Understanding in Audio Language Models
Model Weights: nvidia/music-flamingo-2601-hf
Demo: Music Flamingo Demo
Project Page: NVIDIA Research

Built on Audio Flamingo 3, Music Flamingo specializes in music analysis and long-form audio reasoning. It extends the maximum supported audio length to 20 minutes (vs. 10 minutes in AF3) via Rotary Time Embeddings (RoTE), adds a more comprehensive music-focused system prompt, and introduces audio boundary tokens (<|sound_bos|>, <|sound_eos|>) for better audio sequence modeling.
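To make the RoTE idea more concrete, here is a minimal pure-Python sketch of a rotary embedding whose rotation angles are driven by a timestamp in seconds rather than a token index. It is not the implementation from this PR: the function names, the pairing of consecutive features, and the `base` value are assumptions chosen for illustration only.

```python
import math

def rotary_angles(time_s, dim, base=10000.0):
    """One rotation angle per feature pair for a timestamp given in seconds."""
    return [time_s * base ** (-2.0 * i / dim) for i in range(dim // 2)]

def apply_rotary(vec, time_s, base=10000.0):
    """Rotate consecutive feature pairs (x0, x1), (x2, x3), ... by time-dependent angles."""
    out = []
    for i, theta in enumerate(rotary_angles(time_s, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([
            x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta),
        ])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(p * q for p, q in zip(a, b))

# Illustrative query/key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# The same 2.5 s gap at two different absolute offsets within a long track.
score_early = dot(apply_rotary(q, 12.5), apply_rotary(k, 10.0))
score_late = dot(apply_rotary(q, 1102.5), apply_rotary(k, 1100.0))
```

In this sketch the two scores agree up to floating-point error, because a rotary scheme makes the query-key score depend only on the time difference between the two positions, not on where in the track they occur.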

It introduces:

MusicFlamingoForConditionalGeneration model class
MusicFlamingoProcessor for preprocessing text + audio
MusicFlamingoEncoder with RoTE for extended temporal modeling
Configuration, modeling, and processing utilities
Tests and docs
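As a rough illustration of how audio boundary tokens might frame the audio span in a prompt, the sketch below expands a single placeholder into a bracketed run of per-frame placeholders. Only the `<|sound_bos|>` and `<|sound_eos|>` token names come from this PR description; the `<sound>` placeholder string, the helper name `expand_audio_placeholder`, and the expansion logic are assumptions and may not match what `MusicFlamingoProcessor` does.

```python
SOUND_BOS = "<|sound_bos|>"
SOUND_EOS = "<|sound_eos|>"
AUDIO_PLACEHOLDER = "<sound>"  # assumption: placeholder string is hypothetical here

def expand_audio_placeholder(prompt: str, num_audio_frames: int) -> str:
    """Replace the first placeholder with BOS + one placeholder per frame + EOS."""
    if num_audio_frames < 1:
        raise ValueError("num_audio_frames must be at least 1")
    if AUDIO_PLACEHOLDER not in prompt:
        raise ValueError("prompt does not contain an audio placeholder")
    expanded = SOUND_BOS + AUDIO_PLACEHOLDER * num_audio_frames + SOUND_EOS
    # count=1 so only the first placeholder is expanded.
    return prompt.replace(AUDIO_PLACEHOLDER, expanded, 1)

prompt = expand_audio_placeholder("Describe this track. <sound>", 3)
```

The point of the boundary tokens in such a scheme is that the language model sees an explicit start and end marker around the audio embeddings instead of having to infer where the audio span begins and ends.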

Music Flamingo can be loaded directly from the Hugging Face Hub:

from transformers import MusicFlamingoForConditionalGeneration, AutoProcessor

processor = AutoProcessor.from_pretrained("nvidia/music-flamingo-2601-hf")
model = MusicFlamingoForConditionalGeneration.from_pretrained("nvidia/music-flamingo-2601-hf", device_map="auto")

conversation = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this track in detail - genre, tempo, key, instruments, and mood."},
        {"type": "audio", "path": "song.mp3"}
    ]
}]

inputs = processor.apply_chat_template(conversation, tokenize=True, add_generation_prompt=True, return_dict=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Example: "This energetic Eurodance track at 150 BPM in E major features bright synth arpeggios..."

ebezzam · 2026-01-28T08:58:57Z

Note: this is a "duplicate" of #43458, but we are keeping both open while we decide which approach is best: updating Audio Flamingo 3 or adding a new model.

cc @Rocketknight1

ebezzam

@lashahub and @Sreyan88 thanks for your patience! Here's an initial review to start our iteration 🤗

Main points:

Can you use modular for generating the configuration and processing files?
Can you trim the unused code paths for your new rotary embedding object?

Thanks!

ebezzam

@lashahub thanks for updating the modular file! Here are some more comments to iterate on 🤗

ebezzam · 2026-02-17T18:27:17Z

+
+
+# classes
+class MusicFlamingoRotaryEmbedding(Module):


can you double check that LlamaRotaryEmbedding can't be used here?

transformers/src/transformers/models/llama/modeling_llama.py

Line 73 in 83eb94c

class LlamaRotaryEmbedding(nn.Module):

If not, could you try to imitate its style/structure, namely taking the config and (optionally) the device as input?

LlamaRotaryEmbedding isn't a drop-in here (MusicFlamingo uses axial/time RoTE + extra time modulation). I still refactored our rotary class to Llama-like structure (config + init helper), while preserving the exact original math so outputs stay unchanged.
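The "config + init helper" structure mentioned above can be sketched roughly as follows. This is a plain-Python toy, not the code from this PR and not `LlamaRotaryEmbedding` itself (which is a `torch.nn.Module`): `ToyRotaryConfig`, `compute_default_inv_freq`, and `ToyRotaryEmbedding` are hypothetical names, and the `device` argument is only stored to mirror the constructor shape.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ToyRotaryConfig:
    """Hypothetical stand-in for a model config carrying rotary settings."""
    head_dim: int = 8
    rope_theta: float = 10000.0

def compute_default_inv_freq(config: ToyRotaryConfig) -> List[float]:
    """Init helper: derive inverse frequencies purely from the config."""
    half = config.head_dim // 2
    return [config.rope_theta ** (-2.0 * i / config.head_dim) for i in range(half)]

class ToyRotaryEmbedding:
    """Takes the config and (optionally) a device, like the Llama-style constructor."""

    def __init__(self, config: ToyRotaryConfig, device: Optional[str] = None):
        self.config = config
        self.device = device  # unused in this toy; kept to mirror the signature
        self.inv_freq = compute_default_inv_freq(config)

    def angles(self, positions: List[float]) -> List[List[float]]:
        """Return one row of rotation angles per position (index or timestamp)."""
        return [[p * f for f in self.inv_freq] for p in positions]

rope = ToyRotaryEmbedding(ToyRotaryConfig(head_dim=4, rope_theta=100.0))
table = rope.angles([0.0, 2.0])
```

Keeping the frequency computation in a standalone helper that reads only from the config means the numerical recipe can be swapped or tested without touching the class, which is one plausible reason to follow this layout.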


ebezzam · 2026-03-30T16:35:29Z

run-slow: musicflamingo

github-actions · 2026-03-30T16:36:49Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/musicflamingo"]
quantizations: []

ebezzam · 2026-03-30T16:37:41Z

@ArthurZucker thanks for the review!

@lashahub FYI, I've recomputed the fixtures. As discussed with @Sreyan88, the expected outputs are the ones produced by the checkpoint and code at merge time (since there isn't a public reference checkpoint).

github-actions · 2026-03-30T16:42:18Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	3b23941c	workflow commit (merge commit)
PR	cd693fc5	branch commit (from PR)
main	62a8e128	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

github-actions · 2026-03-30T16:50:11Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43538&sha=cd693f


github-actions · 2026-03-30T16:53:29Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, musicflamingo

github-actions · 2026-03-30T16:53:54Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, musicflamingo

github-actions · 2026-03-30T17:06:45Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43538&sha=d0d18f

github-actions · 2026-03-30T17:37:36Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, musicflamingo

* Music flamingo
* Fix pos embeddings
* Method arg docstrings
* Add tests & docs
* Fix AF3 dtype bug
* Fix the MF performance issue
* Fix pos embeddings
* Fix embeddings & format
* Remove external deps
* Update processor token names
* Cleanup
* Simplify RotaryEmbedding to lang-only
* Reuse AF3 config classes
* Trim+rename rotary embedding
* Call parent _init_weights first and drop rotary einsum
* Precompute rotary cache at init
* Use modular processor pattern for MusicFlamingo
* Remove audio-only inference example
* Refactor Audio Feature Casting Path
* Clarify private source repo
* Clean up modular
* Move config to modular
* Formatting
* Remove dummy
* Derive musicflamingo timing and rotary config
* Llama style rotary embeddings
* Added reproducer comments
* Expose _init_weights for modular.
* Satisfy repo checks
* Align MusicFlamingo rotary with Llama style
* Move MusicFlamingo _init_weights to encoder
* Keep old behavior
* Move MusicFlamingo rotary settings into encoder rope_parameters
* Use AutoConfig in AF3/MF
* Align MusicFlamingo RoTE with Llama RoPE conventions
* Update outdated fixtures
* init_weights without changing others
* FIx import
* Remove backward compat
* Regenerate modeling for MF
* Fix AF3 batch inference bug
* Simplify config and nit.
* Conform more to transformers convention, e.g. removing unused code paths.
* Add another possible AF3 prefix.
* Use auto_docstring and update docstrings.
* Nits
* Nit for review
* Shift RoTE to main model so that encoder can be directly used from AF3.
* Refactoring nit.
* Fix init
* Fix some failing tests
* Fix AF3 & MF and add batching tests
* Fix audio embedding masking (bad post length)
* Nits and remove since same as GLM was bug in post length computation
* Simplify MF as AF3, and style checks.
* New config after merge and modular update.
* Address music flamingo tests, and some cleanup.
* style check
* Regenerate config.
* Update fixtures.
* Nits
* Nit
* Improve RoTE config
* Refine MusicFlamingo rotary time handling
* Simplification, and update AF3 processor for better modular
* Fix torch export
* Simplify modular, including upstreaming input_ids input to get_audio_features
* Remove upstreaming of input_ids to get_audio_features, and remove audio_rotary_dim.
* Switch to MoonshineRotaryEmbedding, and cleanup.
* Remove hardcoded MusicFlamingo partial_rotary_factor
* Update fixtures
* Compile re.sub
* Update src/transformers/models/musicflamingo/modular_musicflamingo.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/musicflamingo/modular_musicflamingo.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Style
* Update fixtures.
* Conditional torch import for processor.

---------

Co-authored-by: Eric B <ebezzam@gmail.com>
Co-authored-by: Eric Bezzam <4757445+ebezzam@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

lashahub added 13 commits December 24, 2025 10:59

Music flamingo

caf33f1

Fix pos embeddings

b69a9d1

Merge branch 'huggingface:main' into main

b4acc17

Method arg docstrings

cf1e9bc

Add tests & docs

44e801b

Merge branch 'huggingface:main' into main

f0956e3

Fix AF3 dtype bug

c973445

Fix the MF performance issue

e3a17fb

Fix pos embeddings

627dee8

Merge branch 'main' of https://github.com/lashahub/transformers

e9df30d

Fix embeddings & format

4c48132

Remove external deps

d67c114

Update processor token names

aedd341

ebezzam self-assigned this Jan 28, 2026

ebezzam added Audio New model labels Jan 28, 2026

ebezzam reviewed Feb 10, 2026


lashahub added 9 commits February 10, 2026 16:58

Cleanup

87d55a9

Simplify RotaryEmbedding to lang-only

be22746

Reuse AF3 config classes

e5b4677

Trim+rename rotary embedding

d7e0bcb

Call parent _init_weights first and drop rotary einsum

74af4fa

Precompute rotary cache at init

d622368

Use modular processor pattern for MusicFlamingo

cab3937

Remove audio-only inference example

9dd94a0

Refactor Audio Feature Casting Path

767a1d5

lashahub force-pushed the modular-mf branch from fa89e88 to 767a1d5 Compare February 16, 2026 18:11

ebezzam reviewed Feb 17, 2026


Clarify private source repo

9119660

ebezzam and others added 8 commits March 30, 2026 18:14

Compile re.sub

a46aa1c

Merge branch 'modular-mf' of github.com:lashahub/transformers into mo…

836da94

…dular-mf

Update src/transformers/models/musicflamingo/modular_musicflamingo.py

1089477

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update src/transformers/models/musicflamingo/modular_musicflamingo.py

2ea5574

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Style

999737e

Merge branch 'main' into modular-mf

1533bec

Update fixtures.

6857af9

Merge branch 'main' into modular-mf

cd693fc

ebezzam and others added 3 commits March 30, 2026 18:52

Conditional torch import for processor.

35fc31f

Merge branch 'modular-mf' of github.com:lashahub/transformers into mo…

a62e7d9

…dular-mf

Merge branch 'main' into modular-mf

d0d18f2

ebezzam enabled auto-merge March 30, 2026 16:53

Merge branch 'main' into modular-mf

47fa938

ebezzam added this pull request to the merge queue Mar 30, 2026

Merged via the queue into huggingface:main with commit a9c6700 Mar 30, 2026
29 checks passed

lashahub deleted the modular-mf branch April 4, 2026 20:28

lashahub mentioned this pull request Apr 5, 2026

Update MusicFlamingo and add AudioFlamingoNext vllm-project/vllm#39011

Open

5 tasks
