Add Music Flamingo support to Audio Flamingo 3 by lashahub · Pull Request #43458 · huggingface/transformers

lashahub · 2026-01-24T04:02:17Z

This PR adds Music Flamingo support to Audio Flamingo 3 (AF3). Music Flamingo is NVIDIA's open large audio-language model designed for deep music understanding and reasoning.

Paper: Music Flamingo: Scaling Music Understanding in Audio Language Models
Model Weights: nvidia/music-flamingo-2601-hf
Demo: Music Flamingo Demo
Project Page: NVIDIA Research

Built on Audio Flamingo 3, Music Flamingo specializes in music analysis and long-form audio reasoning. It extends the maximum supported audio length to 20 minutes (vs. 10 minutes in AF3) via Rotary Time Embeddings (RoTE). It also adds a more comprehensive music-focused system prompt and introduces audio boundary tokens (<|sound_bos|>, <|sound_eos|>) for better audio sequence modeling.
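To make the RoTE idea concrete, here is a minimal pure-Python sketch of rotary embeddings driven by time instead of token index. It assumes, as the name suggests, that RoTE feeds each audio token's timestamp in seconds into the rotation angle where standard RoPE would use the integer position; it is not the transformers implementation. The helper names `rotary_time_embedding` and `dot`, the `base` value, and the 0.04 s frame hop are illustrative assumptions.

```python
import math


def rotary_time_embedding(vectors, timestamps, base=10000.0):
    """Rotate feature pairs of each vector by angles proportional to its timestamp.

    Standard RoPE would pass integer token positions as `timestamps`; this sketch
    assumes RoTE passes each audio token's time in seconds instead.
    """
    out = []
    for vec, t in zip(vectors, timestamps):
        dim = len(vec)
        if dim % 2:
            raise ValueError("feature dimension must be even")
        rotated = []
        for i in range(dim // 2):
            # Each feature pair gets its own frequency; the angle grows linearly with time.
            theta = t * base ** (-2 * i / dim)
            cos_t, sin_t = math.cos(theta), math.sin(theta)
            a, b = vec[2 * i], vec[2 * i + 1]
            rotated += [a * cos_t - b * sin_t, a * sin_t + b * cos_t]
        out.append(rotated)
    return out


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Hypothetical frame hop: one audio token every 0.04 s (illustrative only).
FRAME_SECONDS = 0.04
frames = [[1.0, 0.0, 0.5, -0.5] for _ in range(3)]
timestamps = [i * FRAME_SECONDS for i in range(len(frames))]
embedded = rotary_time_embedding(frames, timestamps)

# Query/key scores depend only on the time offset between the two tokens.
q, k = [1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.4]
s1 = dot(rotary_time_embedding([q], [3.0])[0], rotary_time_embedding([k], [1.0])[0])
s2 = dot(rotary_time_embedding([q], [12.0])[0], rotary_time_embedding([k], [10.0])[0])
print(abs(s1 - s2) < 1e-9)  # True: both pairs are 2 s apart
```

The design point the sketch illustrates: with time-based angles, two tokens that are two seconds apart receive the same relative rotation wherever they sit in the clip, so relative timing is encoded in seconds rather than in token counts.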

Music Flamingo can be loaded directly from the Hugging Face Hub:

```python
from transformers import AutoModel, AutoProcessor

model_id = "nvidia/music-flamingo-2601-hf"
processor = AutoProcessor.from_pretrained(model_id)
# device_map="auto" places the weights on the available device(s); weights load in bfloat16
model = AutoModel.from_pretrained(model_id, device_map="auto", dtype="bfloat16")

# A single user turn mixing a text instruction and a local audio file
conversation = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this track in detail - genre, tempo, key, instruments, and mood."},
        {"type": "audio", "path": "song.mp3"}
    ]
}]

# Build the prompt (text tokens + audio features) and move the tensors to the model's device
inputs = processor.apply_chat_template(conversation, tokenize=True, add_generation_prompt=True, return_dict=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
# Slice off the prompt tokens so only the newly generated text is decoded
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Example: "This energetic Eurodance track at 150 BPM in E major features bright synth arpeggios..."
```
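The PR summary says Music Flamingo introduces <|sound_bos|> and <|sound_eos|> as audio boundary tokens. Below is a minimal, hypothetical sketch of what explicit boundaries offer at the sequence level, assuming the two tokens bracket each clip's run of audio placeholder slots. The placeholder name `<sound>` and the helpers `wrap_audio_span` and `audio_spans` are illustrative assumptions, not the AF3 processor's API.

```python
SOUND_BOS = "<|sound_bos|>"
SOUND_EOS = "<|sound_eos|>"
# Hypothetical placeholder for one audio embedding slot; the real processor's token may differ.
AUDIO_PLACEHOLDER = "<sound>"


def wrap_audio_span(num_audio_tokens: int) -> list:
    """Bracket a run of audio placeholder tokens with the boundary tokens."""
    if num_audio_tokens <= 0:
        raise ValueError("num_audio_tokens must be positive")
    return [SOUND_BOS] + [AUDIO_PLACEHOLDER] * num_audio_tokens + [SOUND_EOS]


def audio_spans(tokens: list) -> list:
    """Return (start, end) index pairs of the placeholders inside each BOS/EOS pair.

    `end` is exclusive, so tokens[start:end] are the audio slots of one clip.
    """
    spans, start = [], None
    for idx, tok in enumerate(tokens):
        if tok == SOUND_BOS:
            if start is not None:
                raise ValueError(f"nested {SOUND_BOS} at index {idx}")
            start = idx + 1
        elif tok == SOUND_EOS:
            if start is None:
                raise ValueError(f"unmatched {SOUND_EOS} at index {idx}")
            spans.append((start, idx))
            start = None
    if start is not None:
        raise ValueError(f"unterminated {SOUND_BOS}")
    return spans


tokens = ["Describe", "this"] + wrap_audio_span(3) + ["in", "detail"]
print(audio_spans(tokens))  # [(3, 6)]
```

With explicit delimiters, each clip's extent can be recovered from the sequence itself, which stays unambiguous even when a prompt contains several clips back to back.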

github-actions · 2026-01-24T04:03:19Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3

lashahub added 16 commits December 24, 2025 10:59

caf33f1 Music flamingo
b69a9d1 Fix pos embeddings
b4acc17 Merge branch 'huggingface:main' into main
cf1e9bc Method arg docstrings
44e801b Add tests & docs
f0956e3 Merge branch 'huggingface:main' into main
c973445 Fix AF3 dtype bug
e3a17fb Fix the MF performance issue
627dee8 Fix pos embeddings
e9df30d Merge branch 'main' of https://github.com/lashahub/transformers
4c48132 Fix embeddings & format
d67c114 Remove external deps
aedd341 Update processor token names
b78da82 Migrate MF to AF3
ebf3188 MF tests in AF3
6469101 Merge branch 'main' of https://github.com/lashahub/transformers

ebezzam mentioned this pull request Jan 28, 2026 from Add Music Flamingo #43538 (Merged)

ebezzam self-assigned this Jan 28, 2026

ebezzam closed this Feb 10, 2026

lashahub deleted the mf branch April 4, 2026 20:28

lashahub wants to merge 16 commits into huggingface:main from lashahub:mf
