docs: MusicGen Melody model card #38955
Open
AshAnand34 wants to merge 3 commits into huggingface:main
Conversation
1 task
stevhliu
reviewed
Jun 26, 2025
| Transformers supports both mono (1-channel) and stereo (2-channel) variants of MusicGen Melody. The mono channel versions generate a single set of codebooks. The stereo versions generate 2 sets of codebooks, 1 for each channel (left/right), and each set of codebooks is decoded independently through the audio compression model. The audio streams for each channel are combined to give the final stereo output. | ||
| # MusicGen Melody | ||
|
|
||
| [MusicGen Melody](https://huggingface.co/papers/2306.05284) is a single-stage, auto-regressive Transformer model designed for high-quality music generation, conditioned on both text and audio prompts. Unlike its predecessor, MusicGen Melody uses the audio prompt as a direct melodic guide, allowing for more precise control over the generated music. |
Member
Suggested change
| [MusicGen Melody](https://huggingface.co/papers/2306.05284) is a single-stage, auto-regressive Transformer model designed for high-quality music generation, conditioned on both text and audio prompts. Unlike its predecessor, MusicGen Melody uses the audio prompt as a direct melodic guide, allowing for more precise control over the generated music. | |
| [MusicGen Melody](https://huggingface.co/papers/2306.05284) builds on top of the [MusicGen](./musicgen) model by adding a melody-guided generation approach to enable more controllable audio generation. The model is conditioned on both input text and chromagram. A chromagram better captures the harmonic and melodic features of music. | |
| Unlike MusicGen, MusicGen Melody uses the audio prompt as a conditional signal for the generated audio sample and the conditional text and audio signals are concatenated to the decoder's hidden states. |
| [MusicGen Melody](https://huggingface.co/papers/2306.05284) is a single-stage, auto-regressive Transformer model designed for high-quality music generation, conditioned on both text and audio prompts. Unlike its predecessor, MusicGen Melody uses the audio prompt as a direct melodic guide, allowing for more precise control over the generated music. | ||
|
|
||
| #### Audio Conditional Generation | ||
| You can find all the original [MusicGen Melody](https://huggingface.co/models?sort=downloads&search=facebook%2Fmusicgen) checkpoints on the Hugging Face Hub. |
Member
Suggested change
| You can find all the original [MusicGen Melody](https://huggingface.co/models?sort=downloads&search=facebook%2Fmusicgen) checkpoints on the Hugging Face Hub. | |
| You can find all the original MusicGen Melody checkpoints under the [AI at Meta](https://huggingface.co/facebook/models?search=musicgen-melody) organization. |
Comment on lines +31 to +32
| > [!TIP] | ||
| > Click on the MusicGen Melody models in the right sidebar for more examples of how to apply the model to various music generation tasks. |
Member
Suggested change
| > [!TIP] | |
| > Click on the MusicGen Melody models in the right sidebar for more examples of how to apply the model to various music generation tasks. | |
| > [!TIP] | |
| > This model was contributed by [ylacombe](https://huggingface.co/ylacombe). | |
| > | |
| > Click on the MusicGen Melody models in the right sidebar for more examples of how to apply the model to various music generation tasks. |
| > Click on the MusicGen Melody models in the right sidebar for more examples of how to apply the model to various music generation tasks. | ||
|
|
||
| In the following examples, we load an audio file using the 🤗 Datasets library, which can be pip installed through the command below: | ||
| The example below demonstrates how to generate music conditioned on an audio melody and a text description using the [`AutoModel`] class. |
Member
Suggested change
| The example below demonstrates how to generate music conditioned on an audio melody and a text description using the [`AutoModel`] class. | |
| The example below demonstrates how to generate music with [`Pipeline`] or the [`AutoModel`] class. |
| The audio file we are about to use is loaded as follows: | ||
| ```python | ||
| >>> from datasets import load_dataset | ||
| from transformers import pipeline |
Member
import torch
from transformers import pipeline

pipeline = pipeline("text-to-audio", model="facebook/musicgen-melody", device=0, torch_dtype="auto")
pipeline("80s pop track with bassy drums and synth")
| audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256) | ||
| ``` | ||
|
|
||
| **Unconditional Generation** |
Member
Suggested change
| **Unconditional Generation** | |
| - The example below demonstrates unconditional generation. |
Comment on lines +142 to +143
| **Generation Configuration** | ||
| You can inspect and update the model's generation configuration. |
Member
Suggested change
| **Generation Configuration** | |
| You can inspect and update the model's generation configuration. | |
| - The generation config stores the default parameters that control the generation process such as sampling, guidance scale, and number of generated tokens. | |
| Any arguments passed to the [`~GenerationMixin.generate`] method supersede the parameters in the generation config. |
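Inspecting and updating the defaults could be sketched like this. The example builds a standalone `GenerationConfig` to stay self-contained; in practice you would read and mutate `model.generation_config` on a loaded MusicGen Melody checkpoint the same way:

```python
from transformers import GenerationConfig

# Illustrative defaults mirroring the parameters discussed above
gen_config = GenerationConfig(do_sample=True, guidance_scale=3.0, max_new_tokens=256)
print(gen_config.guidance_scale)

# Update a default; anything passed directly to generate() still takes precedence
gen_config.max_new_tokens = 512
```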
Comment on lines +155 to +158
| ### Other Information | ||
| - **Checkpoint Conversion**: Convert original checkpoints using the script at `src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py`. | ||
| - **`head_mask`**: The `head_mask` argument is only effective with `attn_implementation="eager"`. | ||
| - **Sampling**: For best results, use sampling (`do_sample=True`). |
Member
Suggested change
| ### Other Information | |
| - **Checkpoint Conversion**: Convert original checkpoints using the script at `src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py`. | |
| - **`head_mask`**: The `head_mask` argument is only effective with `attn_implementation="eager"`. | |
| - **Sampling**: For best results, use sampling (`do_sample=True`). | |
| - The `head_mask` argument is only effective with `attn_implementation="eager"`. | |
| - For best results, set `do_sample=True`. |
| - **`head_mask`**: The `head_mask` argument is only effective with `attn_implementation="eager"`. | ||
| - **Sampling**: For best results, use sampling (`do_sample=True`). | ||
|
|
||
| ## Model Structure |
Member
Remove this section and replace with the below. Remember each code snippet should be indented under its list item
Suggested change
| ## Model Structure | |
| - [`MusicgenMelodyForCausalLM`] can be used as a standalone decoder model. Load it by specifying the correct config or accessing it through the `.decoder` attribute of [`MusicgenMelodyForConditionalGeneration`]. | |
| [`MusicgenMelodyForConditionalGeneration`] can be used as a composite model that includes the text and audio encoder. |
| # Option 2: Access the decoder from the composite model | ||
| model = MusicgenMelodyForConditionalGeneration.from_pretrained("facebook/musicgen-melody") | ||
| decoder = model.decoder | ||
| ``` |
Member
Add a few more notes:
- Ensure you're using a 32kHz checkpoint of the Encodec model because MusicGen was trained on it.
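A quick sanity check for the 32 kHz requirement could be to read the checkpoint's config before wiring it up; the `facebook/encodec_32khz` checkpoint name here follows the existing MusicGen docs:

```python
from transformers import AutoConfig

# MusicGen Melody pairs with the 32 kHz Encodec variant; verify the sampling rate
config = AutoConfig.from_pretrained("facebook/encodec_32khz")
print(config.sampling_rate)
```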
What does this PR do?
This pull request updates the documentation for the MusicGen Melody model in
docs/source/en/model_doc/musicgen_melody.md. The changes aim to simplify and enhance the clarity of the documentation by restructuring the content, adding examples, and improving formatting.
Documentation Improvements
Overview and Structure
Examples and Usage
Formatting and Accessibility
#36979
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@stevhliu