Add Youtu-LLM model #43166
Conversation
May I ask if it is possible to concentrate the test only on Youtu-LLM (the new model)? The summary here seems to report errors raised by other models. junru
run-slow: youtu_llm

This comment contains models: ["models/youtu_llm"]
CI Results: ✅ No failing test specific to this PR 🎉!
molbap
left a comment
Seems clean, good modular file with simply Llama + MLA, beautiful. Asked a few questions, let me know and I'll re-review!
is the official name YoutuLLM or Youtu as in the prefixes here?
We chose to use Youtu as the prefix for modules, as it is more suitable for extension (e.g., we plan to introduce YoutuVL in the near future). Youtu-LLM is rather a brand name.
Then everything that has to do with the model name (youtu) should be named as such, like the model directory.
```python
model_sdpa = YoutuForCausalLM.from_pretrained(
    "tencent/Youtu-LLM-2B-Base",
    dtype=torch.float16,
```
let's make sdpa explicit here
```python
class YoutuModel(LlamaModel):
    _keys_to_ignore_on_load_unexpected = [""]
```
is this to remove the Llama attribute? if so, ok
For the current version of the model (Youtu-LLM-2B family), this line of code could be removed.
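As a rough, standalone sketch of the idea behind `_keys_to_ignore_on_load_unexpected`, assuming it behaves as a list of regex patterns matched against checkpoint keys the architecture does not expect (the key names below are made up for illustration):

```python
import re

# Illustrative sketch: patterns in _keys_to_ignore_on_load_unexpected are
# matched against unexpected checkpoint keys; matching keys are silently
# dropped instead of triggering a warning. Key names here are hypothetical.
patterns = [r"rotary_emb\.inv_freq"]
unexpected_keys = [
    "model.layers.0.self_attn.rotary_emb.inv_freq",  # matched -> ignored
    "model.layers.0.mlp.extra_weight",               # not matched -> warned about
]
warned_about = [k for k in unexpected_keys if not any(re.search(p, k) for p in patterns)]
```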
```python
@require_torch_accelerator
@pytest.mark.torch_compile_test
@require_read_token
def test_compile_static_cache(self):
```
Thanks for adding an integration test! However, naming-wise it seems to measure both dynamic and static cache, no? By the way, could we have a simple no-compile integration test that works in the simplest setting, just to avoid regressions?
We have provided inference tests below based on a no-compile dynamic cache and a no-compile static cache. Basically, I implemented this test function by referencing the test function of DeepSeek V3.
Sure, but can we update the name to make it clearer and split it into two tests? That way, if it breaks at some point, it's easier to debug.
Sure, are there any official examples that I can follow?
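The suggested split could look roughly like this (a sketch only: the class name and bodies are placeholders, not the real integration code, which would load the model and compare `generate()` outputs against expected text):

```python
import unittest

class ExampleCacheIntegrationTest(unittest.TestCase):
    # Placeholder bodies illustrating one clearly named test per cache type.
    def test_generation_dynamic_cache(self):
        # the real test would run generate() with the default dynamic cache
        self.assertTrue(True)

    def test_generation_static_cache(self):
        # the real test would run generate() with cache_implementation="static"
        self.assertTrue(True)
```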
```python
@parameterized.expand([("random",), ("same",)])
@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_assisted_decoding_matches_greedy_search(self, assistant_type):
    pass

@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_prompt_lookup_decoding_matches_greedy_search(self, assistant_type):
    pass

@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_assisted_decoding_sample(self):
    pass

@unittest.skip("Youtu-LLM uses MLA so it is not compatible with the standard cache format")
def test_beam_search_generate_dict_outputs_use_cache(self):
    pass

@unittest.skip("Youtu-LLM uses MLA so it is not compatible with the standard cache format")
def test_greedy_generate_dict_outputs_use_cache(self):
    pass

@unittest.skip(reason="SDPA can't dispatch on flash due to unsupported head dims")
def test_sdpa_can_dispatch_on_flash(self):
    pass

@unittest.skip(reason="Youtu-LLM is not suitable for testing with extreme small vocabulary")
def test_resize_tokens_embeddings(self):
    pass
```
are all these tests indeed not working?
Let's check if we can fix the majority by moving the tests under the CausalLM wrapper classes
Hi @molbap I've updated the code according to the discussion above. Can you help start a new solo test of Youtu-LLM (run-slow: youtu_llm)?
Hi @vasqu I have fixed most of the issues mentioned above, please check again. There is one specific issue related to ... I noticed many ...
I fixed a few things in regards to the config and tests, that resolves your issue as well it seems. Lmk if not!
Just a few last nits but approving since it's nothing major, quick checking with our slow CI in a second (might need to adjust values because of GPU differences)
```python
logger = logging.get_logger(__name__)


class YoutuConfig(DeepseekV3Config):
```
Sorry, that was confusing on my side --> I meant to say to add this to modular then. You can see that it will unfold the inherited attributes in the config file (which also solves the consistency issues) but better double check I haven't missed something
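A simplified sketch of the idea using stand-in classes (not the real transformers configs, which have many more attributes): the child config inherits the parent's defaults, and the modular converter unfolds them into the generated config file.

```python
# Stand-in classes for illustration only; the real DeepseekV3Config and
# YoutuConfig live in transformers.
class DeepseekV3ConfigSketch:
    def __init__(self, kv_lora_rank=512, qk_rope_head_dim=64, **kwargs):
        self.kv_lora_rank = kv_lora_rank
        self.qk_rope_head_dim = qk_rope_head_dim

class YoutuConfigSketch(DeepseekV3ConfigSketch):
    # Inherits all parent defaults; overrides only what differs.
    model_type = "youtu"

config = YoutuConfigSketch()
```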
```python
def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation: set | None = None, **kwargs):
    raise AttributeError("Not overwritten for the Youtu model!")
```
Fyi, this way you can disable inheriting a function from another class.
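As a generic illustration of the pattern (hypothetical classes, not the actual transformers code): overriding the inherited method to raise `AttributeError` makes it unusable on the subclass while leaving the parent untouched.

```python
class Base:
    def convert_rope_params_to_dict(self, **kwargs):
        return {"rope_theta": 10000.0}

class Child(Base):
    # Override the inherited method so callers cannot use it on this subclass.
    def convert_rope_params_to_dict(self, **kwargs):
        raise AttributeError("Not overwritten for the Child model!")

try:
    Child().convert_rope_params_to_dict()
    disabled = False
except AttributeError:
    disabled = True
```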
```python
class YoutuModelTester(CausalLMModelTester):
    if is_torch_available():
        base_model_class = YoutuModel

    def __init__(
        self,
        parent,
        kv_lora_rank=16,
        q_lora_rank=32,
        qk_rope_head_dim=32,
        qk_nope_head_dim=32,
        v_head_dim=32,
    ):
        super().__init__(parent=parent)
        self.kv_lora_rank = kv_lora_rank
        self.q_lora_rank = q_lora_rank
        self.qk_nope_head_dim = qk_nope_head_dim
        self.qk_rope_head_dim = qk_rope_head_dim
        self.v_head_dim = v_head_dim


@require_torch
class YoutuModelTest(CausalLMModelTest, unittest.TestCase):
    model_tester_class = YoutuModelTester

    def _check_past_key_values_for_generate(self, batch_size, past_key_values, seq_length, config):
        """Needs to be overridden as youtu-llm has special MLA cache format (though we don't really use the MLA)"""
        self.assertIsInstance(past_key_values, Cache)

        # (batch, head, seq_length, head_features)
        expected_common_shape = (
            batch_size,
            getattr(config, "num_key_value_heads", config.num_attention_heads),
            seq_length,
        )
        expected_key_shape = expected_common_shape + (config.qk_nope_head_dim + config.qk_rope_head_dim,)
        expected_value_shape = expected_common_shape + (config.v_head_dim,)

        for layer in past_key_values.layers:
            self.assertEqual(layer.keys.shape, expected_key_shape)
            self.assertEqual(layer.values.shape, expected_value_shape)
```
Refactored this to use our causal lm class - it makes it easier for us to refactor tests in the future.
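The expected MLA cache shapes reduce to simple arithmetic; a standalone sketch using the head dims from the tester defaults in this PR (the batch/head/sequence sizes below are arbitrary example values, not from the test suite):

```python
# Head dims taken from the YoutuModelTester defaults; batch_size, num_heads
# and seq_length are made-up example values.
qk_nope_head_dim, qk_rope_head_dim, v_head_dim = 32, 32, 32
batch_size, num_heads, seq_length = 2, 4, 7

# Keys concatenate the no-position and rotary parts; values use v_head_dim.
common = (batch_size, num_heads, seq_length)
expected_key_shape = common + (qk_nope_head_dim + qk_rope_head_dim,)
expected_value_shape = common + (v_head_dim,)
```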
```python
def tearDown(self):
    cleanup(torch_device, gc_collect=False)
```
Let's also clean on setup, e.g. transformers/tests/models/llama/test_modeling_llama.py, lines 65 to 73 at a1f63d5 (no need to copy the comment).
It does seem we need to fix at least:
run-slow: youtu

This comment contains models: ["models/youtu"]
CI Results: Model CI Report ❌ Failed tests
Alright, I guess this was the error that led me to add conditions of
run-slow: youtu

[For maintainers] Suggested jobs to run (before merge): run-slow: auto, youtu

This comment contains models: ["models/youtu"]
CI Results: ✅ No failing test specific to this PR 🎉!
@LuJunru I just updated some last few nits, everything passes (the other tests are unrelated), so I merged. Thanks for iterating and congrats on the model addition 🤗
```python
if is_torch_available():
    import torch

    torch.set_float32_matmul_precision("high")
```
Hi @LuJunru, is there any particular reason to set it to "high" here? It causes the issue described in
To avoid test failures caused by minor numerical fluctuations across different testing environments when using different GPUs.
Thanks.
I will change it to "highest", which should work too. If you are curious why we need to change it, see #45252


What does this PR do?
This PR adds the implementation for the released Youtu-LLM model. The model has the following features:
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker @Cyrilvallez