Add Youtu-LLM model #43166
Conversation
May I ask if it is possible to concentrate the test only on Youtu-LLM (the new model)? The summary here seems to report errors raised by other models. junru
run-slow: youtu_llm

This comment contains models: ["models/youtu_llm"]
CI Results: ✅ No failing test specific to this PR 🎉!
molbap
left a comment
Seems clean, good modular file with simply Llama + MLA, beautiful. Asked a few questions, let me know and I'll re-review!
is the official name YoutuLLM or Youtu as in the prefixes here?
We chose to use Youtu as the prefix for modules, as it is more suitable for extension (e.g., we plan to introduce YoutuVL in the near future). Youtu-LLM is rather a brand name.
Then everything that has to do with the model name (youtu) should be named as such, like the model directory.
```python
model_sdpa = YoutuForCausalLM.from_pretrained(
    "tencent/Youtu-LLM-2B-Base",
    dtype=torch.float16,
```
let's make sdpa explicit here
```python
class YoutuModel(LlamaModel):
    _keys_to_ignore_on_load_unexpected = [""]
```
is this to remove the Llama attribute? if so, ok
For the current version of the model (Youtu-LLM-2B family), this line of code could be removed.
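As a rough, standalone sketch of the idea behind `_keys_to_ignore_on_load_unexpected`, assuming it behaves as a list of regex patterns matched against checkpoint keys the architecture does not expect (the key names below are made up for illustration):

```python
import re

# Illustrative sketch: patterns in _keys_to_ignore_on_load_unexpected are
# matched against unexpected checkpoint keys; matching keys are silently
# dropped instead of triggering a warning. Key names here are hypothetical.
patterns = [r"rotary_emb\.inv_freq"]
unexpected_keys = [
    "model.layers.0.self_attn.rotary_emb.inv_freq",  # matched -> ignored
    "model.layers.0.mlp.extra_weight",               # not matched -> warned about
]
warned_about = [k for k in unexpected_keys if not any(re.search(p, k) for p in patterns)]
```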
```python
@require_torch_accelerator
@pytest.mark.torch_compile_test
@require_read_token
def test_compile_static_cache(self):
```
Thanks for adding an integration test! However, naming-wise it seems to measure both dynamic and static cache, no? By the way, could we have a simple no-compile integration test that works in the simplest setting, just to avoid regressions?
We have provided inference tests below based on a no-compile dynamic cache and a no-compile static cache. Basically, I implemented this test function by referencing the test function of DeepSeek V3.
Sure, but can we update the name to make it clearer and split it into two tests? That way, if it breaks at some point, it's easier to debug.
Sure, are there any official examples that I can follow?
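The suggested split could look roughly like this (a sketch only: the class name and bodies are placeholders, not the real integration code, which would load the model and compare `generate()` outputs against expected text):

```python
import unittest

class ExampleCacheIntegrationTest(unittest.TestCase):
    # Placeholder bodies illustrating one clearly named test per cache type.
    def test_generation_dynamic_cache(self):
        # the real test would run generate() with the default dynamic cache
        self.assertTrue(True)

    def test_generation_static_cache(self):
        # the real test would run generate() with cache_implementation="static"
        self.assertTrue(True)
```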
```python
@parameterized.expand([("random",), ("same",)])
@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_assisted_decoding_matches_greedy_search(self, assistant_type):
    pass

@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_prompt_lookup_decoding_matches_greedy_search(self, assistant_type):
    pass

@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_assisted_decoding_sample(self):
    pass

@unittest.skip("Youtu-LLM uses MLA so it is not compatible with the standard cache format")
def test_beam_search_generate_dict_outputs_use_cache(self):
    pass

@unittest.skip("Youtu-LLM uses MLA so it is not compatible with the standard cache format")
def test_greedy_generate_dict_outputs_use_cache(self):
    pass

@unittest.skip(reason="SDPA can't dispatch on flash due to unsupported head dims")
def test_sdpa_can_dispatch_on_flash(self):
    pass

@unittest.skip(reason="Youtu-LLM is not suitable for testing with extreme small vocabulary")
def test_resize_tokens_embeddings(self):
    pass
```
are all these tests indeed not working?
Let's check if we can fix the majority by moving the tests under the CausalLM wrapper classes
Hi @molbap I've updated the code according to the discussion above. Can you help start a new solo test of Youtu-LLM (run-slow: youtu_llm)?
Hi @vasqu I have fixed most of the issues mentioned above, please check again. There is one specific issue related to ... I noticed many ...
I fixed a few things in regards to the config and tests, that resolves your issue as well it seems. Lmk if not!
Just a few last nits but approving since it's nothing major, quick checking with our slow CI in a second (might need to adjust values because of GPU differences)
```python
logger = logging.get_logger(__name__)


class YoutuConfig(DeepseekV3Config):
```
Sorry, that was confusing on my side --> I meant to say to add this to modular then. You can see that it will unfold the inherited attributes in the config file (which also solves the consistency issues) but better double check I haven't missed something
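A simplified sketch of the idea using stand-in classes (not the real transformers configs, which have many more attributes): the child config inherits the parent's defaults, and the modular converter unfolds them into the generated config file.

```python
# Stand-in classes for illustration only; the real DeepseekV3Config and
# YoutuConfig live in transformers.
class DeepseekV3ConfigSketch:
    def __init__(self, kv_lora_rank=512, qk_rope_head_dim=64, **kwargs):
        self.kv_lora_rank = kv_lora_rank
        self.qk_rope_head_dim = qk_rope_head_dim

class YoutuConfigSketch(DeepseekV3ConfigSketch):
    # Inherits all parent defaults; overrides only what differs.
    model_type = "youtu"

config = YoutuConfigSketch()
```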
```python
def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation: set | None = None, **kwargs):
    raise AttributeError("Not overwritten for the Youtu model!")
```
Fyi, this way you can disable inheriting a function from another class.
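As a generic illustration of the pattern (hypothetical classes, not the actual transformers code): overriding the inherited method to raise `AttributeError` makes it unusable on the subclass while leaving the parent untouched.

```python
class Base:
    def convert_rope_params_to_dict(self, **kwargs):
        return {"rope_theta": 10000.0}

class Child(Base):
    # Override the inherited method so callers cannot use it on this subclass.
    def convert_rope_params_to_dict(self, **kwargs):
        raise AttributeError("Not overwritten for the Child model!")

try:
    Child().convert_rope_params_to_dict()
    disabled = False
except AttributeError:
    disabled = True
```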
```python
class YoutuModelTester(CausalLMModelTester):
    if is_torch_available():
        base_model_class = YoutuModel

    def __init__(
        self,
        parent,
        kv_lora_rank=16,
        q_lora_rank=32,
        qk_rope_head_dim=32,
        qk_nope_head_dim=32,
        v_head_dim=32,
    ):
        super().__init__(parent=parent)
        self.kv_lora_rank = kv_lora_rank
        self.q_lora_rank = q_lora_rank
        self.qk_nope_head_dim = qk_nope_head_dim
        self.qk_rope_head_dim = qk_rope_head_dim
        self.v_head_dim = v_head_dim


@require_torch
class YoutuModelTest(CausalLMModelTest, unittest.TestCase):
    model_tester_class = YoutuModelTester

    def _check_past_key_values_for_generate(self, batch_size, past_key_values, seq_length, config):
        """Needs to be overridden as youtu-llm has special MLA cache format (though we don't really use the MLA)"""
        self.assertIsInstance(past_key_values, Cache)

        # (batch, head, seq_length, head_features)
        expected_common_shape = (
            batch_size,
            getattr(config, "num_key_value_heads", config.num_attention_heads),
            seq_length,
        )
        expected_key_shape = expected_common_shape + (config.qk_nope_head_dim + config.qk_rope_head_dim,)
        expected_value_shape = expected_common_shape + (config.v_head_dim,)

        for layer in past_key_values.layers:
            self.assertEqual(layer.keys.shape, expected_key_shape)
            self.assertEqual(layer.values.shape, expected_value_shape)
```
Refactored this to use our causal lm class - it makes it easier for us to refactor tests in the future.
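The expected MLA cache shapes reduce to simple arithmetic; a standalone sketch using the head dims from the tester defaults in this PR (the batch/head/sequence sizes below are arbitrary example values, not from the test suite):

```python
# Head dims taken from the YoutuModelTester defaults; batch_size, num_heads
# and seq_length are made-up example values.
qk_nope_head_dim, qk_rope_head_dim, v_head_dim = 32, 32, 32
batch_size, num_heads, seq_length = 2, 4, 7

# Keys concatenate the no-position and rotary parts; values use v_head_dim.
common = (batch_size, num_heads, seq_length)
expected_key_shape = common + (qk_nope_head_dim + qk_rope_head_dim,)
expected_value_shape = common + (v_head_dim,)
```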
```python
def tearDown(self):
    cleanup(torch_device, gc_collect=False)
```
Let's also clean on setup, e.g. transformers/tests/models/llama/test_modeling_llama.py, lines 65 to 73 at a1f63d5 (no need to copy the comment).
It does seem we need to fix at least:
run-slow: youtu

This comment contains models: ["models/youtu"]
CI Results: Model CI Report ❌ Failed tests
Alright, I guess this was the error that led me to add conditions of
run-slow: youtu

[For maintainers] Suggested jobs to run (before merge): run-slow: auto, youtu

This comment contains models: ["models/youtu"]
CI Results: ✅ No failing test specific to this PR 🎉!
@LuJunru I just updated some last few nits, everything passes (the other tests are unrelated), so I merged. Thanks for iterating and congrats on the model addition 🤗
```python
if is_torch_available():
    import torch

    torch.set_float32_matmul_precision("high")
```
Hi @LuJunru, is there any particular reason to set it to "high" here? It causes the issue described in
To avoid test failures caused by minor numerical fluctuations across different testing environments when using different GPUs.
Thanks.
I will change it to "highest", which should work too. If you are curious why we need to change it, see #45252


What does this PR do?
This PR adds the implementation for the released Youtu-LLM model. The model has the following features:
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker @Cyrilvallez