[RoPE] run RoPE tests when the model uses RoPE (#40630)
Conversation
zucchini-nlp left a comment:
Thanks! Also noticed this when working on the RoPE refactoring. I guess we can enable these tests without checking for `rotary_embedding_layer` and instead have a heuristic to check if the model has a layer called `self.rotary_embedding`.
That way we'll be sure the test is run on all models, and if a model is special, it will skip the test. WDYT?
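A minimal sketch of that heuristic, assuming illustrative attribute names and a hypothetical helper (not the final implementation), might look like this:

```python
# Sketch only: attribute names and the helper are illustrative, not the merged code.
POSSIBLE_ROPE_ATTRIBUTES = ("rotary_emb", "rotary_embedding")


def model_uses_rope(model) -> bool:
    # Heuristic: the model uses RoPE if the base model exposes a rotary embedding layer.
    return any(getattr(model, attr, None) is not None for attr in POSSIBLE_ROPE_ATTRIBUTES)


# Inside a RoPE test in the mixin, a non-RoPE ("special") model would then skip:
# if not model_uses_rope(base_model):
#     self.skipTest("Model does not use RoPE")
```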
@Cyrilvallez @zucchini-nlp great PR comments, we can (and should!) automate test runs. The latest commit removes the manual parameterization.
zucchini-nlp left a comment:
Thanks for making the test suite better
```python
# Retrieves the RoPE layer class from the base model class. Assumption: the RoPE layer is under a few
# possible attribute names and is found in the base model class. In some (inconsistent) cases, it may be
# found in the self_attention layer instead.
base_model = self.model_tester.base_model_class(config)
```
We might need to call `model.get_decoder()` as well for models where the LM backbone is hidden inside the base model. Though I guess these tests aren't yet used in multimodal models.
Good point.
Given that the tests are only run on decoder-only models for now, I'd rather leave it as is (and upgrade when it's needed) 🤗
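For reference, a hedged sketch of that possible future upgrade (the `get_decoder()` hook and the list-of-roots approach are assumptions about a later change, not part of this PR):

```python
# Hypothetical extension for multimodal models, where the LM backbone is hidden
# inside the base model: also search the nested decoder for the RoPE layer.
search_roots = [base_model]
if hasattr(base_model, "get_decoder"):
    decoder = base_model.get_decoder()
    if decoder is not None and decoder is not base_model:
        search_roots.append(decoder)
# The attribute lookup below would then run over each root in search_roots.
```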
```python
for rope_attr in possible_rope_attributes:
    rope_class = getattr(base_model, rope_attr, None)  # expected pattern
    if (
        rope_class is None
        and hasattr(base_model, "layers")
        and hasattr(base_model.layers[0], "self_attention")
    ):
        rope_class = getattr(base_model.layers[0].self_attention, rope_attr, None)  # fallback
    if rope_class is not None:
        rope_class = type(rope_class)
        break
```
I think we can make it a bit more general/catch more modules if we do something like

```python
for name, module in model.named_modules():
    if any(potential_name in name for potential_name in possible_rope_attributes):
        rope_class = type(module)
        break
```

-- it would avoid edge cases when layers are not named `layers`, or attention is not `self_attention`.
Added 👍
(and confirmed that it doesn't have a significant negative impact on test runtime)
[For maintainers] Suggested jobs to run (before merge): run-slow: arcee, dbrx, deepseek_v2, ernie4_5, ernie4_5_moe, hunyuan_v1_dense, hunyuan_v1_moe, llama, minimax, mistral, nemotron, phi, phi3, phimoe, recurrent_gemma, stablelm
* enable rope tests
* no manual rope test parameterization
* Apply suggestions from code review
* Update tests/models/hunyuan_v1_dense/test_modeling_hunyuan_v1_dense.py
* PR comment: use generalist torch code to find the rope layer
What does this PR do?
The `CausalLMModelTest` mixin has RoPE tests, but it requires setting `rotary_embedding_layer` in each model tester. In most models, it was unset, so they were not running RoPE-related tests -- exposing ourselves to easily preventable issues like #40461 😱

This PR:
* Removes `rotary_embedding_layer` from the `CausalLMModelTest` mixin.
* Adds a heuristic to find the RoPE layer in `CausalLMModelTest`, and uses it to enable RoPE tests on each model. No more manual errors 🤗
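To illustrate the end result, here is a simplified sketch of a per-model test class after this PR (class and attribute names follow the existing Llama tests, but the snippet is an illustration, not the exact diff):

```python
import unittest

from transformers import LlamaForCausalLM, LlamaModel

# CausalLMModelTest is the shared mixin discussed above.
class LlamaModelTest(CausalLMModelTest, unittest.TestCase):
    all_model_classes = (LlamaModel, LlamaForCausalLM)
    # rotary_embedding_layer = LlamaRotaryEmbedding  # previously required, now removed:
    # the mixin discovers the RoPE layer automatically and runs the RoPE tests.
```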