
Modular playground #43743

Open

itazap wants to merge 32 commits into main from modular_playground

Conversation

@itazap (Collaborator) commented Feb 4, 2026

Update:

  • improve sanitization of code before embedding (a rough sketch of the idea follows this list)

  • strip dtypes, args, params, etc.
  • filter self-contained model matches

  • improve summary (see below)

  • create a prompt .md that builds a modular file from the detector's results, which can then be fed to utils/modular_model_converter.py
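
A rough, hypothetical sketch of the sanitization idea (not the PR's actual implementation): normalize a class's source so dtypes, argument defaults, and formatting differences don't dominate the embedding similarity.

import re
import textwrap

def sanitize_source(source: str) -> str:
    source = textwrap.dedent(source)
    # replace concrete torch dtypes (torch.float32, torch.bfloat16, ...) with a placeholder
    source = re.sub(r"\btorch\.(float|bfloat|int|uint)\d+\b", "DTYPE", source)
    # drop keyword-argument defaults: `eps=1e-6` -> `eps`
    source = re.sub(r"(\w+)\s*=\s*[^,)\n]+", r"\1", source)
    # collapse whitespace so pure formatting differences are ignored
    return re.sub(r"\s+", " ", source).strip()

print(sanitize_source("def forward(self, x, eps=1e-6, dtype=torch.float32): ..."))
# -> def forward(self, x, eps, dtype): ...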

Modular Inheritance - Eval Dataset

Naive, but a dataset mapping our models to the model class(es) they inherit from in their modular file:

https://huggingface.co/datasets/itazap/modular-model-eval/viewer
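
To poke at the eval dataset locally (the split and column names below are assumptions based on the viewer, not guaranteed):

from datasets import load_dataset

ds = load_dataset("itazap/modular-model-eval", split="train")
print(ds)     # features and row count
print(ds[0])  # e.g. a model name and the base model(s) from its modular file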

Usage:


python utils/modular_model_detector.py --modeling-file src/transformers/models/sarvam/modeling_sarvam.py 




Model class match summary

Total classes: 11

Models with most matched classes:
Model            | Matched | Pct   | Mean score | Classes                                                                                                           
-----------------+---------+-------+------------+-------------------------------------------------------------------------------------------------------------------
deepseek_v2      | 6/11    | 54.5% | 0.8894     | MoEGate, SarvamMLAAttention, SarvamMLADecoderLayer, SarvamMLAMLP, SarvamMLAMoE, SarvamMLAModel                    
ernie4_5_vl_moe  | 6/11    | 54.5% | 0.7658     | MoEGate, SarvamMLADecoderLayer, SarvamMLAMLP, SarvamMLAMoE, SarvamMLARotaryEmbedding, SarvamMLAYarnRotaryEmbedding
qwen3_omni_moe   | 5/11    | 45.5% | 0.7086     | SarvamMLADecoderLayer, SarvamMLAMLP, SarvamMLARMSNorm, SarvamMLARotaryEmbedding, SarvamMLAYarnRotaryEmbedding     
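
From a summary like this, the prompt .md is meant to produce a modular_*.py file, which can then be run through the existing converter. A sketch of that second step (the modular_sarvam.py path is the file the prompt would produce, and the --files_to_parse flag should be double-checked against the script):

python utils/modular_model_converter.py --files_to_parse src/transformers/models/sarvam/modular_sarvam.py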




run_modular_detector_eval.py output:

=== Eval summary (models that have bases and a non-empty detector summary) ===
Total with labels and summary: 29
Top-1 accuracy (first suggested model in bases): 20.69% (6/29)
Top-3 accuracy (any base in top 3): 51.72% (15/29)
Top-5 accuracy (any base in top 5): 65.52% (19/29)
Total eval entries with bases: 29 (skipped/errors: 0)

=== Per-model predictions ===
model                bases                                                predicted top 3                                
-------------------  ---------------------------------------------------  -------------------------------------------
biogpt               bart,opt                                             trocr, whisper, patchtst
camembert            roberta                                              xmod, xlm_roberta_xl, xlm_roberta
conditional_detr     deformable_detr,detr                                 table_transformer, detr, dab_detr
deepseek_v2          llama,qwen2_moe                                      llama, nemotron, deepseek_v3
deepseek_v3          llama,mixtral,qwen2_moe                              nemotron, llama, glm4_moe
deformable_detr      detr                                                 grounding_dino, detr, conditional_detr
falcon_mamba         mamba                                                mamba
gpt_neox             llama                                                gptj, bigbird_pegasus, openai
granite              llama                                                diffllama, nemotron, moshi
granitemoe           granite,jetmoe,llama,mixtral                         granitemoehybrid, mixtral, granitemoeshared
hubert               wav2vec2                                             mbart, plbart, mt5
hunyuan_v1_moe       hunyuan_v1_dense,llama,mixtral                       hunyuan_v1_dense, qwen2_vl, llama
jetmoe               llama,mixtral                                        moshi, qwen2_vl, nemotron
mistral              llama                                                phi3, clvp, llama
olmo                 llama                                                nemotron, cohere, diffllama
olmoe                gemma,llama,mixtral,qwen2_moe                        flex_olmo, llama, mixtral
paddleocr_vl         ernie4_5,qwen2_5_omni,qwen2_vl,siglip,video_llama_3  llama, arcee, gemma
persimmon            llama                                                stablelm, nemotron, llama
phi                  clip,llama                                           auto, bart, esm
phi3                 mistral,phi                                          llama, moshi, nemotron
phimoe               llama,mixtral                                        nemotron, mixtral, flex_olmo
qwen2                gemma2,llama,mistral                                 nemotron, qwen2_vl, llama
qwen2_moe            gemma,gemma2,llama,mixtral                           qwen3_moe, nemotron, qwen2_vl
sew                  wav2vec2                                             hubert
switch_transformers  t5                                                   longt5, udop, umt5
unispeech            wav2vec2                                             wav2vec2, unispeech_sat, longformer
unispeech_sat        wav2vec2                                             wav2vec2, unispeech, longformer
wavlm                wav2vec2                                             wav2vec2, longformer, xlnet
xlm_roberta          roberta                                              camembert, xlm_roberta_xl, xmod
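
For reference, a minimal sketch of how the Top-1/3/5 numbers above can be computed from (bases, predictions) pairs; the eval script's own logic may differ:

def topk_accuracy(rows, k):
    # a row counts as a hit if any labeled base appears in the first k predictions
    hits = sum(1 for bases, preds in rows if set(preds[:k]) & set(bases))
    return hits / len(rows)

rows = [
    (["detr"], ["grounding_dino", "detr", "conditional_detr"]),  # top-3 hit, top-1 miss
    (["mamba"], ["mamba"]),                                      # top-1 hit
]
print(f"top-1: {topk_accuracy(rows, 1):.2%}  top-3: {topk_accuracy(rows, 3):.2%}")
# -> top-1: 50.00%  top-3: 100.00%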

UPDATE:

=== Eval summary (models that have bases and a non-empty detector summary) ===
Total with labels and summary: 34
Top-1 accuracy (first suggested model in bases): 76.47% (26/34)
Top-3 accuracy (any base in top 3): 97.06% (33/34)
Top-5 accuracy (any base in top 5): 100.00% (34/34)
Total eval entries with bases: 34 (skipped/errors: 0)

=== Per-model predictions ===
model                bases                                                predicted                               
-------------------  ---------------------------------------------------  ----------------------------------------
biogpt               bart,opt                                             bart, opt, trocr
camembert            roberta                                              roberta, bert, xmod
conditional_detr     deformable_detr,detr                                 detr, table_transformer, deformable_detr
deepseek_v2          llama,qwen2_moe                                      llama, gemma, mistral
deepseek_v3          llama,mixtral,qwen2_moe                              llama, gemma, mistral
deformable_detr      detr                                                 grounding_dino, detr, conditional_detr
ernie4_5             glm,llama,olmo                                       llama, mistral, glm
ernie4_5_moe         ernie4_5,llama,mixtral,qwen3_moe                     qwen2, qwen3_moe, gemma
falcon_mamba         mamba                                                mamba, mamba2, falcon_h1
gpt_neox             llama                                                llama, gpt_neox_japanese, gptj
granite              llama                                                llama, mistral, diffllama
granitemoe           granite,jetmoe,llama,mixtral                         jetmoe, granitemoeshared, granite
hubert               wav2vec2                                             unispeech_sat, unispeech, wav2vec2
hunyuan_v1_moe       hunyuan_v1_dense,llama,mixtral                       hunyuan_v1_dense, mistral, mixtral
jetmoe               llama,mixtral                                        mixtral, llama, granitemoe
mistral              llama                                                llama, phi3, gemma
olmo                 llama                                                llama, mistral, nemotron
olmoe                gemma,llama,mixtral,qwen2_moe                        qwen2_moe, mistral, flex_olmo
paddleocr_vl         ernie4_5,qwen2_5_omni,qwen2_vl,siglip,video_llama_3  qwen2_vl, ernie4_5, llama
persimmon            llama                                                llama, stablelm, nemotron
phi                  clip,llama                                           llama, stablelm, persimmon
phi3                 mistral,phi                                          llama, mistral, phi
phimoe               llama,mixtral                                        mixtral, mistral, llama
qwen2                gemma2,llama,mistral                                 llama, mistral, gemma
qwen2_moe            gemma,gemma2,llama,mixtral                           mixtral, mistral, llama
qwen3_5              qwen3_next,qwen3_vl                                  qwen3_vl, qwen2_vl, qwen2_5_vl
qwen3_5_moe          qwen3_5,qwen3_next,qwen3_vl_moe                      qwen3_next, jamba, qwen2
qwen3_omni_moe       qwen3,qwen3_moe,qwen3_vl_moe                         qwen2_5_omni, qwen3_vl_moe, qwen2_vl
sew                  wav2vec2                                             hubert, wav2vec2, unispeech_sat
switch_transformers  t5                                                   longt5, udop, t5
unispeech            wav2vec2                                             wav2vec2, unispeech_sat, wavlm
unispeech_sat        wav2vec2                                             wav2vec2, unispeech, wavlm
wavlm                wav2vec2                                             wav2vec2, unispeech, unispeech_sat
xlm_roberta          roberta                                              camembert, xlm_roberta_xl, xmod

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@itazap (Collaborator, Author) commented Feb 11, 2026

run-slow: persimmon

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: persimmon
