We are currently in the process of improving the Transformers model cards by making them more directly useful for everyone. The main goals are to:

- Standardize all model cards with a consistent format so users know what to expect when moving between different model cards or trying to learn how to use a new model.
- Include a brief description of the model (what makes it unique/different) written in a way that's accessible to everyone.
- Provide ready-to-use code examples featuring the `Pipeline`, `AutoModel`, and `transformers-cli` with available optimizations included. For large models, provide a quantization example so it's easier for everyone to run the model.
- Include an attention mask visualizer for currently supported models to help users visualize what a model is seeing (refer to #36630 for more details).
Compare the before and after model cards below:
With so many models in Transformers, we could really use a hand with standardizing the existing model cards. If you're interested in making a contribution, pick a model from the list below and then you can get started!
Steps
Each model card should follow the format below. You can copy the text exactly as it is!
# add appropriate badges
<div style="float: right;">
<div class="flex flex-wrap space-x-1"> <img alt="" src="" ></div>
</div>
# Model name

[Model name](https://huggingface.co/papers/...) ...
A brief description of the model and what makes it unique/different. Try to write this like you're talking to a friend.
You can find all the original [Model name] checkpoints under the [Model name](link) collection.
> [!TIP]
> This model was contributed by [author](link to Hub profile).
>
> Click on the [Model name] models in the right sidebar for more examples of how to apply [Model name] to different [insert task types here] tasks.
The example below demonstrates how to [insert task here] with [`Pipeline`] or the [`AutoModel`] class.
<hfoptions id="usage">
<hfoption id="Pipeline">
insert pipeline code here
</hfoption>
<hfoption id="AutoModel">
add AutoModel code here
</hfoption>
<hfoption id="transformers-cli">
add `transformers-cli` usage here if applicable/supported; otherwise, close the hfoption block
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for the available quantization backends.
The example below uses [insert quantization method here](link to quantization method) to quantize only the weights to __.
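To build intuition for what a quantization method does before you pick one for a model card, here is a minimal, library-free sketch of absmax int8 quantization. The function names are hypothetical illustrations, not a Transformers API; real backends such as bitsandbytes or GPTQ are far more sophisticated.

```python
# Hypothetical sketch of absmax int8 quantization (NOT a Transformers API):
# map float weights into the int8 range [-127, 127] with one scale factor.

def quantize_absmax_int8(weights):
    """Quantize a list of floats to int8 values plus a scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 1.0]
q, scale = quantize_absmax_int8(weights)
approx = dequantize(q, scale)
# Each int8 weight takes 1 byte instead of 4 (float32), so storing the
# weights needs roughly 4x less memory, at the cost of a small rounding error.
```

The rounding error per weight is at most half the scale factor, which is why quantizing only the weights (and keeping activations in higher precision) usually preserves model quality.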
# add if this is supported for your model
Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139) to better understand what tokens the model can and cannot attend to.
```py
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

visualizer = AttentionMaskVisualizer("google/gemma-3-4b-it")
visualizer("<img>What is shown in this image?")
```
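For intuition about what the visualizer displays, the pattern of tokens a decoder-style model can and cannot attend to can be sketched in plain Python. This is an illustrative example of a causal attention mask, not the `AttentionMaskVisualizer` implementation.

```python
# Illustrative sketch (not the AttentionMaskVisualizer internals): a causal
# attention mask, where token i may attend only to tokens 0..i.

def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix: 1 = can attend, 0 = cannot."""
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]

for row in causal_mask(4):
    print(row)
# Row i shows which positions token i can attend to: itself and all
# earlier tokens, but never a later one.
```

Models with bidirectional attention over some inputs (for example, image tokens in a vision-language model) deviate from this strict lower-triangular pattern, which is exactly what the visualizer helps you see.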
# upload image to https://huggingface.co/datasets/huggingface/documentation-images/tree/main/transformers/model_doc and ping me to merge
<div class="flex justify-center">
<img src=""/>
</div>
## Notes

- Any other model-specific notes should go here.
```py
<insert relevant code snippet here related to the note if it's available>
```
## Resources
add links to external resources only
Hey friends! 👋
For examples, take a look at #36469 or the BERT, Llama, Llama 2, Gemma 3, PaliGemma, ViT, and Whisper model cards on the `main` version of the docs.

Once you're done, or if you have any questions, feel free to ping @stevhliu for a review. Don't add `fix` to your PR to avoid closing this issue.

I'll also be right there working alongside you and opening PRs to convert the model cards so we can complete this faster together! 🤗
Models