From 16786e557d815e5542e0d495bccab53e386eafbb Mon Sep 17 00:00:00 2001
From: AshAnand34
Date: Wed, 4 Jun 2025 14:45:49 -0700
Subject: [PATCH 1/3] Created model card for XLM model

---
 docs/source/en/model_doc/xlm.md | 103 +++++++++++++++++++++++---------
 1 file changed, 75 insertions(+), 28 deletions(-)

diff --git a/docs/source/en/model_doc/xlm.md b/docs/source/en/model_doc/xlm.md
index 61effea7cca7..f78b072ee186 100644
--- a/docs/source/en/model_doc/xlm.md
+++ b/docs/source/en/model_doc/xlm.md
@@ -16,44 +16,91 @@ rendered properly in your Markdown viewer.
 # XLM
-
-PyTorch
-TensorFlow
+
+
+ PyTorch
+ TensorFlow
+
-## Overview +[XLM](https://arxiv.org/abs/1901.07291) is a transformer model pretrained using one of three objectives: causal language modeling (CLM), masked language modeling (MLM), or translation language modeling (TLM). What makes XLM unique is its ability to handle multiple languages through cross-lingual pretraining, achieving state-of-the-art results on cross-lingual classification and machine translation tasks. -The XLM model was proposed in [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by -Guillaume Lample, Alexis Conneau. It's a transformer pretrained using one of the following objectives: +You can find all the original XLM checkpoints under the [XLM](https://huggingface.co/models?search=xlm) collection. -- a causal language modeling (CLM) objective (next token prediction), -- a masked language modeling (MLM) objective (BERT-like), or -- a Translation Language Modeling (TLM) object (extension of BERT's MLM to multiple language inputs) +> [!TIP] +> Click on the XLM models in the right sidebar for more examples of how to apply XLM to different cross-lingual tasks like classification, translation, and question answering. -The abstract from the paper is the following: +The example below demonstrates how to use XLM for masked language modeling with [`Pipeline`] or the [`AutoModel`] class. -*Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. -In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We -propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual -data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain -state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI, our -approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation, we -obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised -machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the -previous best approach by more than 4 BLEU. Our code and pretrained models will be made publicly available.* + + -This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/facebookresearch/XLM/). +```python +from transformers import pipeline -## Usage tips +# Initialize the pipeline with a multilingual XLM model +unmasker = pipeline("fill-mask", model="xlm-mlm-en-2048") -- XLM has many different checkpoints, which were trained using different objectives: CLM, MLM or TLM. Make sure to - select the correct objective for your task (e.g. MLM checkpoints are not suitable for generation). -- XLM has multilingual checkpoints which leverage a specific `lang` parameter. Check out the [multi-lingual](../multilingual) page for more information. -- A transformer model trained on several languages. There are three different type of training for this model and the library provides checkpoints for all of them: +# Example in English +result = unmasker("Hello, I'm a [MASK] model.") +print(result) - * Causal language modeling (CLM) which is the traditional autoregressive training (so this model could be in the previous section as well). 
One of the languages is selected for each training sample, and the model input is a sentence of 256 tokens, that may span over several documents in one of those languages. - * Masked language modeling (MLM) which is like RoBERTa. One of the languages is selected for each training sample, and the model input is a sentence of 256 tokens, that may span over several documents in one of those languages, with dynamic masking of the tokens. - * A combination of MLM and translation language modeling (TLM). This consists of concatenating a sentence in two different languages, with random masking. To predict one of the masked tokens, the model can use both, the surrounding context in language 1 and the context given by language 2. +# Example in French +result = unmasker("Bonjour, je suis un modèle [MASK].") +print(result) +``` + + + + +```python +from transformers import AutoModelForMaskedLM, AutoTokenizer +import torch + +# Load model and tokenizer +model = AutoModelForMaskedLM.from_pretrained("xlm-mlm-en-2048") +tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048") + +# Prepare input +text = "Hello, I'm a [MASK] model." +inputs = tokenizer(text, return_tensors="pt") + +# Get prediction +with torch.no_grad(): + outputs = model(**inputs) + predictions = outputs.logits.argmax(dim=-1) + +# Decode prediction +predicted_token = tokenizer.decode(predictions[0][inputs["input_ids"][0] == tokenizer.mask_token_id]) +print(f"Predicted token: {predicted_token}") +``` + + + + +## Model Details + +XLM was proposed in [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. The model achieves state-of-the-art results on several cross-lingual tasks: + +- 4.9% absolute gain in accuracy on XNLI +- 34.3 BLEU on WMT'16 German-English (unsupervised) +- 38.5 BLEU on WMT'16 Romanian-English (supervised) + +### Key Features + +- Supports multiple languages through cross-lingual pretraining +- Three training objectives: + - Causal language modeling (CLM) for autoregressive generation + - Masked language modeling (MLM) similar to BERT + - Translation language modeling (TLM) for parallel data +- Multilingual checkpoints with language-specific parameters + +## Usage Tips + +- Choose the appropriate checkpoint based on your task (CLM for generation, MLM for understanding) +- For multilingual tasks, use the `lang` parameter to specify the language +- The model supports input sequences of up to 256 tokens +- For TLM, the model can leverage context from both languages in parallel data ## Resources From 538a5da2c60136774f234e08643aae265c23f941 Mon Sep 17 00:00:00 2001 From: AshAnand34 Date: Thu, 5 Jun 2025 16:11:04 -0700 Subject: [PATCH 2/3] Revised model card structure and content of XLM model --- docs/source/en/model_doc/xlm.md | 38 ++++++--------------------------- 1 file changed, 6 insertions(+), 32 deletions(-) diff --git a/docs/source/en/model_doc/xlm.md b/docs/source/en/model_doc/xlm.md index f78b072ee186..6aa5a7d00c50 100644 --- a/docs/source/en/model_doc/xlm.md +++ b/docs/source/en/model_doc/xlm.md @@ -14,8 +14,6 @@ rendered properly in your Markdown viewer. --> -# XLM -
PyTorch @@ -23,14 +21,16 @@ rendered properly in your Markdown viewer.
-[XLM](https://arxiv.org/abs/1901.07291) is a transformer model pretrained using one of three objectives: causal language modeling (CLM), masked language modeling (MLM), or translation language modeling (TLM). What makes XLM unique is its ability to handle multiple languages through cross-lingual pretraining, achieving state-of-the-art results on cross-lingual classification and machine translation tasks. +# XLM + +[XLM](https://huggingface.co/papers/1901.07291) demonstrates cross-lingual pretraining with two approaches, unsupervised training on a single language and supervised training on more than one language with a cross-lingual language model objective. The XLM model supports the causal language modeling objective, masked language modeling, and translation language modeling (an extension of the [BERT](./bert)) masked language modeling objective to multiple language inputs). -You can find all the original XLM checkpoints under the [XLM](https://huggingface.co/models?search=xlm) collection. +You can find all the original XLM checkpoints under the [Facebook AI community](https://huggingface.co/FacebookAI?search_models=xlm-mlm) organization. > [!TIP] > Click on the XLM models in the right sidebar for more examples of how to apply XLM to different cross-lingual tasks like classification, translation, and question answering. -The example below demonstrates how to use XLM for masked language modeling with [`Pipeline`] or the [`AutoModel`] class. +The example below demonstrates how to predict the `[MASK]` token with [`Pipeline`] or the [`AutoModel`] class. @@ -78,39 +78,13 @@ print(f"Predicted token: {predicted_token}") -## Model Details - -XLM was proposed in [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. The model achieves state-of-the-art results on several cross-lingual tasks: - -- 4.9% absolute gain in accuracy on XNLI -- 34.3 BLEU on WMT'16 German-English (unsupervised) -- 38.5 BLEU on WMT'16 Romanian-English (supervised) - -### Key Features - -- Supports multiple languages through cross-lingual pretraining -- Three training objectives: - - Causal language modeling (CLM) for autoregressive generation - - Masked language modeling (MLM) similar to BERT - - Translation language modeling (TLM) for parallel data -- Multilingual checkpoints with language-specific parameters - -## Usage Tips +## Notes - Choose the appropriate checkpoint based on your task (CLM for generation, MLM for understanding) - For multilingual tasks, use the `lang` parameter to specify the language - The model supports input sequences of up to 256 tokens - For TLM, the model can leverage context from both languages in parallel data -## Resources - -- [Text classification task guide](../tasks/sequence_classification) -- [Token classification task guide](../tasks/token_classification) -- [Question answering task guide](../tasks/question_answering) -- [Causal language modeling task guide](../tasks/language_modeling) -- [Masked language modeling task guide](../tasks/masked_language_modeling) -- [Multiple choice task guide](../tasks/multiple_choice) - ## XLMConfig [[autodoc]] XLMConfig From d2d91c1ebc0c54b5a772c6101e8208d83c81206a Mon Sep 17 00:00:00 2001 From: AshAnand34 Date: Fri, 6 Jun 2025 23:44:40 -0700 Subject: [PATCH 3/3] Update XLM model documentation with improved examples and code snippets for predicting tokens using Pipeline and AutoModel. 
---
 docs/source/en/model_doc/xlm.md | 61 ++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 32 deletions(-)

diff --git a/docs/source/en/model_doc/xlm.md b/docs/source/en/model_doc/xlm.md
index 6aa5a7d00c50..f0c376625d14 100644
--- a/docs/source/en/model_doc/xlm.md
+++ b/docs/source/en/model_doc/xlm.md
@@ -30,60 +30,57 @@ You can find all the original XLM checkpoints under the [Facebook AI community](
 
 > [!TIP]
 > Click on the XLM models in the right sidebar for more examples of how to apply XLM to different cross-lingual tasks like classification, translation, and question answering.
 
-The example below demonstrates how to predict the `[MASK]` token with [`Pipeline`] or the [`AutoModel`] class.
+The example below demonstrates how to predict the `<special1>` token (the mask token used by the original XLM checkpoints) with [`Pipeline`], [`AutoModel`], and from the command line.
 
 ```python
-from transformers import pipeline
-
-# Initialize the pipeline with a multilingual XLM model
-unmasker = pipeline("fill-mask", model="xlm-mlm-en-2048")
-
-# Example in English
-result = unmasker("Hello, I'm a [MASK] model.")
-print(result)
-
-# Example in French
-result = unmasker("Bonjour, je suis un modèle [MASK].")
-print(result)
+import torch
+from transformers import pipeline
+
+pipeline = pipeline(
+    task="fill-mask",
+    model="FacebookAI/xlm-mlm-en-2048",
+    torch_dtype=torch.float16,
+    device=0
+)
+# `<special1>` is the mask token for this checkpoint, not `<mask>`
+pipeline("Hello, I'm a <special1> model.")
 ```
 
 ```python
-from transformers import AutoModelForMaskedLM, AutoTokenizer
-import torch
-
-# Load model and tokenizer
-model = AutoModelForMaskedLM.from_pretrained("xlm-mlm-en-2048")
-tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
-
-# Prepare input
-text = "Hello, I'm a [MASK] model."
-inputs = tokenizer(text, return_tensors="pt")
-
-# Get prediction
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "FacebookAI/xlm-mlm-en-2048",
+)
+model = AutoModelForMaskedLM.from_pretrained(
+    "FacebookAI/xlm-mlm-en-2048",
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+# use `<special1>` so that `tokenizer.mask_token_id` below actually matches a token in the input
+inputs = tokenizer("Hello, I'm a <special1> model.", return_tensors="pt").to("cuda")
 
 with torch.no_grad():
     outputs = model(**inputs)
     predictions = outputs.logits.argmax(dim=-1)
 
-# Decode prediction
 predicted_token = tokenizer.decode(predictions[0][inputs["input_ids"][0] == tokenizer.mask_token_id])
 print(f"Predicted token: {predicted_token}")
 ```
 
-
-
-## Notes
+
+
 
-- Choose the appropriate checkpoint based on your task (CLM for generation, MLM for understanding)
-- For multilingual tasks, use the `lang` parameter to specify the language
-- The model supports input sequences of up to 256 tokens
-- For TLM, the model can leverage context from both languages in parallel data
+```bash
+echo -e "Plants create <special1> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model FacebookAI/xlm-mlm-en-2048 --device 0
+```
+
+
 
 ## XLMConfig
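
The notes above mention a `lang` parameter for the multilingual checkpoints; concretely, XLM models trained with language embeddings expect a `langs` tensor alongside the input IDs. The snippet below is a minimal sketch of that usage, assuming the `FacebookAI/xlm-clm-enfr-1024` checkpoint and the `langs` argument described in the [multi-lingual](../multilingual) guide.

```python
import torch
from transformers import XLMTokenizer, XLMWithLMHeadModel

# A multilingual XLM checkpoint trained with language embeddings (assumed here as an example)
tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")

input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])  # batch size of 1

# Every position is tagged with the ID of its language, e.g. "en" or "fr" for this checkpoint
language_id = tokenizer.lang2id["en"]
langs = torch.full_like(input_ids, language_id)  # shape (1, sequence_length)

outputs = model(input_ids, langs=langs)
print(outputs.logits.shape)
```

Checkpoints trained without language embeddings simply omit the `langs` argument.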