From bd48c55dc926bd5399d70e4769de5bfdbc20990e Mon Sep 17 00:00:00 2001 From: Drew Ross Date: Sat, 26 Jul 2025 18:08:47 -0500 Subject: [PATCH 1/3] Update mt5 model card --- docs/source/en/model_doc/mt5.md | 153 ++++++++++++++++++++++---------- 1 file changed, 107 insertions(+), 46 deletions(-) diff --git a/docs/source/en/model_doc/mt5.md b/docs/source/en/model_doc/mt5.md index d6b9ef99cb66..b6f4e700711d 100644 --- a/docs/source/en/model_doc/mt5.md +++ b/docs/source/en/model_doc/mt5.md @@ -14,54 +14,115 @@ rendered properly in your Markdown viewer. --> -# mT5 - -
-PyTorch -TensorFlow -Flax +
+
+ PyTorch + TensorFlow + Flax +
-## Overview
-
-The mT5 model was presented in [mT5: A massively multilingual pre-trained text-to-text transformer](https://huggingface.co/papers/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya
-Siddhant, Aditya Barua, Colin Raffel.
-
-The abstract from the paper is the following:
-
-*The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain
-state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a
-multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail
-the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
-benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a
-generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model
-checkpoints used in this work are publicly available.*
-
-Note: mT5 was only pre-trained on [mC4](https://huggingface.co/datasets/mc4) excluding any supervised training.
-Therefore, this model has to be fine-tuned before it is usable on a downstream task, unlike the original T5 model.
-Since mT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task
-fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
-
-Google has released the following variants:
-
-- [google/mt5-small](https://huggingface.co/google/mt5-small)
-
-- [google/mt5-base](https://huggingface.co/google/mt5-base)
-
-- [google/mt5-large](https://huggingface.co/google/mt5-large)
-
-- [google/mt5-xl](https://huggingface.co/google/mt5-xl)
-
-- [google/mt5-xxl](https://huggingface.co/google/mt5-xxl).
-
-This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be
-found [here](https://github.com/google-research/multilingual-t5).
-
-## Resources
-
-- [Translation task guide](../tasks/translation)
-- [Summarization task guide](../tasks/summarization)
+# MT5
+
+[MT5](https://huggingface.co/papers/2010.11934) is an encoder-decoder transformer model for sequence-to-sequence tasks. It follows the same architecture and training procedure as [T5](./t5), a sequence-to-sequence model trained exclusively on English examples. In contrast, MT5 was trained from scratch on a multilingual dataset, allowing it to generalize across a wide range of languages. This leads to improved performance in a multilingual setting compared to T5.
+
+You can find all the original MT5 checkpoints under the [MT5](https://huggingface.co/collections/google/mt5-release-65005f1a520f8d7b4d039509) collection.
+
+> [!TIP]
+> This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
+>
+> Click on the MT5 models in the right sidebar for more examples of how to apply MT5 to different language tasks.
+
+The example below demonstrates how to summarize text with [`Pipeline`], [`AutoModel`], and from the command line.
+
+<hfoptions id="usage">
+<hfoption id="Pipeline">
+
+```python
+import torch
+from transformers import pipeline
+
+pipeline = pipeline(
+    task="text2text-generation",
+    model="csebuetnlp/mT5_multilingual_XLSum",
+    torch_dtype=torch.float16,
+    device=0
+)
+pipeline("""Plants are remarkable organisms that produce their own food using a method called photosynthesis.
+This process involves converting sunlight, carbon dioxide, and water into glucose, which provides energy for growth.
+Plants play a crucial role in sustaining life on Earth by generating oxygen and serving as the foundation of most ecosystems.""")
+```
+
+</hfoption>
+<hfoption id="AutoModel">
+
+```python
+import torch
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "csebuetnlp/mT5_multilingual_XLSum"
+)
+model = AutoModelForSeq2SeqLM.from_pretrained(
+    "csebuetnlp/mT5_multilingual_XLSum",
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+
+input_text = """Plants are remarkable organisms that produce their own food using a method called photosynthesis.
+This process involves converting sunlight, carbon dioxide, and water into glucose, which provides energy for growth.
+Plants play a crucial role in sustaining life on Earth by generating oxygen and serving as the foundation of most ecosystems."""
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+
+output = model.generate(**input_ids, cache_implementation="static")
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+
+</hfoption>
+<hfoption id="transformers-cli">
+
+```bash
+echo -e "Plants are remarkable organisms that produce their own food using a method called photosynthesis." | transformers-cli run --task text2text-generation --model csebuetnlp/mT5_multilingual_XLSum --device 0
+```
+
+</hfoption>
+</hfoptions>
+
+Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
+
+The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to int4.
+
+```python
+import torch
+from transformers import BitsAndBytesConfig, AutoModelForSeq2SeqLM, AutoTokenizer
+
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_quant_type="nf4"
+)
+model = AutoModelForSeq2SeqLM.from_pretrained(
+    "csebuetnlp/mT5_multilingual_XLSum",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    quantization_config=quantization_config
+)
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "csebuetnlp/mT5_multilingual_XLSum"
+)
+input_text = """Plants are remarkable organisms that produce their own food using a method called photosynthesis.
+This process involves converting sunlight, carbon dioxide, and water into glucose, which provides energy for growth.
+Plants play a crucial role in sustaining life on Earth by generating oxygen and serving as the foundation of most ecosystems."""
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+
+output = model.generate(**input_ids, cache_implementation="static")
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+
+## Notes
+
+- MT5 was only pre-trained on [mc4](https://huggingface.co/datasets/mc4), meaning it must be fine-tuned before it can be used for downstream tasks.
 
 ## MT5Config

From 4ff0c185cee69a0b29287aa732964dd4b9d4ee69 Mon Sep 17 00:00:00 2001
From: Drew Ross
Date: Sat, 26 Jul 2025 18:42:24 -0500
Subject: [PATCH 2/3] Fix casing of model title

---
 docs/source/en/model_doc/mt5.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/model_doc/mt5.md b/docs/source/en/model_doc/mt5.md
index b6f4e700711d..f11bd3d9f9d2 100644
--- a/docs/source/en/model_doc/mt5.md
+++ b/docs/source/en/model_doc/mt5.md
@@ -22,16 +22,16 @@ rendered properly in your Markdown viewer.
 </div>
 
-# MT5
+# mT5
 
-[MT5](https://huggingface.co/papers/2010.11934) is an encoder-decoder transformer model for sequence-to-sequence tasks. It follows the same architecture and training procedure as [T5](./t5), a sequence-to-sequence model trained exclusively on English examples. In contrast, MT5 was trained from scratch on a multilingual dataset, allowing it to generalize across a wide range of languages. This leads to improved performance in a multilingual setting compared to T5.
+[mT5](https://huggingface.co/papers/2010.11934) is an encoder-decoder transformer model for sequence-to-sequence tasks. It follows the same architecture and training procedure as [T5](./t5), a sequence-to-sequence model trained exclusively on English examples. In contrast, mT5 was trained from scratch on a multilingual dataset, allowing it to generalize across a wide range of languages. This leads to improved performance in a multilingual setting compared to T5.
 
-You can find all the original MT5 checkpoints under the [MT5](https://huggingface.co/collections/google/mt5-release-65005f1a520f8d7b4d039509) collection.
+You can find all the original mT5 checkpoints under the [mT5](https://huggingface.co/collections/google/mt5-release-65005f1a520f8d7b4d039509) collection.
 
 > [!TIP]
 > This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
 >
-> Click on the MT5 models in the right sidebar for more examples of how to apply MT5 to different language tasks.
+> Click on the mT5 models in the right sidebar for more examples of how to apply mT5 to different language tasks.
 
 The example below demonstrates how to summarize text with [`Pipeline`], [`AutoModel`], and from the command line.
 
 <hfoptions id="usage">
@@ -122,7 +122,7 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 
 ## Notes
 
-- MT5 was only pre-trained on [mc4](https://huggingface.co/datasets/mc4), meaning it must be fine-tuned before it can be used for downstream tasks.
+- mT5 was only pre-trained on [mc4](https://huggingface.co/datasets/mc4), meaning it must be fine-tuned before it can be used for downstream tasks.
 
 ## MT5Config

From 342f800964b96ef3d32fef213f89bdb2d8bf8c75 Mon Sep 17 00:00:00 2001
From: Drew Ross
Date: Tue, 29 Jul 2025 16:02:03 -0500
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/mt5.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/en/model_doc/mt5.md b/docs/source/en/model_doc/mt5.md
index f11bd3d9f9d2..2796c96eb8a0 100644
--- a/docs/source/en/model_doc/mt5.md
+++ b/docs/source/en/model_doc/mt5.md
@@ -24,7 +24,7 @@ rendered properly in your Markdown viewer.
 
 # mT5
 
-[mT5](https://huggingface.co/papers/2010.11934) is an encoder-decoder transformer model for sequence-to-sequence tasks. It follows the same architecture and training procedure as [T5](./t5), a sequence-to-sequence model trained exclusively on English examples. In contrast, mT5 was trained from scratch on a multilingual dataset, allowing it to generalize across a wide range of languages. This leads to improved performance in a multilingual setting compared to T5.
+[mT5](https://huggingface.co/papers/2010.11934) is a multilingual variant of [T5](./t5) pretrained on 101 languages. It also introduces a simple technique to prevent "accidental translation" in the zero-shot setting, where the model (partially) translates its prediction into the wrong language.
 You can find all the original mT5 checkpoints under the [mT5](https://huggingface.co/collections/google/mt5-release-65005f1a520f8d7b4d039509) collection.
 
 > [!TIP]
@@ -79,10 +79,10 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
 
 </hfoption>
-<hfoption id="transformers-cli">
+<hfoption id="transformers CLI">
 
 ```bash
-echo -e "Plants are remarkable organisms that produce their own food using a method called photosynthesis." | transformers-cli run --task text2text-generation --model csebuetnlp/mT5_multilingual_XLSum --device 0
+echo -e "Plants are remarkable organisms that produce their own food using a method called photosynthesis." | transformers run --task text2text-generation --model csebuetnlp/mT5_multilingual_XLSum --device 0
 ```
 
 </hfoption>
@@ -122,7 +122,7 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 
 ## Notes
 
-- mT5 was only pre-trained on [mc4](https://huggingface.co/datasets/mc4), meaning it must be fine-tuned before it can be used for downstream tasks.
+- mT5 must be fine-tuned for downstream tasks because it was only pretrained on the [mc4](https://huggingface.co/datasets/mc4) dataset.
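+- Since mT5 was pretrained unsupervised, there's no real advantage to using a task prefix during single-task fine-tuning; if you're doing multi-task fine-tuning, you should use a prefix. The sketch below shows one way a fine-tuning run could look with [`Seq2SeqTrainer`] — the checkpoint, toy dataset, prefix, and hyperparameters are illustrative assumptions, not a reference recipe.
+
+```python
+# Hypothetical fine-tuning sketch: a single toy example stands in for a real
+# dataset; adjust max lengths, epochs, and batch size for actual training.
+from datasets import Dataset
+from transformers import (
+    AutoModelForSeq2SeqLM,
+    AutoTokenizer,
+    DataCollatorForSeq2Seq,
+    Seq2SeqTrainer,
+    Seq2SeqTrainingArguments,
+)
+
+tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
+model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
+
+# The "summarize:" prefix is only useful in a multi-task setup.
+data = Dataset.from_dict({
+    "text": ["summarize: Plants produce their own food using photosynthesis."],
+    "summary": ["Plants make food from sunlight."],
+})
+
+def preprocess(batch):
+    # Tokenize inputs and targets; the target token ids become the labels.
+    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
+    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
+    model_inputs["labels"] = labels["input_ids"]
+    return model_inputs
+
+tokenized = data.map(preprocess, batched=True, remove_columns=data.column_names)
+
+trainer = Seq2SeqTrainer(
+    model=model,
+    args=Seq2SeqTrainingArguments(output_dir="mt5-small-finetuned", num_train_epochs=1),
+    train_dataset=tokenized,
+    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
+)
+trainer.train()
+```
 
 ## MT5Config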