docs: Update LayoutLMv3 model card with standardized format and impro… #37155
carrycooldude wants to merge 10 commits into huggingface:main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the button.
stevhliu
left a comment
Thanks, this is a good start! Please refer to the Gemma 3 docs to see how to standardize this doc 🤗
> [](https://pytorch.org/get-started/locally/)

Please style these with `<div>` tags. You can copy it from one of the existing updated model cards on main, like Gemma 3.
> # LayoutLMv3
>
> ## Overview
> LayoutLMv3 is a powerful multimodal transformer model designed specifically for Document AI tasks. What makes it unique is its unified approach to handling both text and images in documents, using a simple yet effective architecture that combines patch embeddings with transformer layers. Unlike its predecessor LayoutLMv2, it uses a more streamlined approach with patch embeddings (similar to ViT) instead of a CNN backbone.

Suggested change:

> [LayoutLMv3](https://huggingface.co/papers/2204.08387) is a multimodal transformer model designed specifically for Document AI tasks. It unifies the pretraining objectives for text and images, masked language and masked image modeling, and also includes a word-patch alignment objective for even stronger text and image alignment. The model architecture is also unified and uses a more streamlined approach with patch embeddings (similar to [ViT](./vit)) instead of a CNN backbone.
Not fully resolved yet, missing link to the model
> <Tip>
> Click on the right sidebar for more examples of how to use the model for different tasks!
> </Tip>

Suggested change:

> > [!TIP]
> > Click on the LayoutLMv3 models in the right sidebar for more examples of how to apply LayoutLMv3 to different vision and language tasks.
> outputs = model(**encoding)
> ```
>
> ## Using transformers-cli
We can remove this since transformers-cli doesn't support image inputs
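A plain Python forward pass could take the place of the removed CLI section. A minimal sketch, not the PR's final text: it assumes `pytesseract` is installed for the processor's built-in OCR, and reuses the checkpoint from the quoted example.

```python
# Hypothetical replacement for the removed transformers-cli section:
# run a bare forward pass using the processor's built-in OCR.
from PIL import Image
from transformers import AutoModel, AutoProcessor

def run_layoutlmv3(image_path, model_id="microsoft/layoutlmv3-base"):
    """Encode a document image and return the model outputs."""
    processor = AutoProcessor.from_pretrained(model_id, apply_ocr=True)
    model = AutoModel.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, return_tensors="pt")
    return model(**encoding)

if __name__ == "__main__":
    outputs = run_layoutlmv3("form.jpg")
    print(outputs.last_hidden_state.shape)
```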
> ## Quantization
>
> For large models, you can use quantization to reduce memory usage:
Update the code example below accordingly
Suggested change:

> Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](https://huggingface.co/docs/transformers/main/en/quantization/overview) overview for more available quantization backends.
> The example below uses [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) to only quantize the weights to int4.
Not resolved yet. You only need to show quantization for either 8 or 4-bits instead of both. Also the code for quantizing the model is incorrect.
> - [Document question answering task guide](../tasks/document_question_answering)
>
> ## LayoutLMv3Config
> ## Model Details
The rest of these changes should be reverted
Not resolved yet, the ## Model Details is still there as are the changes to the header levels of the LayoutLMv3 classes
Force-pushed 819c757 to 5b92ea6
Force-pushed 9368ed6 to b0aeeec
Force-pushed b15eb3d to 294e6e9
@stevhliu, please have a look at this.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey, it is still a bit off in a few places! I suggest taking a look at the Gemma 3 model card again and trying to align your model card with it as much as possible!
Force-pushed 7836f29 to 61f22d5
stevhliu
left a comment
There are a lot of unresolved changes, so please don't mark them as resolved 😅
> # LayoutLMv3
>
> ## Overview
> LayoutLMv3 is a powerful multimodal transformer model designed specifically for Document AI tasks. What makes it unique is its unified approach to handling both text and images in documents, using a simple yet effective architecture that combines patch embeddings with transformer layers. Unlike its predecessor LayoutLMv2, it uses a more streamlined approach with patch embeddings (similar to ViT) instead of a CNN backbone.
Not fully resolved yet, missing link to the model
> This unified architecture and training approach makes LayoutLMv3 particularly effective for both text-centric tasks (like form understanding and receipt analysis) and image-centric tasks (like document classification and layout analysis).
>
> *Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis.*
>
> [Paper](https://arxiv.org/abs/2204.08387) | [Official Checkpoints](https://huggingface.co/microsoft/layoutlmv3-base)

Suggested change:

> You can find all the original LayoutLMv3 checkpoints under the [LayoutLM](https://huggingface.co/collections/microsoft/layoutlm-6564539601de72cb631d0902) collection.
> <Tip>
> Click on the right sidebar for more examples of how to use the model for different tasks!
> </Tip>
> - [Document question answering task guide](../tasks/document_question_answering)
>
> ## LayoutLMv3Config
> ## Model Details
Not resolved yet, the ## Model Details is still there as are the changes to the header levels of the LayoutLMv3 classes
> ## Quantization
>
> For large models, you can use quantization to reduce memory usage:
Not resolved yet. You only need to show quantization for either 8 or 4-bits instead of both. Also the code for quantizing the model is incorrect.
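If the card settles on a single precision as requested, an 8-bit version is one option. This is a sketch under the assumption that bitsandbytes is the chosen backend (the PR could equally use torchao int4 instead):

```python
# Sketch: 8-bit weight quantization with bitsandbytes, showing a single
# precision rather than both 8- and 4-bit examples.
from transformers import AutoModelForTokenClassification, BitsAndBytesConfig

def load_8bit_layoutlmv3(model_id="microsoft/layoutlmv3-base"):
    """Load LayoutLMv3 with weights quantized to 8-bit."""
    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
    return AutoModelForTokenClassification.from_pretrained(
        model_id,
        device_map="auto",
        quantization_config=quantization_config,
    )

if __name__ == "__main__":
    model = load_8bit_layoutlmv3()
```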
> outputs = model(**encoding)
> ```
>
> ## Using transformers-cli
> result = token_classifier("form.jpg")
>
> # For question answering
> qa = pipeline("document-question-answering", model="microsoft/layoutlmv3-base")
> ## Using the Pipeline
>
> The easiest way to use LayoutLMv3 is through the pipeline API:
Unresolved as there are still other examples here besides question answering
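Trimmed to question answering only, the section could reduce to something like this sketch. Assumptions: `pytesseract` is available for OCR, and the base checkpoint from the quoted diff is not fine-tuned for QA, so a QA-fine-tuned checkpoint would be needed for meaningful answers.

```python
from transformers import pipeline

def build_doc_qa(model_id="microsoft/layoutlmv3-base"):
    """Build a document question answering pipeline (OCR via pytesseract)."""
    return pipeline("document-question-answering", model=model_id)

if __name__ == "__main__":
    qa = build_doc_qa()
    print(qa(image="form.jpg", question="What is the total amount?"))
```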
> ## Quick Start
>
> Here's a quick example of how to use LayoutLMv3 for document understanding:
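One shape the quick-start example might take is token classification with pre-extracted words and bounding boxes, which avoids the OCR dependency. This is a sketch, not the PR's text; `num_labels=7` and the sample words/boxes are illustrative assumptions (boxes are normalized to the 0-1000 range LayoutLMv3 expects).

```python
# Sketch of a quick-start example: token classification with
# pre-extracted words and normalized (0-1000) bounding boxes.
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForTokenClassification

def classify_tokens(image_path, words, boxes,
                    model_id="microsoft/layoutlmv3-base"):
    """Return per-token label predictions for a document image."""
    processor = AutoProcessor.from_pretrained(model_id, apply_ocr=False)
    model = LayoutLMv3ForTokenClassification.from_pretrained(
        model_id, num_labels=7  # illustrative label count (e.g. FUNSD-style)
    )
    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, words, boxes=boxes, return_tensors="pt")
    outputs = model(**encoding)
    return outputs.logits.argmax(-1)

if __name__ == "__main__":
    preds = classify_tokens(
        "form.jpg",
        words=["Invoice", "Total:", "$1,000"],
        boxes=[[70, 50, 160, 70], [70, 600, 140, 620], [150, 600, 230, 620]],
    )
    print(preds)
```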
We'll need to update the badges to include FlashAttention and the code examples to include SDPA once #35469 is merged!

Sure, will see to that too.
Update LayoutLMv3 Model Card Documentation
This PR updates the LayoutLMv3 model card documentation to follow the standardized format as requested in #36979. The changes improve the documentation's clarity and usability while maintaining consistency with other model cards in the repository.
What does this PR do?
This PR enhances the LayoutLMv3 model card documentation by:
The changes make the documentation more accessible and provide ready-to-use examples for different use cases, following the standardized format used in other model cards like Gemma 3, PaliGemma, and ViT.
#36979
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Since this is a documentation update for a vision-language model, I would suggest tagging: