README: Update Bert Japanese model card by KeshavSingh29 · Pull Request #39466 · huggingface/transformers

KeshavSingh29 · 2025-07-17T07:17:25Z

What does this PR do?

As mentioned in #36979, this PR contributes to the HF model cards effort, specifically the Bert Japanese model card.

Fixes # (issue)

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

@stevhliu

stevhliu

Thanks!

stevhliu · 2025-07-17T16:23:22Z

-<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,[base64 image data omitted]">
-</div>
+BertJapanese is a bidirectional transformer model that keeps the same architecture as original BERT and keeps the same learning objective, i.e., to predict masked tokens in a sentence and to predict whether one sentence follows another. While the architecture is same, BertJapanese relies on specific tokenization methods (wordpiece / character) that are more suitable for Japanese text. 


Suggested change

BertJapanese is a bidirectional transformer model that keeps the same architecture as original BERT and keeps the same learning objective, i.e., to predict masked tokens in a sentence and to predict whether one sentence follows another. While the architecture is same, BertJapanese relies on specific tokenization methods (wordpiece / character) that are more suitable for Japanese text.

BertJapanese is a [BERT](./bert) model pretrained on Japanese text. It uses the MeCab and WordPiece tokenizers or character tokenization.

You can find all the original BERTJapanese checkpoints under the [tohoku-nlp](https://huggingface.co/tohoku-nlp) organization.

> [!TIP]
> This model was contributed by [tohoku-nlp](https://huggingface.co/tohoku-nlp).
>
> Refer to the [BERT](./bert) docs for usage examples.
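
To make the "MeCab and WordPiece" description concrete, here is a toy, pure-Python sketch of WordPiece's greedy longest-match-first splitting. It is not the `BertJapaneseTokenizer` implementation: the word list is a hand-written stand-in for MeCab's segmentation, and the vocabulary and the helper name `wordpiece` are made up for illustration.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Assumption: hand-written segmentation standing in for MeCab output.
words = ["今日", "は", "良い", "天気", "です", "ね", "。"]
# Assumption: toy vocabulary, not the vocabulary of any real checkpoint.
vocab = {"今日", "は", "良", "##い", "天気", "です", "ね", "。"}

tokens = [piece for word in words for piece in wordpiece(word, vocab)]
print(tokens)
```

In this toy setup "良い" is not in the vocabulary as a whole word, so it is split into "良" and the continuation piece "##い"; every other word maps to a single token.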

Thanks for the comments and edits, @stevhliu.
The div tag on line 24 is still needed.
Apart from that, I have made the changes.

stevhliu · 2025-07-17T16:23:29Z

-</div>
+BertJapanese is a bidirectional transformer model that keeps the same architecture as original BERT and keeps the same learning objective, i.e., to predict masked tokens in a sentence and to predict whether one sentence follows another. While the architecture is same, BertJapanese relies on specific tokenization methods (wordpiece / character) that are more suitable for Japanese text. 
+
+Check Notes for additional details.


Suggested change

Check Notes for additional details.

stevhliu · 2025-07-17T16:23:40Z

+
+Check Notes for additional details.
+
+The example below demonstrates how to predict the [MASK] token with [`Pipeline`] or the [`AutoModel`] class using model with MeCab and WordPiece tokenization.


Suggested change

The example below demonstrates how to predict the [MASK] token with [`Pipeline`] or the [`AutoModel`] class using model with MeCab and WordPiece tokenization.

The example below demonstrates how to predict the [MASK] token with MeCab and WordPiece tokenization with [`Pipeline`], [`AutoModel`], and from the command line.

stevhliu · 2025-07-17T16:24:13Z

+> [!TIP]
+> Note that this is the base model, you need to add a task specific head and further fine-tune it to make sure you get accurate results. 


Suggested change

> [!TIP]
> Note that this is the base model, you need to add a task specific head and further fine-tune it to make sure you get accurate results.

stevhliu · 2025-07-17T16:32:46Z


-Example of using a model with MeCab and WordPiece tokenization:
+```py
+import torch


```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese")
# torch_dtype and device_map are model-loading arguments, so they belong on the model call
model = AutoModelForMaskedLM.from_pretrained(
    "tohoku-nlp/bert-base-japanese",
    torch_dtype=torch.float16,
    device_map="auto",
)

text = "今日は[MASK]天気ですね。"
# Move the inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(f"The predicted token is: {predicted_token}")
```
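
The post-processing in the snippet above (find the `[MASK]` position, then pick the highest-scoring vocabulary entry) can be traced with toy numbers. This is a plain-Python sketch; the vocabulary, token ids, and logits are invented for illustration and do not come from the model. It also shows a top-3 readout, which the snippet above does not do.

```python
import math

# Assumption: toy vocabulary and ids, not those of any real checkpoint.
vocab = ["[PAD]", "[MASK]", "今日", "は", "いい", "悪い", "天気", "です", "ね", "。"]
mask_token_id = vocab.index("[MASK]")
input_ids = [2, 3, 1, 6, 7, 8, 9]  # 今日 は [MASK] 天気 です ね 。

# Assumption: made-up logits for the masked position, one score per vocabulary entry.
mask_logits = [0.0, 0.0, 0.1, 0.2, 3.0, 1.5, 0.3, 0.1, 0.1, 0.0]

# Step 1: locate the masked position in the input.
masked_index = input_ids.index(mask_token_id)

# Step 2: turn the logits into probabilities with a softmax.
exps = [math.exp(x) for x in mask_logits]
total = sum(exps)
probs = [e / total for e in exps]

# Step 3: rank vocabulary entries by probability and keep the top 3.
top = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)[:3]
for i in top:
    print(f"{vocab[i]}\t{probs[i]:.3f}")
```

Taking only the first entry of `top` corresponds to the `argmax` call in the snippet above.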

stevhliu · 2025-07-17T16:33:04Z

-```python
->>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese-char")
->>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-char")
+<hfoption id="transformers-cli"></hfoption>


Add command line example

stevhliu · 2025-07-17T16:33:26Z

 </Tip>


+## BertConfig


Don't need to change any of the API references here

@stevhliu In the original doc for bert_japanese, only the tokenizer reference is mentioned. Would you like to keep the status quo here?
For reference:

## BertJapaneseTokenizer

[[autodoc]] BertJapaneseTokenizer

Yeah, let's keep it as it was originally.

stevhliu · 2025-07-17T16:34:30Z

->>> ## Input Japanese Text
->>> line = "吾輩は猫である。"
+## Notes
+- The model architecture(same as original BERT by Google) comes in two variants: 


Remove all these notes and replace with an example of character tokenization.
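
As a rough idea of what character tokenization does, the toy sketch below splits the sample sentence from the original doc into single characters and wraps it in `[CLS]` / `[SEP]`. It uses no library code; the vocabulary and the `char_tokenize` helper are hypothetical, and a real example for the model card would load a character-level checkpoint with `AutoTokenizer` instead.

```python
def char_tokenize(text, vocab, unk="[UNK]"):
    """Split text into single-character tokens; characters outside vocab become unk."""
    return [ch if ch in vocab else unk for ch in text]


line = "吾輩は猫である。"

# Assumption: toy vocabulary built from special tokens plus the characters of the sample.
special_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab = {tok: i for i, tok in enumerate(special_tokens + sorted(set(line)))}

tokens = ["[CLS]"] + char_tokenize(line, vocab) + ["[SEP]"]
input_ids = [vocab[tok] for tok in tokens]

print(" ".join(tokens))  # [CLS] 吾 輩 は 猫 で あ る 。 [SEP]
```

Each character becomes its own token, so the sequence is longer than a word-level split would be, but no word segmentation step is needed in this sketch.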

Suggested change

- The model architecture(same as original BERT by Google) comes in two variants:

- The example below demonstrates character tokenization.

```py
add code example here
```

stevhliu · 2025-07-17T16:34:39Z


-## Overview
+<hfoptions id="usage">
+<hfoption id="Pipeline">


```py
# Requires the Japanese extras: pip install transformers["ja"]
import torch
from transformers import pipeline

pipeline = pipeline(
    "fill-mask",
    model="tohoku-nlp/bert-base-japanese",
    torch_dtype=torch.float16,
    device=0,
)
pipeline("今日は[MASK]天気ですね。")
```

README: Update Bert Japanese model card

d3120f9

stevhliu mentioned this pull request Jul 17, 2025

[Community contributions] Model cards #36979

Closed

stevhliu reviewed Jul 17, 2025


-BertJapanese is a bidirectional transformer model that keeps the same architecture as original BERT and keeps the same learning objective, i.e., to predict masked tokens in a sentence and to predict whether one sentence follows another. While the architecture is same, BertJapanese relies on specific tokenization methods (wordpiece / character) that are more suitable for Japanese text.
+BertJapanese is a [BERT](./bert) model pretrained on Japanese text. It uses the MeCab and WordPiece tokenizers or character tokenization.
+You can find all the original BERTJapanese checkpoints under the [tohoku-nlp](https://huggingface.co/tohoku-nlp) organization.
+> [!TIP]
+> This model was contributed by [tohoku-nlp](https://huggingface.co/tohoku-nlp).
+>
+> Refer to the [BERT](./bert) docs for usage examples.


		Check Notes for additional details.

		The example below demonstrates how to predict the [MASK] token with [`Pipeline`] or the [`AutoModel`] class using model with MeCab and WordPiece tokenization.

		> [!TIP]
		> Note that this is the base model, you need to add a task specific head and further fine-tune it to make sure you get accurate results.

-- The model architecture(same as original BERT by Google) comes in two variants:
+- The example below demonstrates character tokenization.
+   ```py
+   add code example here
