docs(readme): update convergence table, latest news, and outdated links #2638

sbhavani wants to merge 5 commits into NVIDIA:main
Conversation
Greptile Summary

This PR updates the README's latest news section, FP8 convergence table, and outdated links. Key observations:
Confidence Score: 4/5
Important Files Changed
Flowchart

flowchart TD
A[README.rst Changes] --> B[Latest News Section]
A --> C[What is TE? Description]
A --> D[Highlights Section]
A --> E[NGC Container Version]
A --> F[FP8 Convergence Table]
A --> G[Integrations List]
B --> B1["+ Nemotron 3 paper (12/2025)\n NVFP4 on Transformer Engine"]
C --> C1["+ MXFP8 / NVFP4 support\n mentioned for Blackwell GPUs"]
D --> D1["+ MXFP8 and NVFP4 bullet\n for Blackwell GPUs"]
E --> E1["Updated to 26.01\n (January 2026 release)"]
F --> F2["- Removed: T5-770M (JAX/T5x)\n- Removed: GPT-5B (JAX/Paxml)\n- Removed: GPT-5B (NeMo, on-request)\n- Removed: T5-11B (JAX/T5x, on-request)\n- Removed: GPT-22B (NeMo, on-request)\n- Removed: GPT-175B (JAX/Paxml)"]
F --> F3["+ Added: LLM-8B (Megatron Core)\n+ Added: MoE-16B (Megatron Core)"]
G --> G1["DeepSpeed: specific test file → root repo\nLightning: GitHub issue → proper docs\n- Removed: Colossal-AI (Coming soon)\n- Removed: PeriFlow (Coming soon)\nNanotron: removed Coming soon tag"]
Last reviewed commit: f0bd464
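To make the MXFP8 / NVFP4 highlight above concrete, here is a rough sketch of how a block-scaled recipe would be selected for Transformer Engine's PyTorch autocast on a Blackwell-class GPU. The recipe class name MXFP8BlockScaling is an assumption based on recent TE releases rather than something quoted from this PR (newer releases are expected to expose an analogous NVFP4 recipe); treat it as illustrative, not as the README's exact example.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A small TE layer and a random input batch on the GPU (illustrative sizes).
model = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(4096, 768, device="cuda")

# Assumed recipe class for MXFP8 block scaling (Blackwell GPUs); check the
# installed Transformer Engine version for the exact class names.
mxfp8_recipe = recipe.MXFP8BlockScaling()

# The autocast context is used the same way as with the FP8 DelayedScaling recipe.
with te.fp8_autocast(enabled=True, fp8_recipe=mxfp8_recipe):
    out = model(inp)

out.sum().backward()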
README.rst
Outdated
loss = out.sum()
loss.backward()

For a tutorial with more details, see the `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.
The referenced quickstart.ipynb file does not exist in docs/examples/. The actual notebooks in that directory are fp8_primer.ipynb, advanced_optimizations.ipynb, and te_jax_integration.ipynb. Consider using one of these existing notebooks or creating the quickstart notebook before merging.
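For context, the two lines quoted in the diff above are the tail of the README's PyTorch quickstart snippet. A minimal self-contained version of that example (dimensions and recipe arguments chosen for illustration, not copied from the README; running it requires a CUDA GPU with FP8 support) looks roughly like this:

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Layer dimensions (illustrative values).
in_features, out_features, hidden_size = 768, 3072, 4096

# A Transformer Engine Linear layer and a random input batch on the GPU.
model = te.Linear(in_features, out_features, bias=True).cuda()
inp = torch.randn(hidden_size, in_features, device="cuda")

# FP8 recipe controlling scaling behavior; the arguments are optional.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# Run the forward pass with FP8 autocasting enabled.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

# The two lines from the diff: reduce to a scalar loss and backpropagate.
loss = out.sum()
loss.backward()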
Force-pushed from 3f01d10 to 98726c5
README.rst
Outdated
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Extra whitespace in the "Megatron Core" framework names is inconsistent with the other rows.
Suggested change:

| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

→

| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
| LLama2-7B | Alibaba Pai | https://mp.weixin.qq.com/s/NQT0uKXLbXyh5031zBdeBQ |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| T5-11B | JAX/T5x | Available on request |
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Extra space before "Megatron Core": this row should have 2 spaces like the other rows, not 3.
Suggested change (normalize the spacing):

| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Extra space before "Megatron Core": this row should have 2 spaces like the other rows, not 3.
Suggested change (normalize the spacing):

| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
…t info

- Add MXFP8 and NVFP4 format support to highlights and description
- Update FP8 convergence table with MXFP8 results from arxiv paper
- Remove outdated JAX-Toolbox links and "available on request" entries
- Update Docker container versions to 26.01
- Fix DeepSpeed and Lightning integration links
- Add Nemotron 3 paper to Latest News
- Add quickstart notebook link after PyTorch example

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
- Replace quickstart.ipynb link with fp8_primer.ipynb (file exists)
- Fix extra whitespace in Megatron Core table rows

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Force-pushed from 90af7f4 to d5922a3
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Force-pushed from cfb045e to 52deeda
Additional Comments (1)
This link uses …
This reverts commit 643b3d9.
Additional Comments (1)
The PR description states the convergence table is being "updated with MXFP8 results from arxiv paper". If the new entries represent MXFP8 convergence (not standard FP8), readers may be misled about what precision those Megatron Core results actually demonstrate. Consider either:
Description
Updates the README to add missing format support documentation, update the news section, and fix broken/outdated links.
Type of change
Changes
Checklist: