
docs(readme): update convergence table, latest news, and outdated links #2638

Open
sbhavani wants to merge 5 commits into NVIDIA:main from sbhavani:fix/readme-updates

Conversation

sbhavani (Collaborator) commented Feb 1, 2026

Description

Updates the README to add missing format support documentation, update the news section, and fix broken/outdated links.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Add MXFP8 and NVFP4 format support to highlights and description
  • Update FP8 convergence table with MXFP8 results from arxiv paper
  • Remove outdated JAX Toolbox links and "available on request" entries
  • Update Docker container versions to 26.01
  • Fix DeepSpeed and Lightning integration links
  • Add Nemotron 3 paper to Latest News
  • Add quickstart notebook link after PyTorch example

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

greptile-apps bot (Contributor) commented Feb 1, 2026

Greptile Summary

This PR updates README.rst with documentation improvements: adding a Nemotron 3 news entry, documenting MXFP8 and NVFP4 format support in the description and highlights, refreshing the FP8 convergence table with Megatron Core MXFP8 results (removing stale JAX Toolbox and "available on request" entries), updating the NGC container version to 26.01, and fixing integration links (DeepSpeed, Lightning) while removing "Coming soon" stubs for Colossal-AI and PeriFlow.

Key observations:

  • The quickstart.ipynb linked at line 143 (docs/examples/quickstart.ipynb) does not exist in the repository — only quickstart_utils.py and quickstart_jax_utils.py are present. This link was already broken in the base branch and is not introduced by this PR, but merging without addressing it keeps a broken link visible.
  • The "FP8 Convergence" section title and its introductory paragraph claim convergence only for FP8 vs BF16, but the PR description explicitly states the new LLM-8B and MoE-16B Megatron Core entries represent MXFP8 results from arxiv paper 2506.08027. The section heading and intro text are not updated to reflect this, which may mislead readers about what precision those entries demonstrate.
  • Removal of "Coming soon" entries for Colossal-AI and PeriFlow is unannounced — consider whether these integrations are now abandoned or simply no longer tracked.

Confidence Score: 4/5

  • This is a documentation-only PR with no code changes; safe to merge with minor doc accuracy concerns addressed.
  • All changes are documentation-only (README.rst). The link fixes, news entry, MXFP8/NVFP4 additions, and container version update are accurate and improvements. The main concern is the FP8 Convergence section's title and intro text not being updated to reflect the newly added MXFP8 results — this is a clarity issue rather than a factual error. The broken quickstart.ipynb link pre-exists in the base branch and was not introduced by this PR.
  • README.rst lines 379–382 (FP8 Convergence section title and intro paragraph should mention MXFP8 to match the newly added table entries)

Important Files Changed

Filename: README.rst
Overview: Documentation update with mostly good changes (news entry, link fixes, container version updates, removed "Coming soon" stubs), but includes a broken quickstart.ipynb link that pre-exists in the base and was not introduced by this PR, and the FP8 Convergence section's title and introductory text are not updated to reflect the addition of MXFP8 results from the cited arxiv paper.

Flowchart

flowchart TD
    A[README.rst Changes] --> B[Latest News Section]
    A --> C[What is TE? Description]
    A --> D[Highlights Section]
    A --> E[NGC Container Version]
    A --> F[FP8 Convergence Table]
    A --> G[Integrations List]

    B --> B1["+ Nemotron 3 paper (12/2025)\n NVFP4 on Transformer Engine"]

    C --> C1["+ MXFP8 / NVFP4 support\n mentioned for Blackwell GPUs"]

    D --> D1["+ MXFP8 and NVFP4 bullet\n for Blackwell GPUs"]

    E --> E1["Updated to 26.01\n (January 2026 release)"]

    F --> F2["- Removed: T5-770M (JAX/T5x)\n- Removed: GPT-5B (JAX/Paxml)\n- Removed: GPT-5B (NeMo, on-request)\n- Removed: T5-11B (JAX/T5x, on-request)\n- Removed: GPT-22B (NeMo, on-request)\n- Removed: GPT-175B (JAX/Paxml)"]
    F --> F3["+ Added: LLM-8B (Megatron Core)\n+ Added: MoE-16B (Megatron Core)"]

    G --> G1["DeepSpeed: specific test file → root repo\nLightning: GitHub issue → proper docs\n- Removed: Colossal-AI (Coming soon)\n- Removed: PeriFlow (Coming soon)\nNanotron: removed Coming soon tag"]

Last reviewed commit: f0bd464

greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

README.rst Outdated
loss = out.sum()
loss.backward()

For a tutorial with more details, see the `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.

The referenced quickstart.ipynb file does not exist in docs/examples/. The actual notebooks in that directory are fp8_primer.ipynb, advanced_optimizations.ipynb, and te_jax_integration.ipynb. Consider using one of these existing notebooks or creating the quickstart notebook before merging.
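
For illustration, pointing the link at one of the notebooks that does exist (fp8_primer.ipynb) would look roughly like this in reStructuredText; the "FP8 Primer Notebook" link text is only a placeholder, not necessarily the wording the PR ends up using:

For a tutorial with more details, see the `FP8 Primer Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/fp8_primer.ipynb>`_.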

greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

README.rst Outdated
Comment on lines 354 to 358
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

Extra whitespace in the "Megatron Core" framework names is inconsistent with the other rows.

Suggested change
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps bot (Contributor) left a comment

1 file reviewed, 2 comments

| LLama2-7B | Alibaba Pai | https://mp.weixin.qq.com/s/NQT0uKXLbXyh5031zBdeBQ |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| T5-11B | JAX/T5x | Available on request |
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |

Extra space before "Megatron Core": it should have 2 spaces like the other rows, not 3.

Suggested change
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |

| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

Extra space before "Megatron Core": it should have 2 spaces like the other rows, not 3.

Suggested change
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

…t info

- Add MXFP8 and NVFP4 format support to highlights and description
- Update FP8 convergence table with MXFP8 results from arxiv paper
- Remove outdated JAX-Toolbox links and "available on request" entries
- Update Docker container versions to 26.01
- Fix DeepSpeed and Lightning integration links
- Add Nemotron 3 paper to Latest News
- Add quickstart notebook link after PyTorch example

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
- Replace quickstart.ipynb link with fp8_primer.ipynb (file exists)
- Fix extra whitespace in Megatron Core table rows

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>

greptile-apps bot (Contributor) left a comment

1 file reviewed, no comments

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>

greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

greptile-apps bot (Contributor) commented Feb 10, 2026

Additional Comments (1)

README.rst
Broken reST link markup

This link uses `__` (double underscore), while the rest of the README uses `_`. In reStructuredText, `__` denotes an anonymous hyperlink reference and can render incorrectly here.

* `Stable and Scalable FP8 Deep Learning Training on Blackwell | GTC 2025 <https://www.nvidia.com/en-us/on-demand/session/gtc24-s62457/>`_
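
For reference, the two reStructuredText link forms differ only in the trailing underscores (example.com is a placeholder, not a link taken from the README):

.. A single trailing underscore creates a named hyperlink and registers "NVIDIA TE" as a reusable target name.
`NVIDIA TE <https://example.com>`_

.. A double trailing underscore creates an anonymous hyperlink with no reusable target name.
`NVIDIA TE <https://example.com>`__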

greptile-apps bot (Contributor) left a comment

2 files reviewed, no comments

greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

greptile-apps bot (Contributor) commented Feb 17, 2026

Additional Comments (1)

README.rst
MXFP8 results not reflected in section title or intro

The PR description states the convergence table is being "updated with MXFP8 results from arxiv paper" (arxiv.org/abs/2506.08027), and entries for LLM-8B and MoE-16B via Megatron Core are cited from that paper. However, the section heading remains FP8 Convergence and the introductory paragraph only asserts convergence for FP8 vs BF16:

"FP8 has been tested extensively... no significant difference between FP8 and BF16 training loss curves."

If the new entries represent MXFP8 convergence (not standard FP8), readers may be misled about what precision those Megatron Core results actually demonstrate. Consider either:

  • Adding a sentence to the intro noting MXFP8 convergence has also been validated, or
  • Adding a "Format" column to the table to distinguish FP8 from MXFP8 entries.
