Use FP8 in GPT TP2 test#9451

Merged
layalir merged 19 commits into main from jbaczek/use_fp8_in_func_tests
Jul 9, 2024
@jbaczek jbaczek commented Jun 12, 2024

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
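
The usage section of this PR was left as a template placeholder. As a rough sketch only: the commit titles further down mention Hydra options for Transformer Engine (TE) and FP8 in the GPT TP2 test, so a test invocation might be assembled along the lines below. The override keys, their values, and the script path are assumptions modeled on NeMo's Megatron GPT configuration, not taken from this PR's diff. The helper `to_hydra_args` is hypothetical. The `++` prefix is Hydra's syntax for "override the key if it exists, add it otherwise".

```python
import shlex

# Hypothetical override set: these key names are assumptions modeled on
# NeMo's Megatron GPT config; they are not copied from this PR's diff.
FP8_OVERRIDES = {
    "model.tensor_model_parallel_size": 2,
    "model.mcore_gpt": True,
    "model.transformer_engine": True,
    "model.fp8": True,
}


def to_hydra_args(overrides, force=True):
    """Render a dict as a list of Hydra command-line overrides.

    With force=True every key gets the `++` prefix, which asks Hydra to
    override the key when it is present and to add it when it is not.
    """
    prefix = "++" if force else ""
    args = []
    for key, value in overrides.items():
        # bool must be checked first because bool is a subclass of int.
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            rendered = str(value)
        args.append(f"{prefix}{key}={rendered}")
    return args


if __name__ == "__main__":
    # Assumed script path; adjust to the test actually being run.
    script = "examples/nlp/language_modeling/megatron_gpt_pretraining.py"
    print(shlex.join(["python", script, *to_hydra_args(FP8_OVERRIDES)]))
```

Building the overrides from a dict keeps the FP8-specific settings in one place, so a BF16 variant of the same test could reuse the helper with a different dict.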

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install (e.g., Numba, Pynini, Apex)?
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek force-pushed the jbaczek/use_fp8_in_func_tests branch from 13888d7 to 3e30eb3 Compare June 12, 2024 12:20
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jun 12, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek force-pushed the jbaczek/use_fp8_in_func_tests branch from 482d843 to c6ccde4 Compare June 12, 2024 12:42
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jun 12, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jun 12, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek removed the Run CICD label Jun 12, 2024
@github-actions github-actions bot added the NLP label Jun 12, 2024
@jbaczek jbaczek added Run CICD and removed NLP labels Jun 12, 2024
Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>
@github-actions github-actions bot added the NLP label Jun 12, 2024
jbaczek and others added 2 commits June 13, 2024 11:54
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jun 13, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jun 13, 2024
Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jul 4, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jul 5, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jul 8, 2024
@jbaczek jbaczek changed the title [Draft] Use FP8 in GPT TP2 test Use FP8 in GPT TP2 test Jul 8, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jul 9, 2024
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek jbaczek added Run CICD and removed Run CICD labels Jul 9, 2024
@pablo-garay pablo-garay requested a review from ericharper July 9, 2024 20:51
@layalir layalir merged commit 2ee8646 into main Jul 9, 2024
@layalir layalir deleted the jbaczek/use_fp8_in_func_tests branch July 9, 2024 21:26
marcromeyn pushed a commit that referenced this pull request Jul 11, 2024
* Use FP8 in GPT TP2 test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add hydra options to use TE, TP overlap and FP8

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Override presence checks in hydra

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* WIP: Add debug code

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>

* Add more debug code

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>

* Add more debug code

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>

* Remove debug code and change underlying transformer layer to TE

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Override hydra error

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove tp overlap from the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change runner for fp8 tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* fix

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add tp overlap test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove TP overlap from tests. It is unsupported in docker environment

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Adjust GPT PP2 test to use FP8. Change optimizer in TP2 test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove env overrides from GPT PP2 test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>
Co-authored-by: jbaczek <jbaczek@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
marcromeyn pushed a commit that referenced this pull request Jul 11, 2024
maanug-nv pushed a commit that referenced this pull request Jul 14, 2024
ashors1 pushed a commit that referenced this pull request Jul 18, 2024
ertkonuk pushed a commit that referenced this pull request Jul 19, 2024
Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com>
tonyjie pushed a commit to tonyjie/NeMo that referenced this pull request Aug 6, 2024
Signed-off-by: tonyjie <jl4257@cornell.edu>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
XuesongYang pushed a commit to paarthneekhara/NeMo that referenced this pull request Jan 18, 2025