Skip to content

Use arguments instead of env vars for TP comm overlap#649

Merged
timmoon10 merged 4 commits intoNVIDIA:mainfrom
minitu:jaeminc/tp_comm_knobs_pr
Feb 14, 2024
Merged

Use arguments instead of env vars for TP comm overlap#649
timmoon10 merged 4 commits intoNVIDIA:mainfrom
minitu:jaeminc/tp_comm_knobs_pr

Conversation

@minitu
Copy link
Contributor

@minitu minitu commented Jan 31, 2024

No description provided.

Comment on lines 196 to 207
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ub_tp_comm_overlap : bool, default = `False`
if set to `True`, enables overlap of TP communication with computation.
ub_bulk_wgrad : bool, default = `True`
ub_bulk_dgrad : bool, default = `True`
ub_split_ag : bool, default = `True`
enables split-pipelined overlap of allgather with computation.
ub_split_rs : bool, default = `True`
enables split-pipelined overlap of reduce-scatter with computation.
ub_atomic_gemm_ag: bool, default = `False`
if set to `True`, enables atomic overlap of allgather with computation.
ub_atomic_gemm_rs: bool, default = `False`
if set to `True`, enables atomic overlap of reduce-scatter with computation.

Copy link
Collaborator

@timmoon10 timmoon10 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksivaman Why did we want to remove this documentation? To remove clutter?

@ksivaman
Copy link
Member

Could you sign off your commits as per this guide? @minitu

@minitu minitu force-pushed the jaeminc/tp_comm_knobs_pr branch from 7d93d29 to d30d93b Compare January 31, 2024 21:54
@minitu minitu marked this pull request as ready for review February 5, 2024 21:25
@minitu minitu force-pushed the jaeminc/tp_comm_knobs_pr branch from e0f60a7 to 86fd020 Compare February 5, 2024 21:25
@timmoon10
Copy link
Collaborator

/te-ci pytorch

Copy link
Collaborator

@timmoon10 timmoon10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Jaemin Choi added 4 commits February 13, 2024 12:29
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
@minitu minitu force-pushed the jaeminc/tp_comm_knobs_pr branch from e991228 to 7c989ae Compare February 13, 2024 20:29
@minitu
Copy link
Contributor Author

minitu commented Feb 13, 2024

Rebased to ToT

@timmoon10 timmoon10 merged commit bdf1afe into NVIDIA:main Feb 14, 2024
layalir pushed a commit to layalir/TransformerEngine that referenced this pull request Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants