
Async amax reduction #118

Merged

ksivaman merged 9 commits into NVIDIA:main from erhoo82:slym/async_amax_reduction on Apr 5, 2023
Conversation

@erhoo82 (Collaborator) commented Mar 24, 2023

  • Support async AMAX reduction.
  • Async AMAX reduction is enabled by default and can be disabled via the NVTE_ASYNC_AMAX_REDUCTION environment variable (see the sketch below).
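
For context, the mechanism can be sketched roughly as follows, assuming torch.distributed with async_op collectives. The helper names (_async_amax_enabled, reduce_amax) and the convention that NVTE_ASYNC_AMAX_REDUCTION=0 disables the feature are illustrative assumptions, not the exact Transformer Engine code:

import os
import torch
import torch.distributed as dist

def _async_amax_enabled() -> bool:
    # Hypothetical knob (assumed convention): async reduction is on by
    # default; setting NVTE_ASYNC_AMAX_REDUCTION=0 falls back to blocking.
    return bool(int(os.getenv("NVTE_ASYNC_AMAX_REDUCTION", "1")))

def reduce_amax(amax: torch.Tensor, group=None):
    # Reduce amax across ranks. With async_op=True, all_reduce returns a
    # Work handle immediately instead of blocking; the caller stores it
    # and calls .wait() just before the reduced values are consumed
    # (e.g. at the start of the next iteration).
    return dist.all_reduce(
        amax,
        op=dist.ReduceOp.MAX,
        group=group,
        async_op=_async_amax_enabled(),
    )  # Work handle when async, None otherwise

Running the reduction asynchronously lets it overlap with other work so its latency is hidden, while the env var keeps the old blocking behavior available as a fallback.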

@ptrendx (Member) commented Mar 27, 2023

@erhoo82 Please sign your commits (see CONTRIBUTING.rst for instructions).
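
For reference, the usual DCO sign-off flow in git (general git usage; CONTRIBUTING.rst is the authoritative source here) looks like:

# Sign off a new commit as you create it
git commit -s -m "add env knob to enable async amax reduction"

# Or retroactively sign off the last N commits, then force-push the branch
git rebase --signoff HEAD~N
git push --force-with-lease origin slym/async_amax_reduction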

add env knob to enable async amax reduction

Signed-off-by: slym <slym@login-preos01.a51.clusters.nvidia.com>
@erhoo82 erhoo82 force-pushed the slym/async_amax_reduction branch from 09ad800 to 5e5ec4c on March 27, 2023 21:10
@erhoo82 (Collaborator, Author) commented Mar 27, 2023

Thanks, signed the commit.

@ksivaman ksivaman self-requested a review March 27, 2023 23:16
@timmoon10 (Collaborator) commented:

/te-ci

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 (Collaborator) commented:

/te-ci

Signed-off-by: slym <slym@login-preos01.a51.clusters.nvidia.com>
@erhoo82 erhoo82 force-pushed the slym/async_amax_reduction branch from 56f25a4 to 2699fb9 on April 3, 2023 23:25
Signed-off-by: slym <slym@login-preos01.a51.clusters.nvidia.com>
@erhoo82 (Collaborator, Author) commented Apr 4, 2023

@ksivaman
I tested and confirmed the functionality.

ksivaman added 3 commits April 3, 2023 23:19
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman (Member) left a comment

LGTM, thanks!

@ksivaman (Member) commented Apr 4, 2023

/te-ci

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman (Member) commented Apr 4, 2023

/te-ci


@ptrendx ptrendx mentioned this pull request Apr 4, 2023
@ksivaman ksivaman merged commit db95afe into NVIDIA:main Apr 5, 2023
@Victarry (Contributor) commented:

Hello @ksivaman @erhoo82, I think it would be more reasonable to put amax_reduce_handle_fwd.wait() before FP8GlobalStateManager.copy_amax_from_global_buffer(self.fp8_meta, forward=True), which would ensure the reduction has finished before the amax values are copied from the global buffer.

# Previous iteration was grad_enabled
if self.fp8_meta.get("update_amax_and_scale_fwd", False):
    if self.fp8_meta["recipe"].reduce_amax:
        FP8GlobalStateManager.copy_amax_from_global_buffer(self.fp8_meta, forward=True)
        amax_and_scale_update(
            self.fp8_meta, True, update_weight_scale_inv=update_weight_scale_inv
        )
        FP8GlobalStateManager.set_amax_buffer_key_deletion(self.fp8_meta, forward=True)
    else:
        amax_and_scale_update(
            self.fp8_meta, True, update_weight_scale_inv=update_weight_scale_inv
        )

if self.fp8 and self.training:
    # Setup for amax reduction
    if self.fp8_meta["recipe"].reduce_amax:
        self.fp8_meta["first_module"] = FP8GlobalStateManager.is_first_fp8_module()
        if self.fp8_meta["first_module"]:
            # Wait for the prior AMAX reduction to finish
            amax_reduce_handle_fwd = FP8GlobalStateManager.get_amax_reduce_handle_fwd()
            if amax_reduce_handle_fwd is not None:
                amax_reduce_handle_fwd.wait()
            self.fp8_meta["autocast_id_fwd"] = (
                FP8GlobalStateManager.new_fp8_context_id()
            )
            FP8GlobalStateManager.set_fp8_context_id(self.fp8_meta["autocast_id_fwd"])
        else:
            self.fp8_meta["autocast_id_fwd"] = (
                FP8GlobalStateManager.get_fp8_context_id()
            )
        self.fp8_meta["autocast_id_fwd_stack"].append(
            self.fp8_meta["autocast_id_fwd"]
        )
        FP8GlobalStateManager.add_amax_to_global_buffer(self.fp8_meta, forward=True)
    self.fp8_meta["update_amax_and_scale_fwd"] = True
else:
    self.fp8_meta["update_amax_and_scale_fwd"] = False
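
Concretely, the proposed reordering might look roughly like this (an illustrative, untested sketch derived from the snippet above, not a patch from this PR):

if self.fp8_meta.get("update_amax_and_scale_fwd", False):
    if self.fp8_meta["recipe"].reduce_amax:
        # Wait for the in-flight AMAX reduction *before* reading the
        # global buffer, so the copy is guaranteed to see reduced values.
        amax_reduce_handle_fwd = FP8GlobalStateManager.get_amax_reduce_handle_fwd()
        if amax_reduce_handle_fwd is not None:
            amax_reduce_handle_fwd.wait()
        FP8GlobalStateManager.copy_amax_from_global_buffer(self.fp8_meta, forward=True)
        amax_and_scale_update(
            self.fp8_meta, True, update_weight_scale_inv=update_weight_scale_inv
        )
        FP8GlobalStateManager.set_amax_buffer_key_deletion(self.fp8_meta, forward=True)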

