
Async amax reduction #118

Merged

ksivaman merged 9 commits into NVIDIA:main from erhoo82:slym/async_amax_reduction on Apr 5, 2023
Conversation

@erhoo82 (Collaborator) commented Mar 24, 2023

  • Support async AMAX reduction.
  • Async AMAX reduction is enabled by default and can be disabled via the NVTE_ASYNC_AMAX_REDUCTION environment variable (see the sketch below).
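
For context, the mechanism can be sketched roughly as follows, assuming torch.distributed with async_op collectives. The helper names (_async_amax_enabled, reduce_amax) and the convention that NVTE_ASYNC_AMAX_REDUCTION=0 disables the feature are illustrative assumptions, not the exact Transformer Engine code:

import os
import torch
import torch.distributed as dist

def _async_amax_enabled() -> bool:
    # Hypothetical knob (assumed convention): async reduction is on by
    # default; setting NVTE_ASYNC_AMAX_REDUCTION=0 falls back to blocking.
    return bool(int(os.getenv("NVTE_ASYNC_AMAX_REDUCTION", "1")))

def reduce_amax(amax: torch.Tensor, group=None):
    # Reduce amax across ranks. With async_op=True, all_reduce returns a
    # Work handle immediately instead of blocking; the caller stores it
    # and calls .wait() just before the reduced values are consumed
    # (e.g. at the start of the next iteration).
    return dist.all_reduce(
        amax,
        op=dist.ReduceOp.MAX,
        group=group,
        async_op=_async_amax_enabled(),
    )  # Work handle when async, None otherwise

Running the reduction asynchronously lets it overlap with other work so its latency is hidden, while the env var keeps the old blocking behavior available as a fallback.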

@ptrendx (Member) commented Mar 27, 2023

@erhoo82 Please sign your commits (see CONTRIBUTING.rst for instructions).
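
For reference, the usual DCO sign-off flow in git (general git usage; CONTRIBUTING.rst is the authoritative source here) looks like:

# Sign off a new commit as you create it
git commit -s -m "add env knob to enable async amax reduction"

# Or retroactively sign off the last N commits, then force-push the branch
git rebase --signoff HEAD~N
git push --force-with-lease origin slym/async_amax_reduction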

add env knob to enable async amax reduction

Signed-off-by: slym <slym@login-preos01.a51.clusters.nvidia.com>
@erhoo82 erhoo82 force-pushed the slym/async_amax_reduction branch from 09ad800 to 5e5ec4c on March 27, 2023 21:10
@erhoo82 (Collaborator, Author) commented Mar 27, 2023

Thanks, signed the commit.

@ksivaman ksivaman self-requested a review March 27, 2023 23:16
@timmoon10 (Collaborator) commented:

/te-ci

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 (Collaborator) commented:

/te-ci

Signed-off-by: slym <slym@login-preos01.a51.clusters.nvidia.com>
@erhoo82 erhoo82 force-pushed the slym/async_amax_reduction branch from 56f25a4 to 2699fb9 on April 3, 2023 23:25
Signed-off-by: slym <slym@login-preos01.a51.clusters.nvidia.com>
@erhoo82 (Collaborator, Author) commented Apr 4, 2023

@ksivaman
I tested and confirmed the functionality.

ksivaman added 3 commits April 3, 2023 23:19
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman (Member) left a comment

LGTM, thanks!

@ksivaman (Member) commented Apr 4, 2023

/te-ci

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman (Member) commented Apr 4, 2023

/te-ci


@ptrendx ptrendx mentioned this pull request Apr 4, 2023
@ksivaman ksivaman merged commit db95afe into NVIDIA:main Apr 5, 2023
@Victarry (Contributor) commented:

Hello @ksivaman @erhoo82, I think it would be more reasonable to put amax_reduce_handle_fwd.wait() before FP8GlobalStateManager.copy_amax_from_global_buffer(self.fp8_meta, forward=True), which would ensure the reduction has finished before the amax values are copied from the global buffer.

# Previous iteration was grad_enabled
if self.fp8_meta.get("update_amax_and_scale_fwd", False):
    if self.fp8_meta["recipe"].reduce_amax:
        FP8GlobalStateManager.copy_amax_from_global_buffer(self.fp8_meta, forward=True)
        amax_and_scale_update(
            self.fp8_meta, True, update_weight_scale_inv=update_weight_scale_inv
        )
        FP8GlobalStateManager.set_amax_buffer_key_deletion(self.fp8_meta, forward=True)
    else:
        amax_and_scale_update(
            self.fp8_meta, True, update_weight_scale_inv=update_weight_scale_inv
        )

if self.fp8 and self.training:
    # Setup for amax reduction
    if self.fp8_meta["recipe"].reduce_amax:
        self.fp8_meta["first_module"] = FP8GlobalStateManager.is_first_fp8_module()
        if self.fp8_meta["first_module"]:
            # Wait for the prior AMAX reduction to finish
            amax_reduce_handle_fwd = FP8GlobalStateManager.get_amax_reduce_handle_fwd()
            if amax_reduce_handle_fwd is not None:
                amax_reduce_handle_fwd.wait()
            self.fp8_meta["autocast_id_fwd"] = (
                FP8GlobalStateManager.new_fp8_context_id()
            )
            FP8GlobalStateManager.set_fp8_context_id(self.fp8_meta["autocast_id_fwd"])
        else:
            self.fp8_meta["autocast_id_fwd"] = (
                FP8GlobalStateManager.get_fp8_context_id()
            )
        self.fp8_meta["autocast_id_fwd_stack"].append(
            self.fp8_meta["autocast_id_fwd"]
        )
        FP8GlobalStateManager.add_amax_to_global_buffer(self.fp8_meta, forward=True)
    self.fp8_meta["update_amax_and_scale_fwd"] = True
else:
    self.fp8_meta["update_amax_and_scale_fwd"] = False
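
Concretely, the proposed reordering might look roughly like this (an illustrative, untested sketch derived from the snippet above, not a patch from this PR):

if self.fp8_meta.get("update_amax_and_scale_fwd", False):
    if self.fp8_meta["recipe"].reduce_amax:
        # Wait for the in-flight AMAX reduction *before* reading the
        # global buffer, so the copy is guaranteed to see reduced values.
        amax_reduce_handle_fwd = FP8GlobalStateManager.get_amax_reduce_handle_fwd()
        if amax_reduce_handle_fwd is not None:
            amax_reduce_handle_fwd.wait()
        FP8GlobalStateManager.copy_amax_from_global_buffer(self.fp8_meta, forward=True)
        amax_and_scale_update(
            self.fp8_meta, True, update_weight_scale_inv=update_weight_scale_inv
        )
        FP8GlobalStateManager.set_amax_buffer_key_deletion(self.fp8_meta, forward=True)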

