[utils] Add use_reetrant=False in utils.activation_checkpoint#1460
Merged
Cypher30 merged 21 commits into hpcaitech:main (Aug 16, 2022)
Conversation
Merge ColossalAI
Daily merge
FrankLeeeee reviewed on Aug 16, 2022
Comment on lines +68 to +69
def test_activation_checkpointing_reentrant_False(cpu_offload):
Contributor
Add a reset_seed at the start of the function so that other tests will not affect this one.
Contributor
This applies to the test function above as well.
Contributor
Author
oooo! I will modify it
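The reviewer's `reset_seed` suggestion amounts to re-seeding every RNG source at the top of each test so state left by earlier tests cannot leak in. A minimal sketch of such a helper (illustrative only, not ColossalAI's actual implementation; the default seed value is arbitrary):

```python
import random

import torch


def reset_seed(seed: int = 1024):
    # Re-seed every RNG a test might consume so results are reproducible
    # regardless of which tests ran before this one.
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```

Calling this at the start of both activation-checkpoint tests makes each one deterministic in isolation.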
FrankLeeeee reviewed on Aug 16, 2022
@pytest.mark.gpu
@pytest.mark.parametrize("cpu_offload", [True, False])
Contributor
If these two test functions differ only in the use_reentrant variable, you can just add a parametrize here instead of creating a duplicated function.
Contributor
Author
okay I will modify this
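The reviewer's suggestion can be sketched as follows. The test name and body are placeholders; the point is that stacking a second `parametrize` decorator generates the cross-product of cases, replacing the two near-duplicate test functions with one:

```python
import pytest


@pytest.mark.gpu
@pytest.mark.parametrize("cpu_offload", [True, False])
@pytest.mark.parametrize("use_reentrant", [True, False])
def test_activation_checkpointing(cpu_offload, use_reentrant):
    # Stacked parametrize decorators yield all 4 combinations of
    # (cpu_offload, use_reentrant), so one function covers both
    # of the original near-identical tests.
    ...
```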
We encountered a situation where, if the first operation in a checkpointed function is an in-place operation, an error is raised: the original checkpoint process calls run_function with the detached_input, which is viewed as a leaf node with requires_grad=True, and autograd itself will not allow an in-place operation on such a leaf. We found that torch itself has the option use_reentrant=False to address this problem, using torch.autograd.graph.saved_tensors_hooks to avoid re-running the computation with detached_input. So I added this feature inside our colossalai checkpoint, and added our activation offload process for the use_reentrant=False case, since the original torch checkpoint doesn't provide this. I also fixed the activation_checkpoint test which failed in the previous PR [test] recovered activation checkpointing test #1459
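The failure mode described above can be reproduced with torch's own checkpoint utility. This is a minimal sketch (not ColossalAI's code): the checkpointed function starts with an in-place op on its input, which is a non-leaf activation as it would be inside a real model.

```python
import torch
from torch.utils.checkpoint import checkpoint


def fn(x):
    x.relu_()       # in-place as the very first operation
    return x.sum()


leaf = torch.randn(8, requires_grad=True)
h = leaf * 2        # non-leaf intermediate activation

# Reentrant checkpointing re-runs fn during backward with a *detached*
# copy of h, i.e. a leaf tensor with requires_grad=True, and autograd
# rejects in-place mutation of such leaves, so backward() raises.
reentrant_failed = False
try:
    checkpoint(fn, h, use_reentrant=True).backward()
except RuntimeError:
    reentrant_failed = True

# use_reentrant=False uses torch.autograd.graph.saved_tensors_hooks
# instead of re-running fn on a detached input, so the same function works.
leaf2 = torch.randn(8, requires_grad=True)
h2 = leaf2 * 2
checkpoint(fn, h2, use_reentrant=False).backward()
```

After the non-reentrant call, `leaf2.grad` is populated normally, while the reentrant path fails on the in-place first op.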