
Fixing gelu_checkpointing memory issue#812

Merged
RezaYazdaniAminabadi merged 3 commits intomasterfrom
transformer/fix-gelu-checkpoint
Mar 3, 2021

Conversation

@RezaYazdaniAminabadi
Contributor

This PR solves the memory error that occurs when gelu_checkpoint is enabled. I also checked this modification against the rest of the memory-optimization flags inside the transformer layer, and they all pass the unit tests.

@owmohamm, can you please verify if this is working on your side?
Thanks
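For readers unfamiliar with the flag being fixed: gelu_checkpoint trades compute for memory by not caching the GeLU activation output for the backward pass and recomputing it instead. Below is a minimal, self-contained Python sketch of that idea (names like `GeluCheckpoint` are illustrative, not DeepSpeed's kernel code, which does this inside fused CUDA buffers):

```python
import math

def gelu(x):
    # GeLU activation, erf formulation: x * Phi(x)
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

class GeluCheckpoint:
    """Illustrative sketch of activation checkpointing for GeLU.

    Instead of storing the GeLU *output* for the backward pass, only
    the layer *input* is kept, and the derivative is recomputed during
    backward. This is the trade-off behind the gelu_checkpoint flag:
    less activation memory at the cost of one recomputation.
    """

    def forward(self, xs):
        self.saved_input = list(xs)      # keep input, drop output
        return [gelu(x) for x in xs]

    def backward(self, grad_out):
        # Recompute d/dx gelu(x) from the saved input:
        # gelu'(x) = Phi(x) + x * phi(x)
        def dgelu(x):
            cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
            pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
            return cdf + x * pdf
        return [g * dgelu(x) for g, x in zip(grad_out, self.saved_input)]
```

The memory bug this PR fixes lives in how the fused kernel manages those recomputation buffers, not in the math itself.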

@owmohamm

owmohamm commented Mar 3, 2021

Tested out the changes locally and they are working fine. With my configuration, which is analogous to BERT-Large, the memory savings are as follows:

| Flags Set | Memory Used |
| --- | --- |
| None | 30252 MiB |
| gelu_checkpoint | 27308 MiB |
| normalize_invertible | 27650 MiB |
| normalize_invertible + gelu_checkpoint | 23184 MiB |
| normalize_invertible + gelu_checkpoint + attn_dropout_checkpoint | 21686 MiB |
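For context, the relative savings implied by these numbers (a quick computation over the reported figures, not something stated in the PR itself):

```python
# Memory figures (MiB) reported above; the baseline has no flags set.
baseline = 30252
configs = {
    "gelu_checkpoint": 27308,
    "normalize_invertible": 27650,
    "normalize_invertible + gelu_checkpoint": 23184,
    "normalize_invertible + gelu_checkpoint + attn_dropout_checkpoint": 21686,
}

# Percentage saved relative to the no-flags baseline, one decimal place.
savings = {name: round(100 * (baseline - used) / baseline, 1)
           for name, used in configs.items()}
```

So gelu_checkpoint alone saves roughly 9.7%, and combining all three flags saves roughly 28.3% relative to running with no memory-optimization flags.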

@RezaYazdaniAminabadi RezaYazdaniAminabadi merged commit 8295d7a into master Mar 3, 2021
sdtblck added a commit to EleutherAI/DeeperSpeed that referenced this pull request Mar 4, 2021
* fixing buffers in transformer kernel when gelu-checkpoint is enabled

* fixing the test issue for other memory optimization flags

* fixing a bug for when attn_dropout_checkpoint is enabled

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
@mrwyattii mrwyattii deleted the transformer/fix-gelu-checkpoint branch July 7, 2023 02:41
