[autoparallel] Hook all meta information on ResNet nodes for auto activation checkpoint#2248

Merged
super-dainiu merged 57 commits into hpcaitech:debug/ckpt-autoparallel from Cypher30:feature/spmd_activation_combination
Jan 2, 2023

Conversation

Cypher30 commented Dec 31, 2022

What's New?

In this PR, I hook meta information onto all ResNet nodes so that the graph module is ready for the auto activation checkpoint solver. I introduce a new pass, comm_metainfo_pass, to process the meta information of communication nodes such as runtime_apply and runtime_comm_spec_apply. I also polish some code in the meta-information generator and fix a few small bugs along the way.
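To illustrate the general idea of a pass that attaches meta information to communication nodes, here is a minimal sketch in plain Python. It is not the code from this PR: `ToyNode`, `toy_comm_metainfo_pass`, the `"metainfo"` key, and the `estimate` callback are all hypothetical, and the two communication functions are empty placeholders that only borrow the names mentioned above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node (not torch.fx.Node)."""
    name: str
    op: str          # e.g. "call_function" or "call_module"
    target: Any      # a function object or a module path
    meta: Dict[str, Any] = field(default_factory=dict)


# Empty placeholders that reuse the names from the description above.
def runtime_apply(*args):
    ...


def runtime_comm_spec_apply(*args):
    ...


COMM_TARGETS = (runtime_apply, runtime_comm_spec_apply)


def toy_comm_metainfo_pass(
    nodes: List[ToyNode],
    estimate: Callable[[ToyNode], Dict[str, int]],
) -> List[ToyNode]:
    """Attach estimated meta information to communication nodes only."""
    for node in nodes:
        if node.op == "call_function" and node.target in COMM_TARGETS:
            # "metainfo" is an assumed key name for this sketch.
            node.meta["metainfo"] = estimate(node)
    return nodes


# Usage: only the communication node receives meta information.
nodes = [
    ToyNode("conv1", "call_module", "conv1"),
    ToyNode("comm0", "call_function", runtime_apply),
]
toy_comm_metainfo_pass(nodes, lambda n: {"fwd_mem_bytes": 0, "bwd_mem_bytes": 0})
```

The pass only tags nodes whose target is one of the communication functions and leaves every other node untouched, so a later solver could read the attached dictionary without caring how it was produced.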

NOTE:
In this PR, I have not converted some node handlers into MetaInfoNodeHandler or MetaInfoModuleHandler, because some of the operations haven't been patched yet. I will work on that next year (today is the last day of 2022, lol). I tested locally and verified that the meta information of ResNet is successfully hooked onto the nodes.

Last PR for 2022. Wishing you all the best and a lovely 2023.

Cypher30 and others added 30 commits July 14, 2022 16:07
@super-dainiu super-dainiu changed the base branch from main to debug/ckpt-autoparallel January 2, 2023 08:24
@super-dainiu super-dainiu merged commit ab38aeb into hpcaitech:debug/ckpt-autoparallel Jan 2, 2023