[autoparallel] modify construct chain in auto activation checkpoint solver rotor by Cypher30 · Pull Request #2254 · hpcaitech/ColossalAI

Cypher30 · 2023-01-02T02:34:26Z

What's New?

In this PR, I modify _extract_node_info in _construct_chain of CheckpointSolverRotor so that it correctly computes the memory peak of the forward phase.

The new forward peak is calculated with xbar. When we iterate over the list of torch.fx.Node, we first update xbar with the buffer memory (e.g. the running mean and running variance in batch normalization) and with the result of calculate_fwd_out(), which computes the true output, i.e. the output consumed by the users of the node. Then we update the forward memory peak with max(fwd_mem_peak, xbar + n.meta['fwd_mem_tmp'] + cls._extract_unused_output(n)). Here cls._extract_unused_output() is the former cls._extract_ftmp(), modified to examine a single torch.fx.Node rather than only the last one in the list of torch.fx.Node, so that it accounts for the discarded output of every torch.fx.Node inside the linearized 'node'.
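The update rule described above can be illustrated with a minimal, torch-free sketch. This is not the code from the PR: FakeNode is a hypothetical stand-in for torch.fx.Node, and the meta keys 'buffer_mem', 'fwd_out' and 'unused_out' are assumed placeholders for the values that the buffer accounting, calculate_fwd_out() and cls._extract_unused_output() would return. Only 'fwd_mem_tmp' and the max(...) expression come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeNode:
    """Hypothetical stand-in for torch.fx.Node; only `meta` is modeled."""
    name: str
    meta: Dict[str, int] = field(default_factory=dict)


def forward_peak(nodes: List[FakeNode]) -> Tuple[int, int]:
    """Return (xbar, fwd_mem_peak) for one linearized 'node' (a list of nodes)."""
    xbar = 0          # memory that stays alive after each node's forward pass
    fwd_mem_peak = 0  # running maximum of forward memory usage
    for n in nodes:
        # Assumed keys: buffer memory plus the output actually used downstream.
        xbar += n.meta.get("buffer_mem", 0) + n.meta.get("fwd_out", 0)
        # Peak candidate: saved memory + temporaries + output nobody consumes.
        fwd_mem_peak = max(
            fwd_mem_peak,
            xbar + n.meta.get("fwd_mem_tmp", 0) + n.meta.get("unused_out", 0),
        )
    return xbar, fwd_mem_peak


# Made-up sizes (arbitrary units), for illustration only.
example = [
    FakeNode("conv", {"fwd_out": 10, "fwd_mem_tmp": 4}),
    FakeNode("bn", {"buffer_mem": 2, "fwd_out": 10, "fwd_mem_tmp": 1, "unused_out": 3}),
    FakeNode("relu", {}),  # modeled as adding no new memory
]
print(forward_peak(example))
```

In this toy input the peak is reached at the middle node, where the discarded output is counted, which is the case the per-node accounting is meant to capture.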


Cypher30 and others added 30 commits July 14, 2022 16:07

Merge pull request #1 from hpcaitech/main

04e5272

Merge ColossalAI

Merge pull request #2 from hpcaitech/main

75618b3

Daily merge

Merge pull request #3 from hpcaitech/main

3e4620c

Merge

Merge remote-tracking branch 'upstream/main' into main

cf24049

Merge

Merge remote-tracking branch 'upstream/main' into main

3d223b6

Daily Merge

Merge branch 'hpcaitech:main' into main

644115c

Merge branch 'hpcaitech:main' into main

d995ade

Merge branch 'hpcaitech:main' into main

bba2dbe

Merge branch 'hpcaitech:main' into main

05ca628

Merge branch 'hpcaitech:main' into main

0a967da

Merge branch 'hpcaitech:main' into main

0637c0d

Merge branch 'hpcaitech:main' into main

74a6227

Merge branch 'hpcaitech:main' into main

e550490

Merge branch 'hpcaitech:main' into main

2d7f5d9

Merge branch 'hpcaitech:main' into main

b62e870

Merge branch 'hpcaitech:main' into main

b4b0974

Merge branch 'hpcaitech:main' into main

65c20de

Merge branch 'hpcaitech:main' into main

1660bfc

Merge branch 'hpcaitech:main' into main

6eb0ad0

Merge branch 'hpcaitech:main' into main

56df059

Merge branch 'hpcaitech:main' into main

480e932

Merge branch 'hpcaitech:main' into main

0fa66ee

Merge branch 'hpcaitech:main' into main

1d013b0

Merge branch 'hpcaitech:main' into main

5774db2

Merge branch 'hpcaitech:main' into main

e8ff699

Merge branch 'hpcaitech:main' into main

855c728

Merge branch 'main' of github.com:Cypher30/ColossalAI into main

2c113ea

Merge branch 'hpcaitech:main' into main

838ba70

Merge branch 'main' of github.com:Cypher30/ColossalAI into main

cacec2b

Merge branch 'hpcaitech:main' into main

5ed6ef0

Cypher30 and others added 23 commits September 14, 2022 15:57

Merge branch 'hpcaitech:main' into main

668af30

Merge branch 'hpcaitech:main' into main

df79772

Merge branch 'hpcaitech:main' into main

7b6a0fc

Merge branch 'hpcaitech:main' into main

c30022e

Merge branch 'hpcaitech:main' into main

df20f4d

Merge branch 'hpcaitech:main' into main

2d5a6a0

Merge branch 'hpcaitech:main' into main

07d27a6

Merge branch 'hpcaitech:main' into main

dc68ba9

Merge branch 'hpcaitech:main' into main

929e7d3

Merge branch 'hpcaitech:main' into main

90aa46a

Merge branch 'hpcaitech:main' into main

40363da

Merge branch 'hpcaitech:main' into main

fe3fca5

Merge branch 'hpcaitech:main' into main

956156e

Merge branch 'hpcaitech:main' into main

cb20212

Merge branch 'hpcaitech:main' into main

744a775

Merge branch 'hpcaitech:main' into main

1629a90

Merge branch 'hpcaitech:main' into main

f0558e3

Merge branch 'hpcaitech:main' into main

bb7bd4a

Merge branch 'hpcaitech:main' into main

26de8e5

Merge branch 'hpcaitech:main' into main

83a1418

Merge branch 'hpcaitech:main' into main

0802b94

[autoparallel] modify construct chain in rotor solver

0631819

Merge branch 'hpcaitech:main' into hotfix/fix_construct_chain

f02d400

Cypher30 requested a review from super-dainiu January 2, 2023 02:34

super-dainiu changed the base branch from main to debug/ckpt-autoparallel January 2, 2023 08:25

super-dainiu merged commit ac37399 into hpcaitech:debug/ckpt-autoparallel Jan 2, 2023

super-dainiu mentioned this pull request Jan 3, 2023

[autockpt] provide option for activation checkpoint search in SPMD solver #2258

Merged

6 tasks

super-dainiu merged 53 commits into hpcaitech:debug/ckpt-autoparallel from Cypher30:hotfix/fix_construct_chain
