
[fx] provide a stable but not accurate enough version of profiler. #1547

Merged
FrankLeeeee merged 28 commits into hpcaitech:main from super-dainiu:feature/flop_tensor
Sep 7, 2022
Conversation

@super-dainiu (Contributor) commented Sep 6, 2022

What's new?

With MetaTensor, we can compute the FLOPs of any autograd procedure.

import torch
import torchvision.models as tm

# _profile is the profiler helper introduced in this PR; index 1 of its
# result is the (fwd_flop, bwd_flop) estimate.
tm_models = [
    tm.vgg11,
    tm.resnet18,
    tm.densenet121,
    tm.mobilenet_v3_small,
    tm.resnext50_32x4d,
    tm.wide_resnet50_2,
    tm.regnet_x_16gf,
    tm.mnasnet0_5,
    tm.convnext_tiny,
    tm.efficientnet_b0,
    tm.vit_b_16,
]

for model in tm_models:
    input = torch.rand(4000, 3, 224, 224, device='meta')  # meta tensors carry shapes/dtypes only, no data
    layer = model()
    print(_profile(layer.forward, (input,), {})[1])  # prints (fwd_flop, bwd_flop)

# A single layer works too:
layer = torch.nn.Conv2d(3, 2, 5)
input = torch.rand(4000, 3, 224, 224, device='meta')
print(_profile(layer.forward, (input,), {})[1])
===========================================================================
(30490748928000, 60927113120000)
(7321522176000, 14578522016000)
(11718887424000, 23091915680000)
(260871680000, 7640462784000)
(17287200768000, 62879690656000)
(45957365760000, 91549855648000)
(64395440128000, 238909743008000)
(490859008000, 12065158560000)
(17974926342624, 137366425504000)
(1713884400000, 63079273104000)
(70435619904000, 140725435360000)
(29040000000, 58080000000)
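
Each printed tuple is (fwd_flop, bwd_flop) in raw floating-point operations; dividing by 1e9 gives GFLOPs. A quick conversion helper (mine, not part of the PR):

# Quick helper (not part of the PR): convert a raw FLOP count to GFLOPs.
def gflops(raw: int) -> float:
    return raw / 1e9

fwd, bwd = 7321522176000, 14578522016000  # resnet18 row above (batch size 4000)
print(f'{gflops(fwd):.1f} / {gflops(bwd):.1f} GFLOPs')  # 7321.5 / 14578.5 GFLOPs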

Combined with MetaInfoProp, every node will carry the following attributes, which will facilitate research on activation checkpointing (act_ckpt) with more specific information.

Node:
    flop_count (Tuple[int, ...]): The flop count for (fwd_flop, bwd_flop).
    mem_stat (Tuple[int, ...]): The memory statistics for (fwd_tmp, fwd_out, bwd_tmp, bwd_out).
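
A minimal sketch of how these per-node attributes could be consumed; only the attribute names come from this PR, while the pass invocation is a commented-out assumption:

import torch
import torchvision.models as tm
from torch.fx import symbolic_trace

gm = symbolic_trace(tm.resnet18())
# Assumed invocation; the actual entry point lives in colossalai.fx:
# MetaInfoProp(gm).run(torch.rand(4, 3, 224, 224, device='meta'))

for node in gm.graph.nodes:
    fwd_flop, bwd_flop = getattr(node, 'flop_count', (0, 0))
    fwd_tmp, fwd_out, bwd_tmp, bwd_out = getattr(node, 'mem_stat', (0, 0, 0, 0))
    # e.g. rank activation-checkpointing candidates by fwd_out vs. bwd_flop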

And the MetaInfoProp estimates are close to the measured values:

| model | estimated_fwd_mem | estimated_param_mem | real_fwd_mem | real_param_mem | fwd_flop | bwd_flop |
|---|---|---|---|---|---|---|
| densenet121 | 550.279 MB | 30.437 MB | 537.939 MB | 30.859 MB | 11.719 GFLOPs | 23.055 GFLOPs |
| densenet161 | 1071.932 MB | 109.409 MB | 1057.037 MB | 111.325 MB | 31.627 GFLOPs | 62.538 GFLOPs |
| densenet169 | 678.152 MB | 53.976 MB | 666.501 MB | 54.724 MB | 13.902 GFLOPs | 27.341 GFLOPs |
| densenet201 | 876.703 MB | 76.347 MB | 866.970 MB | 77.392 MB | 17.765 GFLOPs | 34.930 GFLOPs |
| convnext_tiny | 583.442 MB | 109.059 MB | 539.976 MB | 109.942 MB | 17.975 GFLOPs | 137.358 GFLOPs |
| convnext_small | 934.707 MB | 191.588 MB | 885.770 MB | 192.682 MB | 34.974 GFLOPs | 272.979 GFLOPs |
| convnext_base | 1328.002 MB | 337.950 MB | 1233.178 MB | 338.043 MB | 61.738 GFLOPs | 484.819 GFLOPs |
| convnext_large | 2238.342 MB | 754.423 MB | 2089.938 MB | 755.756 MB | 137.925 GFLOPs | 1089.767 GFLOPs |
| vit_b_16 | 676.548 MB | 330.229 MB | 869.416 MB | 330.229 MB | 70.436 GFLOPs | 90.311 GFLOPs |
| vit_b_32 | 426.163 MB | 336.549 MB | 460.042 MB | 337.311 MB | 17.678 GFLOPs | 23.616 GFLOPs |
| vit_h_14 | 4366.081 MB | 2411.063 MB | 5599.554 MB | 2475.842 MB | 670.212 GFLOPs | 864.708 GFLOPs |
| vit_l_16 | 2065.175 MB | 1160.914 MB | 2594.511 MB | 1162.164 MB | 246.692 GFLOPs | 318.905 GFLOPs |
| vit_l_32 | 1400.561 MB | 1169.340 MB | 1510.237 MB | 1169.434 MB | 61.619 GFLOPs | 81.867 GFLOPs |
| gpt2_medium | 64452.645 MB | 1353.543 MB | 55721.340 MB | 1377.555 MB | 3321.385 GFLOPs | 6634.189 GFLOPs |

TODO

I skipped the tests for the checkpoint solvers because they still need to integrate the new features.

Concerns

This profiler is still not accurate enough: as the table above shows, the estimated memory can deviate noticeably from the measured values.

Tests

All tests passed with PyTorch 1.11 (CI) and PyTorch 1.12 (screenshot of the passing test run omitted).

Comment on lines +184 to +187
@register_meta(aten.hardtanh_backward.default)
def meta_hardtanh_backward(grad_out: torch.Tensor, input: torch.Tensor, min_val: int, max_val: int):
    grad_in = torch.empty_like(input)
    return grad_in
@super-dainiu (author):
Only some extra registrations in this file.
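
For context, a small illustration (mine, not from the PR) of what such a meta registration enables: once aten.hardtanh_backward has a meta kernel, the op can run on meta tensors, yielding shape and dtype without any real compute.

import torch

# Runs entirely on the meta device once the meta kernel is registered
# (natively available in newer PyTorch; this PR backfills it where missing).
grad_out = torch.rand(4, 8, device='meta')
inp = torch.rand(4, 8, device='meta')
grad_in = torch.ops.aten.hardtanh_backward(grad_out, inp, -1.0, 1.0)
print(grad_in.shape, grad_in.device)  # torch.Size([4, 8]) meta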

Comment thread: colossalai/__init__.py
Comment on lines +2 to +6
try:
    from . import _meta_registrations
    META_COMPATIBILITY = True
except:
    import torch
    META_COMPATIBILITY = False
@super-dainiu (author):
META_COMPATIBILITY is checked when Colossal-AI initializes.
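
A sketch of the resulting guard pattern for downstream code (illustrative; only the flag name comes from this PR):

import colossalai

if colossalai.META_COMPATIBILITY:
    # aten meta registrations loaded: meta-tensor profiling is available
    pass
else:
    # older PyTorch without the required meta kernels: skip or fall back
    pass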

Comment on lines +106 to +107
for param in self.module.parameters():
    param.grad = None
@super-dainiu (author):
Obviously, we need to clear the gradients of the parameters, because these grads are meta tensors.
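
A tiny illustration of why (mine, assuming autograd runs end-to-end on the meta device, which is what this PR relies on):

import torch

# Gradients accumulated on meta parameters are themselves meta tensors,
# so a real optimizer step cannot consume them; they must be cleared.
p = torch.nn.Parameter(torch.rand(2, 2, device='meta'))
p.sum().backward()
print(p.grad.device)  # meta
p.grad = None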

@@ -0,0 +1,125 @@
from typing import Callable, Any, Dict, Tuple
@super-dainiu (author):
This is the old one, so I did not modify anything except for the output format.

@super-dainiu (author):

This is the old version, kept for PyTorch 1.11.

Comment on lines +8 to +30
if META_COMPATIBILITY:
    aten = torch.ops.aten

    WEIRD_OPS = [
        torch.where,
    ]

    INPLACE_ATEN = [
        aten.add_.Tensor,
        aten.add.Tensor,
        aten.sub_.Tensor,
        aten.div_.Tensor,
        aten.div_.Scalar,
        aten.mul_.Tensor,
        aten.mul.Tensor,
        aten.bernoulli_.float,

        # inplace reshaping
        aten.detach.default,
        aten.t.default,
        aten.transpose.int,
        aten.view.default,
        aten._unsafe_view.default,
@super-dainiu (author):
These are only created if META_COMPATIBILITY is True.
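
A hypothetical sketch (mine, not the PR's code) of how a profiler can exploit such a list: in-place and aliasing ops allocate no fresh output storage, so their output memory cost can be counted as zero.

import torch

# Hypothetical helper: charge output memory only for ops that actually
# allocate a new tensor; INPLACE_ATEN entries alias or mutate their input.
def output_mem_bytes(target, out: torch.Tensor) -> int:
    if target in INPLACE_ATEN:
        return 0
    return out.numel() * out.element_size()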

@FrankLeeeee FrankLeeeee merged commit 4f59693 into hpcaitech:main Sep 7, 2022
@super-dainiu super-dainiu deleted the feature/flop_tensor branch September 7, 2022 05:21