[fx] add more op patches for profiler and error message for unsupported ops. #1495

Merged
super-dainiu merged 52 commits into hpcaitech:main from super-dainiu:feature/profiler on Aug 25, 2022

@super-dainiu
Contributor

What's new?

The profiler can now profile any model from the torchvision library. I also added an error message and a patching guide for unsupported ops.

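>>> import torch
>>> import torchvision.models as tm
>>> from torch.fx import symbolic_trace
>>> from colossalai.fx.passes.meta_info_prop import MetaInfoProp
>>> model = tm.efficientnet_b0()
>>> data = torch.rand(1, 3, 224, 224, device='meta')
>>> gm = symbolic_trace(model)
>>> MetaInfoProp(gm).run(data)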

AssertionError: 
Colossal-AI hasn't supported profiling for <class 'torch.nn.modules.activation.SiLU'>, you might manually patch it with the following code.

from colossalai.fx.profiler import meta_profiler_module

@meta_profiler_module.register(YOUR_MODULE)
def profile_YOUR_MODULE(self: torch.nn.Module, input: torch.Tensor) -> Tuple[int, int]:
    flops = ...
    macs = ...
    return flops, macs
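
To make the patching guide above more concrete, here is a minimal, self-contained sketch of the register-and-look-up pattern it relies on. `ProfilerRegistry`, the placeholder `SiLU` class, and the "one FLOP per element" cost are all assumptions made for illustration; they are not the actual `meta_profiler_module` implementation, which may count operations differently.

```python
from typing import Callable, Dict, Tuple


class ProfilerRegistry:
    """Hypothetical stand-in for a registry such as meta_profiler_module."""

    def __init__(self) -> None:
        self._table: Dict[type, Callable[..., Tuple[int, int]]] = {}

    def register(self, module_cls: type):
        # Returns a decorator that stores the profiling function for module_cls.
        def decorator(fn):
            self._table[module_cls] = fn
            return fn
        return decorator

    def profile(self, module, *args) -> Tuple[int, int]:
        fn = self._table.get(type(module))
        if fn is None:
            # Mirrors the idea of failing loudly with a hint for the user.
            raise AssertionError(
                f"profiling is not supported for {type(module)}; "
                "register a profiling function for it first"
            )
        return fn(module, *args)


class SiLU:
    """Placeholder for torch.nn.SiLU so the sketch runs without PyTorch."""


registry = ProfilerRegistry()


@registry.register(SiLU)
def profile_silu(self, input_shape: Tuple[int, ...]) -> Tuple[int, int]:
    numel = 1
    for dim in input_shape:
        numel *= dim
    flops = numel  # assumption: one FLOP per element for an elementwise op
    macs = 0       # assumption: no multiply-accumulates in an activation
    return flops, macs
```

With this in place, `registry.profile(SiLU(), (1, 3, 224, 224))` dispatches to `profile_silu`, while an unregistered module type raises an `AssertionError` carrying the hint.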

Concerns

This profiler cannot detect temporary memory used within a single node. For example, nn.Conv2d() has extra memory reserved by the allocator, and the amount varies across device types (https://dl.acm.org/doi/10.1145/3368089.3417050). I also did not account for the extra memory cost of computing nn.MultiheadAttention. Some of these problems can be solved by refactoring, but we have to tolerate the remaining inaccuracies.
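
The gap described here can be illustrated with a toy calculation. The sketch below assumes an estimator that only sums per-node output sizes and compares it with a peak that also includes each node's temporary workspace. The `Node` type, the node names, and all byte counts are made up for illustration; they are not measurements, and every output is assumed to stay alive (as when activations are saved for the backward pass).

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    output_bytes: int     # memory that stays alive after the node finishes
    workspace_bytes: int  # temporary memory released when the node finishes


def output_only_estimate(nodes: List[Node]) -> int:
    """Estimate that ignores temporary memory inside each node."""
    return sum(n.output_bytes for n in nodes)


def peak_with_workspace(nodes: List[Node]) -> int:
    """Peak memory when each node's workspace is counted while it runs."""
    live = 0
    peak = 0
    for n in nodes:
        # While the node runs: earlier outputs + its own output + its workspace.
        peak = max(peak, live + n.output_bytes + n.workspace_bytes)
        live += n.output_bytes
    return peak


# Hypothetical numbers, chosen only to show the effect.
nodes = [
    Node("conv", 400, 300),
    Node("relu", 400, 0),
    Node("attn", 200, 500),
]

underestimate = peak_with_workspace(nodes) - output_only_estimate(nodes)
```

In this toy graph the output-only estimate is 1000 bytes while the peak including workspace is 1500 bytes, so the estimator misses the 500 bytes of temporary memory used by the last node.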

super-dainiu and others added 30 commits August 9, 2022 23:23
* [fx] activation checkpointing using Chen strategies.

* [fx] add test for ckpt_solver_chen

* [fx] add vanilla activation checkpoint search with test on resnet and densenet

* [fx] add a namespace code for solver_chen.

* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.

* [fx] fix lowercase naming conventions.

* [fx] simplify test for ckpt.
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages

* [fx] merge development into main (#1)

* [fx] fix test and algorithm bugs in activation checkpointing.

* [fx] polish ckpt_test.

* [fx] add rules to linearize computation graphs for searching.

@super-dainiu requested a review from Cypher30 on August 25, 2022 09:30
@Cypher30 requested a review from FrankLeeeee on August 25, 2022 11:58
Contributor

@Cypher30 left a comment

Great Work!

@super-dainiu merged commit 09c023b into hpcaitech:main on Aug 25, 2022