Conversation

@mbs-octoml (Contributor) commented:

See https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md.

This completes the check-in of our Collage 'sketch' branch into main. Special thanks
to Matthew Barrett for his help getting this over the line.

The only C++ functionality added here is for 'pruning' candidates. This is a somewhat
speculative algorithm (and I've called that out in the comments) which tries to
elide candidate partitions which will 'obviously' not contribute to the final optimal
partitioning. For largish models such as GPT2 this can significantly reduce the number of
candidates on which we need to actually measure latency. I beefed up the MockCostEstimator to
make it possible to assert pruning occurred from within the test_pass_collage_partition.py
unit test.
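As a rough illustration of the idea (a minimal, self-contained sketch; `Candidate`, `prune_candidates`, and the backend names are illustrative only, not the C++ API added in this PR), one simple way to elide 'obviously' non-optimal candidates is to drop any candidate whose covered nodes are a strict subset of a same-backend candidate's:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    backend: str
    nodes: frozenset  # dataflow nodes covered by this candidate partition


def prune_candidates(candidates):
    """Drop candidates strictly subsumed by a same-backend candidate.

    Speculative heuristic: if another candidate for the same backend covers a
    strict superset of this candidate's nodes, assume the smaller candidate
    cannot appear in the optimal partitioning and skip measuring it.
    """
    return [
        c
        for c in candidates
        if not any(
            o.backend == c.backend and o.nodes > c.nodes  # strict superset
            for o in candidates
        )
    ]


cands = [
    Candidate("tensorrt", frozenset({"conv", "relu"})),
    Candidate("tensorrt", frozenset({"conv"})),  # subsumed, so pruned
    Candidate("cublas", frozenset({"dense"})),
]
print(len(prune_candidates(cands)))  # → 2
```

The payoff is that every pruned candidate is one fewer latency measurement, which is where the savings on large models like GPT2 come from.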

The rest of this PR adds the demo_collage_partition.py driver file we've been using
to test and measure performance differences against various baselines (though only
for the CUDA ecosystem). To eliminate loading time, the models of interest are
expressed directly in Relay text form in menangerie.py.
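The kind of assertion the beefed-up MockCostEstimator enables can be sketched as follows (`RecordingCostEstimator` and `partition` are hypothetical stand-ins, not the classes in this PR): the estimator returns canned latencies and records every candidate it is asked to measure, so a unit test can assert that pruned candidates never reached measurement.

```python
class RecordingCostEstimator:
    """Hypothetical stand-in for MockCostEstimator: returns canned latencies
    and records which candidates were actually measured."""

    def __init__(self, costs):
        self.costs = costs   # candidate name -> fake latency (ms)
        self.measured = []   # candidates that reached measurement

    def estimate(self, name):
        self.measured.append(name)
        return self.costs[name]


def partition(candidates, estimator):
    """Toy driver: prune strictly subsumed candidates, measure survivors."""
    survivors = [
        (name, nodes)
        for name, nodes in candidates
        if not any(nodes < other for _, other in candidates)  # strict subset
    ]
    best = min(survivors, key=lambda c: estimator.estimate(c[0]))
    return best[0]


candidates = [("conv+relu", {"conv", "relu"}), ("conv", {"conv"})]
est = RecordingCostEstimator({"conv+relu": 1.0, "conv": 2.0})
best = partition(candidates, est)  # "conv" is pruned before measurement
```

Because the mock logs measurements instead of running anything on hardware, the test can check pruning behaviour deterministically and without a GPU.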

# CAUTION: Requires some changes in python/tvm/autotvm/task/dispatcher.py
# so that AutoTVM tuning records can be cached between runs and between
# models. See https://github.com/mbs-octoml/mbs-tvm/tree/mbs-collage-hacks.
A contributor commented:

Just noting for posterity: these hacks are needed because autotvm isn't properly caching results? Does that lead to much longer tuning times than necessary, or some other breakage?

@mbs-octoml (Contributor, Author) replied:

It's so that the autotvm tuning helpers in demo_collage_partition.py can use the existing tuning records as a cache which can be shared over all models. I.e., a poor man's TRS.
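The caching idea can be illustrated with a self-contained sketch (pure Python, illustrative only; this is not autotvm's actual dispatcher API, and `TuningRecordCache` is a hypothetical name): a single on-disk record store is consulted before tuning, so a workload shared between models, or between runs, is tuned at most once.

```python
import json
import os
import tempfile


class TuningRecordCache:
    """Illustration only: one record store shared across models and runs."""

    def __init__(self, path):
        self.path = path
        self.records = {}
        if os.path.exists(path):
            with open(path) as f:
                self.records = json.load(f)  # reuse prior runs' records

    def tune(self, workload_key, tune_fn):
        # Cache hit: reuse the record some earlier model/run produced.
        if workload_key not in self.records:
            self.records[workload_key] = tune_fn(workload_key)
            with open(self.path, "w") as f:
                json.dump(self.records, f)  # persist for later runs
        return self.records[workload_key]


# Two "models" sharing a conv2d workload: the second never re-tunes it.
path = os.path.join(tempfile.mkdtemp(), "records.json")
calls = []


def expensive_tune(key):
    calls.append(key)  # stands in for a long AutoTVM tuning session
    return {"latency_ms": 1.23}


cache = TuningRecordCache(path)
cache.tune("conv2d_nchw", expensive_tune)  # model A: tunes and records
cache = TuningRecordCache(path)            # fresh run / model B
cache.tune("conv2d_nchw", expensive_tune)  # cache hit: no re-tune
```

The hacked dispatcher referenced in the comment plays this role for real AutoTVM records; the sketch only shows why sharing one record file across models avoids redundant tuning.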

@jwfromm jwfromm merged commit d436501 into apache:main Jul 15, 2022
@mbs-octoml mbs-octoml deleted the mbs-collage-sketch branch July 15, 2022 18:18
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [Collage] PruneCandidates and demo_collage_partition.py

* - lint
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023
* [Collage] PruneCandidates and demo_collage_partition.py

* - lint