Proposal
Generating an Inter-Op plan with ColossalAuto usually takes 1-2 minutes when running examples/tutorial/auto_parallel/auto_parallel_with_resnet.py. Profiling with cProfile reveals that a large portion of this time is spent in copy.deepcopy, especially within the method DimSpec.build_difference_2d_dict(). Because many DimSpec objects are created [1], this method is called hundreds of thousands of times. A closer look at its logic shows that the result does not depend on the DimSpec instance, and the resulting dict is never mutated during its lifetime. It therefore suffices to build the dict once and share it among all DimSpec instances. Given the large number of DimSpec instances created during plan generation, this change can yield a speed-up of up to 50% [2].
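The idea can be illustrated with a minimal sketch. This is not the actual ColossalAI code: the classes PerInstanceSpec and SharedSpec, the helper build_table(), and the placeholder cost values are all hypothetical stand-ins for DimSpec and its difference dict. The sketch only shows the pattern of replacing a per-instance build with a single lazily built, class-level table.

```python
import copy

BUILD_CALLS = 0  # counts how often the table is rebuilt (for illustration only)


def build_table():
    """Hypothetical stand-in for the expensive dict construction.

    The costs are placeholders, not the real ColossalAI values.
    The result depends on no instance state.
    """
    global BUILD_CALLS
    BUILD_CALLS += 1
    states = ["R", "S0", "S1", "S01"]
    table = {}
    for src in states:
        row = {dst: (0 if src == dst else 1) for dst in states}
        # deepcopy mimics the hot spot reported by the profiler
        table[src] = copy.deepcopy(row)
    return table


class PerInstanceSpec:
    """Before: every instance rebuilds its own identical table."""

    def __init__(self, shard_list):
        self.shard_list = shard_list
        self.difference_dict = build_table()


class SharedSpec:
    """After: the table is built once and shared by all instances."""

    _difference_dict = None  # class-level cache, filled on first access

    def __init__(self, shard_list):
        self.shard_list = shard_list

    @property
    def difference_dict(self):
        # Sharing is only safe because the table is never mutated.
        if SharedSpec._difference_dict is None:
            SharedSpec._difference_dict = build_table()
        return SharedSpec._difference_dict


if __name__ == "__main__":
    BUILD_CALLS = 0
    per_instance = [PerInstanceSpec([i]) for i in range(1000)]
    builds_before = BUILD_CALLS

    BUILD_CALLS = 0
    shared = [SharedSpec([i]) for i in range(1000)]
    _ = [s.difference_dict for s in shared]
    builds_after = BUILD_CALLS

    print(f"per-instance builds: {builds_before}, shared builds: {builds_after}")
```

Because every instance hands out the same dict object, any code that mutated it would affect all instances; the approach relies on the observation above that the dict is read-only in practice.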
Footnotes
[1] Many of which are just empty placeholders.
[2] When running examples/tutorial/auto_parallel/auto_parallel_with_resnet.py.