Status: Closed
Labels: tune:meta_schedule (src/meta_schedule, python/tvm/meta_schedule), type:rfc-tracking (RFC progress tracking. Ref: https://github.com/apache/tvm-rfcs)
Description
This is a global tracking issue for landing the meta schedule. The RFC can be found here.
Steps
The steps are numbered following TensorIR (#7527).
[M3a] Core infrastructure
- Instruction & Trace [MetaSchedule][M3a] Instruction and Trace #8615
- TracedSchedule [MetaSchedule][M3a] Traced Schedule #8623
- Sampler [Support] Linear Congruential Random Engine #8642 [MetaSchedule][M3a] Add Sampling Primitive SampleCategorical. #8817
- Design space generator [MetaSchedule][M3a] SpaceGenerator #9079
- Search strategy [MetaSchedule][M3a] SearchStrategy #9132
- Task Scheduler [MetaSchedule][M3a] TaskScheduler #9154
- Tune Context [MetaSchedule][M3a] TuneContext #9053
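The sampler work above (#8642, #8817) is built on a linear congruential random engine plus sampling primitives such as SampleCategorical. As a rough illustration of the idea only — the constants below are the classic MINSTD parameters, chosen for this sketch and not necessarily the ones TVM uses — a minimal LCG with a categorical sampler on top might look like:

```python
# Minimal sketch of a linear congruential engine (LCG) and a categorical
# sampler built on it. Illustrative only; not TVM's implementation.

class LinearCongruentialEngine:
    MULTIPLIER = 48271       # MINSTD multiplier (assumption for this sketch)
    MODULUS = 2**31 - 1      # a Mersenne prime

    def __init__(self, seed: int = 1):
        # State must be in [1, MODULUS - 1] for a full-period MINSTD generator.
        self.state = seed % self.MODULUS or 1

    def next(self) -> int:
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state

    def random(self) -> float:
        # Uniform float in [0, 1).
        return self.next() / self.MODULUS


def sample_categorical(rng, candidates, probs):
    """Pick one candidate according to the given probabilities."""
    r = rng.random()
    acc = 0.0
    for cand, p in zip(candidates, probs):
        acc += p
        if r < acc:
            return cand
    return candidates[-1]  # guard against floating-point round-off


rng = LinearCongruentialEngine(seed=42)
choice = sample_categorical(rng, [1, 2, 4, 8], [0.1, 0.2, 0.3, 0.4])
```

The appeal of an LCG here is that it is cheap, deterministic, and trivially serializable — the whole engine state is one integer, which makes traces reproducible.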
[M3b] Enable measurement
- Argument Info [MetaSchedule][M3b] Argument Info #9059
- Builder; Builder input/result [MetaSchedule][M3b] Builder #9044
- Runner; Runner input/result [MetaSchedule][M3b] Runner #9111
- Tuning Record; Database [MetaSchedule][M3b] Database #9061
[M3c] Enhance search
- ScheduleRule, Mutator, PostProcessor [MetaSchedule][M4a] Add ScheduleRule class & PostOrderApply space generator #9761 [MetaSchedule][M3c] Update TuneContext, TaskScheduler & Search Strategy Design #9789
- Cost model [MetaSchedule][M3c] XGB-based Cost Model #9859 [MetaSchedule][M3c] Update TuneContext, TaskScheduler & Search Strategy Design #9789
- Feature extraction [MetaSchedule][M3c] Random Feature Extractor #9760 [MetaSchedule][M3c] Add Per-Store-Feature #9860
- Measure callback [MetaSchedule][M3c] Add More Measure Callbacks #9780
[M4a] Performance & Coverage
Schedule Rules
- Add-RFactor [MetaSchedule][M4a] Schedule Rule: Add-RFactor #9975
- Auto-Inline [MetaSchedule][M4a] Schedule Rule: Auto-Inline #9943
- Cross-Thread-Reduction [MetaSchedule][M4a] Schedule Rule: Cross-Thread-Reduction #9994
- Multi-Level-Tiling [MetaSchedule][M4a] Schedule Rule: Multi-Level-Tiling #10043
- Parallel-Vectorize-Unroll [MetaSchedule][M4a] Schedule Rule: Parallelize-Vectorize-Unroll #10033
- Random-Compute-Location [MetaSchedule][M4a] Schedule Rule: Random-Compute-Location #9940
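The Multi-Level-Tiling rule above depends on sampling a factorization of each loop extent into tile sizes whose product is exactly the extent. A toy sketch of that sampling step (a hypothetical helper written for this note, not TVM's actual `SamplePerfectTile`):

```python
import random

def sample_perfect_tile(extent: int, n_splits: int, rng: random.Random):
    """Randomly factor `extent` into `n_splits` integer tile sizes whose
    product is exactly `extent` (a "perfect" tiling)."""
    factors = [1] * n_splits
    remaining = extent
    for i in range(n_splits - 1):
        # Choose a divisor of what's left for this tile level.
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factors[i] = rng.choice(divisors)
        remaining //= factors[i]
    # The last level absorbs whatever is left, so the product is preserved.
    factors[-1] = remaining
    return factors

rng = random.Random(0)
tiles = sample_perfect_tile(1024, 4, rng)
```

Each such sample is one point in the design space; the search strategy then explores many of them and lets measurement decide which tiling wins.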
PostProcessors
- Disallow-Dynamic-Loop [MetaSchedule][M4a] PostProcessor: Disallow-Dynamic-Loop #9997
- Rewrite-Cooperative-Fetch [MetaSchedule][M4a] Rewrite-Cooperative-Fetch #10081
- Rewrite-Parallel-Vectorize-Unroll [MetaSchedule][M4a] PostProcessor: Rewrite-Parallel-Vectorize-Unroll #10071
- Rewrite-Reduction-Block [MetaSchedule][M4a] PostProcessor: Rewrite Reduction Block #10013
- Rewrite-Unbound-Block [MetaSchedule][M4a] PostProcessor: Rewrite-Unbound-Block #10027
- Verify-GPU-Code [MetaSchedule][M4a] PostProcessor: Verify-GPU-Code #9945
Mutators
- Mutate-Compute-Location [MetaSchedule] Mutator: Mutate-Compute-Location #10028
- Mutate-Parallel [MetaSchedule][M4a] Mutator: Mutate Parallel #10096
- Mutate-Tile-Size [MetaSchedule][M4a] Mutator: Mutate-Tile-Size #10092
- Mutate-Unroll [MetaSchedule] Mutator: Mutate-Unroll #10045
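The Mutate-Tile-Size mutator above perturbs an existing tiling decision while keeping the product of tile sizes fixed. One common way to do that — sketched here as an illustration, not necessarily TVM's exact rule — is to move a divisor from one tile level to another:

```python
import random

def mutate_tile_size(tiles, rng: random.Random):
    """Move a non-trivial factor between two tile levels,
    preserving the overall product."""
    tiles = list(tiles)
    # Pick a source level whose size can donate a factor.
    sources = [i for i, t in enumerate(tiles) if t > 1]
    if not sources or len(tiles) < 2:
        return tiles  # nothing to mutate
    src = rng.choice(sources)
    dst = rng.choice([i for i in range(len(tiles)) if i != src])
    # Pick a divisor >= 2 of the source tile to transfer.
    divisors = [d for d in range(2, tiles[src] + 1) if tiles[src] % d == 0]
    d = rng.choice(divisors)
    tiles[src] //= d
    tiles[dst] *= d
    return tiles

rng = random.Random(1)
mutated = mutate_tile_size([8, 4, 2, 16], rng)
```

Because the product is invariant, the mutated schedule stays valid (the loop extent is unchanged) while the evolutionary search gets a nearby candidate to evaluate.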
User interface
- Tune-TE [MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079
- Tune-TIR [MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079
- Tune-Relay [MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079
Misc
- Local Runner [MetaSchedule][M4a] Local runner #9153
- Design-Space-Generator: Post-Order-Apply [MetaSchedule][M4a] Add ScheduleRule class & PostOrderApply space generator #9761
- SearchStrategy: Replay-Func (random search) [MetaSchedule][M4a] Add ReplayFunc Search Strategy #9799
- SearchStrategy: Evolutionary-Search [MetaSchedule][M4a] Add EvolutionarySearch Search Strategy #9836
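At its core, the evolutionary search strategy above repeatedly mutates a population of candidates and keeps the ones a cost function scores best. A stripped-down sketch of that loop on a toy problem — far simpler than TVM's implementation, which layers in a learned cost model, a database of measured records, and epsilon-greedy exploration:

```python
import random

def evolutionary_search(init_population, mutate, cost, n_iters, k, rng):
    """Keep the k lowest-cost candidates each generation and refill the
    population by mutating the survivors; return the best candidate found."""
    population = list(init_population)
    for _ in range(n_iters):
        population.sort(key=cost)
        survivors = population[:k]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(len(population) - k)]
        population = survivors + children
    return min(population, key=cost)

# Toy problem: evolve an integer toward 37 from random starting guesses.
rng = random.Random(0)
best = evolutionary_search(
    init_population=[rng.randrange(100) for _ in range(16)],
    mutate=lambda x, r: max(0, x + r.choice([-3, -1, 1, 3])),
    cost=lambda x: abs(x - 37),
    n_iters=50,
    k=4,
    rng=rng,
)
```

In the real system, `cost` is the learned cost model's prediction and `mutate` is a trace mutator like Mutate-Tile-Size; only the top predicted candidates are sent to the builder/runner for actual measurement.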
[M4b] Relay integration
- Task extraction [MetaSchedule][M4b] Task Extraction #9382
- Apply-History-Best [MetaSchedule][M4b] Add ApplyHistoryBest Meta Schedule Context #10049
- Builder/Runner working with Relay and Relay BYOC [MetaSchedule][M4b] Misc improvement of the Measurer #9757 [MetaSchedule][M4b] Testcases for TensorRT builder/runner #10055
M5. Operator coverage with all backends for auto tensorization
Goal: enable tensorization on all backends.
- TIR primitive: Re-Index [TIR] Add schedule primitive ReIndex #11515
- TIR primitive: Transform-Block-Layout [TIR] Add schedule primitive TransformBlockLayout #11485
- MetaSchedule auto tensorization helper: TileWithTensorIntrin [TIR] Utility function to decide loop mapping for auto tensorization #11050 [TIR] Add function to tile a block according to a given tensor intrinsic #11075
- MetaSchedule: enhance Multi-Level Tiling [MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA #12059 [MetaSchedule] Allow MultiLevelTilingTensorCore rule to specify multiple tensor intrin groups #12113
- MetaSchedule: Rewrite-Tensorize [Metaschedule] Auto tensorization for CPU / GPU dot product #11088
- Analysis: MappingProposer and AutoTensorizeComparator [TIR, analysis] Add GetAutoTensorizeMappingInfo to generate transforms for auto tensorization #11740
- Intel VNNI / ARM dot variants [Metaschedule] Auto tensorization for CPU / GPU dot product #11088
M6. Memory optimization
These optimizations matter for CUDA performance (not CPU) and do not affect functionality.
- TIR primitive: Read/Write-at
- Support ewise fusion in MemHammer
- Cover non-fp16, non-wmma usecases
- Shared memory auto padding [MetaSchedule] Support padding for irregular shapes for CUDA tensor core #12759
- Global memory coalescing
- Shared ⇒ WMMA, WMMA ⇒ shared/global rewriting
- Insert caching stage [TIR] Add pass ManifestSharedMemoryLocalStage #12355
M7. Unblock end-to-end experiments
- Handle reshape fusion
- Develop scripts to run experiment
- Benchmark on the selected operator set (C1D, C2D, C3D, CAP, DIL, GMM, GRP, T2D)
- Performance alignment attempt
M8. Broader Set of Intrinsics and Optimization
- async pipeline [PTX] Intrinsics for async copy from global to shared (SM80) #11368
- Permuted layout
- LDMatrix / MMA [TIR] Support tensorization using ldmatrix + MMA #11355