PR #2105 implements operand analysis to determine vectorization of gmem loads. Quote:
> Are we supporting a general pointwise epilogue fusion, or do we only support biases? For example, if I have a "bias" whose shape is [N, M], I transpose it to [M, N] and add it to the matmul result, can this be handled? It is very important that what we accept into the scheduler is compatible with what we assume here. Building out a general vectorization analysis should refer to the pointwise scheduler and use the code in https://github.com/NVIDIA/Fuser/blob/main/csrc/scheduler/vectorize_helper.h, which is what #807 is doing.
Regarding this, I believe we have two options:
- Make sure that we accept only very limited cases of epilogue fusion (i.e., just bias and activation) into the scheduler, and use this simple analysis.
- Use `vectorize_helper` to build out a complete analysis for the pointwise epilogue, like the pointwise scheduler does.
Whichever option we take, I don't think it will be easy to implement and test well. For option 1, we need to review the scheduler's `canScheduleCompileTime` code and brainstorm more adversarial examples; for option 2, we need to copy some code from the pointwise scheduler, as in #807 (Matmul, enable epilogue input vectorization). @drzejan2 do you remember the status of #807?
But anyway, epilogue vectorization is a much more difficult task than vectorizing the A and B operands. Can we move it to a separate PR?
Originally posted by @zasdfgbnm in #2105 (comment)
Option 1 is tracked in #2167. This issue corresponds to option 2 listed above. We might additionally need to update `MatmulParams::SupportedVectorization` if we support different vectorization widths for the different input and output tensors.