Analyze matmul epilogue to determine vectorization #2169

@jacobhinkle

PR #2105 implements operand analysis to determine vectorization of gmem loads. Quote:

Are we supporting a general pointwise epilogue fusion, or do we only support biases? For example, if I have a "bias" whose shape is [N, M], and I transpose it to [M, N] and add it to the matmul result, can this be handled? It is very important that what we accept into the scheduler is compatible with what we assume here. Building out a general vectorization analysis should refer to the pointwise scheduler and use the code in https://github.com/NVIDIA/Fuser/blob/main/csrc/scheduler/vectorize_helper.h, and this is what #807 is doing.

Regarding this, I believe we have two options:

  1. Make sure that we accept only the very limited cases of epilogue fusion (i.e. just with bias and activation) into the schedule and use this simple analysis.
  2. Use vectorize_helper to build out a complete analysis for pointwise epilogue like the pointwise scheduler.

Whichever option we take, I don't think it will be easy, and neither path is well tested. For option 1, we need to review the scheduler's canScheduleCompileTime code and brainstorm more adversarial examples. For option 2, we need to copy some code from the pointwise scheduler, as in #807 (Matmul, enable epilogue input vectorization). @drzejan2, do you remember the status of #807?

But anyway, epilogue vectorization is a much more difficult task than vectorizing the A and B operands. Can we move it to a separate PR?

Originally posted by @zasdfgbnm in #2105 (comment)

Option 1 is tracked in #2167. This issue corresponds to option 2 listed above. We might additionally need to update MatmulParams::SupportedVectorization if we support different vectorization widths for the different input and output tensors.
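To make the concern concrete, here is a rough, self-contained sketch of why a single shared width may not be enough once epilogue inputs are involved. It is not nvFuser code and does not reproduce the logic in vectorize_helper.h. `TensorDesc`, `max_vectorization`, the 16-byte access limit, and the example shapes and addresses are all assumptions made for illustration. Each tensor is described by its sizes and strides as seen in the output's [M, N] iteration order, so the transposed [N, M] bias from the quoted comment shows up with a non-unit innermost stride.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorDesc:
    """Hypothetical description of a global-memory tensor (not an nvFuser type)."""
    name: str
    sizes: tuple      # extents, in the output's [M, N] iteration order
    strides: tuple    # strides in elements, same order as sizes
    ptr_addr: int     # base address in bytes
    dtype_bytes: int  # size of one element in bytes


def max_vectorization(t: TensorDesc, max_bytes: int = 16) -> int:
    """Largest power-of-two element count usable for one vectorized access
    along the innermost output dimension (simplified, assumed rules)."""
    # The innermost dimension must be contiguous in memory.
    if t.strides[-1] != 1:
        return 1
    width = max_bytes // t.dtype_bytes
    while width > 1:
        ok = (
            t.sizes[-1] % width == 0                          # extent divides evenly
            and all(s % width == 0 for s in t.strides[:-1])   # every row start stays aligned
            and t.ptr_addr % (width * t.dtype_bytes) == 0     # base pointer is aligned
        )
        if ok:
            return width
        width //= 2
    return 1


# Assumed problem size: M = 128, N = 256, half precision (2 bytes per element).
M, N = 128, 256
tensors = [
    TensorDesc("out", (M, N), (N, 1), 0x1000, 2),
    # 1-D bias of shape [N], broadcast over M (stride 0 on the M axis).
    TensorDesc("bias_1d", (M, N), (0, 1), 0x2000, 2),
    # Bias stored as a contiguous [N, M] tensor, then transposed to [M, N].
    TensorDesc("bias_transposed", (M, N), (1, M), 0x3000, 2),
]

widths = {t.name: max_vectorization(t) for t in tensors}
shared_width = min(widths.values())  # what a single scheduler-wide width would be
print(widths, shared_width)
```

Under these assumptions the output and the 1-D bias can each use a wider access than the transposed bias, so a single width shared by every tensor would be limited by the transposed bias. Tracking a width per tensor, for example in MatmulParams::SupportedVectorization, would avoid that.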

See #807 and #682
