Currently, calling aiter.gemm_a4w4(...) with an untuned shape such as M=32800, N=2112, K=7168 raises:
RuntimeError: get_heuristic_kernel: cannot get heuristic kernel!
Instead of failing, can we handle this case differently, to avoid exiting the program and to improve usability for rare shapes?
For example, Option 1: Use a fallback kernel and print a warning like:
Using non-optimal kernel. Performance may be slow. Please tune via or see docs.
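To make the fallback idea concrete, here is a minimal sketch of what I have in mind. It is not aiter's actual dispatch code: `TUNED_KERNELS`, `FALLBACK_KERNEL`, `select_kernel`, and the kernel names are all hypothetical stand-ins, and the real lookup would live wherever get_heuristic_kernel is implemented.

```python
import warnings

# Hypothetical stand-in for the tuned-config table; not aiter's real data.
TUNED_KERNELS = {(128, 2112, 7168): "tuned_kernel_128x2112x7168"}
# Hypothetical name for a generic kernel that works for any shape.
FALLBACK_KERNEL = "generic_fallback_kernel"


def select_kernel(m, n, k):
    """Return the tuned kernel for (m, n, k), or warn and fall back."""
    kernel = TUNED_KERNELS.get((m, n, k))
    if kernel is None:
        # Warn instead of raising, so the program keeps running.
        warnings.warn(
            f"No tuned kernel for M={m}, N={n}, K={k}; using non-optimal "
            "kernel. Performance may be slow.",
            RuntimeWarning,
            stacklevel=2,
        )
        return FALLBACK_KERNEL
    return kernel
```

Using `warnings.warn` rather than `print` would let users silence or escalate the message with the standard warning filters.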
Option 2: Provide a way to trigger auto-tuning via an environment flag in such cases, cache the resulting config, and reuse it later.
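A rough sketch of the auto-tune-and-cache idea, assuming a JSON file as the cache. The flag name `EXAMPLE_GEMM_AUTOTUNE`, the `tune_shape` placeholder, and the `get_config` helper are all hypothetical and only illustrate the control flow; none of them are existing aiter names.

```python
import json
import os
from pathlib import Path

# Hypothetical environment flag; not an existing aiter setting.
AUTOTUNE_ENV = "EXAMPLE_GEMM_AUTOTUNE"


def tune_shape(m, n, k):
    """Placeholder for a real tuning run; returns a dummy config."""
    return {"kernel": "autotuned_placeholder", "shape": [m, n, k]}


def get_config(m, n, k, cache_path):
    """Return the cached config for a shape, tuning on a miss if the flag is set."""
    cache_path = Path(cache_path)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    key = f"{m}x{n}x{k}"
    if key in cache:
        return cache[key]
    if os.environ.get(AUTOTUNE_ENV) != "1":
        return None  # Flag not set: caller decides whether to fall back or raise.
    # Flag set: tune once, persist the result, and reuse it on later calls.
    cache[key] = tune_shape(m, n, k)
    cache_path.write_text(json.dumps(cache, indent=2))
    return cache[key]
```

With this shape of API, the first call for a rare shape pays the tuning cost only when the flag is set, and later runs hit the cache without needing the flag.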
PS: Getting the tuned config requires running a manual tuning step (#1687).