Hello,
I am trying to run some GEMM sweeps using the experiments/gemm.py script (via the Docker container provided by the repo). When the GEMM sizes are large (e.g. M=N=K=16384), PyTorchSim crashes, and I am not sure how to proceed. My understanding is that the configuration file (systolic_ws_128x128_c2_simple_noc_tpuv4.json) specifies 32 channels x 8 Gb (32 GB total), so the operands should fit comfortably in memory. Can you please suggest next steps? The full log from the run is attached.
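For reference, here is the rough arithmetic behind the "should fit" estimate, as a minimal sketch. It assumes 4-byte (fp32) elements, no padding or tiling overhead, and that 1 Gb means 2**30 bits; the function names are illustrative only and are not part of PyTorchSim.

```python
# Back-of-the-envelope check; names here are illustrative, not PyTorchSim APIs.
GIB = 2**30

def gemm_footprint_bytes(m, n, k, bytes_per_elem):
    """Bytes for A (m x k), B (k x n), and C (m x n) resident at the same time."""
    return (m * k + k * n + m * n) * bytes_per_elem

def dram_capacity_bytes(channels, gbit_per_channel):
    """Total capacity in bytes, treating 1 Gb as 2**30 bits (assumption)."""
    return channels * gbit_per_channel * GIB // 8

footprint = gemm_footprint_bytes(16384, 16384, 16384, bytes_per_elem=4)  # fp32 assumed
capacity = dram_capacity_bytes(channels=32, gbit_per_channel=8)
print(f"footprint: {footprint / GIB:.1f} GiB, capacity: {capacity / GIB:.1f} GiB")
```

Under these assumptions the three matrices take about 3 GiB against 32 GiB of capacity, so raw capacity alone does not look like the limiting factor.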
Thank you,
Neal Crago
gemm16384.log