Currently, the upstream version of the auto-scheduler cannot reproduce the OSDI paper's performance on ARM CPU. One issue we found is that we call the wrong compute for ARM CPU (https://github.com/apache/tvm/blob/main/python/tvm/relay/op/strategy/arm_cpu.py#L156). We also do not enable Winograd for ARM CPU. Opening this issue to track both problems.
@jcf94