[TOPI] Revert unification of conv2d NHWC hybrid scheduling for `arm_cpu` targets #16951

Anndrey24 · 2024-04-29T09:44:16Z

This patch partly reverts the unification of scalable and non-scalable scheduling of conv2d NHWC for `arm_cpu` targets introduced in #16899.

The non-scalable schedule for float32 splits the N axis (corresponding to the number of output channels) by 16 in both the unified and the non-unified schedule versions. Only the non-unified version, to which this patch reverts (first added in #16106), additionally splits the inner partitions by 4. The two versions would behave identically if none of the padding on the N axis were removed during lowering; however, we allow that padding to be removed because doing so proved to increase performance for very small convolutions.
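As a rough illustration of the two tilings described above, the sketch below models the N axis as a flat range of output channels and records the extent of each innermost (vectorised) loop. It is a simplified model, not TVM's actual schedule code: the function `innermost_extents` is hypothetical, and "padding removed" is modelled as the last 16-wide block keeping only the channels that really exist.

```python
def innermost_extents(n, outer=16, inner=None, remove_padding=True):
    """Return the extent of every innermost loop over the N axis.

    Hypothetical model: N is split by `outer` (16 in both schedules);
    the non-unified schedule additionally splits each block by `inner` (4).
    With remove_padding=False every block is padded up to `outer`.
    """
    width = inner if inner is not None else outer
    extents = []
    for start in range(0, n, outer):
        # Channels that land in this block of the outer split.
        block = min(outer, n - start) if remove_padding else outer
        for offset in range(0, block, width):
            extents.append(min(width, block - offset))
    return extents


# 20 output channels: one full block of 16 plus a tail of 4.
print(innermost_extents(20))                        # unified:     [16, 4]
print(innermost_extents(20, inner=4))               # non-unified: [4, 4, 4, 4, 4]
print(innermost_extents(20, remove_padding=False))  # padded:      [16, 16]
```

With padding kept, both versions cover the same padded range; once padding is removed, only the non-unified tiling still consists entirely of full-width tiles for N = 20.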

As it stands, there appears to be a regression when the datatype is float32 and the number of output channels is greater than 16, a multiple of 4, and not a multiple of 16: even with the padding removed, the non-unified schedule can still vectorise over 4 elements, whereas the unified version can no longer vectorise over 16 elements.
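The regression condition can be restated as arithmetic on the tail of the N axis, i.e. the N % 16 channels left over after the split by 16. The helper names below are hypothetical, and the sketch assumes, as the description states, that only full-width vector tiles are vectorised once padding is removed. It only covers the N > 16 case discussed here.

```python
def tail_channels(n, block=16):
    """Channels left in the last, partially filled block of the split by 16."""
    return n % block


def vectorises_tail(n, vector_width, block=16):
    """True if the tail is covered entirely by full-width vector tiles.

    A tail of 0 (N divisible by 16) counts as fully vectorised.
    """
    return tail_channels(n, block) % vector_width == 0


def in_regression_region(n, dtype="float32"):
    """Cases described in the PR: float32, N > 16, N % 4 == 0, N % 16 != 0."""
    return dtype == "float32" and n > 16 and n % 4 == 0 and n % 16 != 0


for n in (20, 32, 36, 50):
    print(n, in_regression_region(n), vectorises_tail(n, 4), vectorises_tail(n, 16))
# 20 True True False
# 32 False True True
# 36 True True False
# 50 False False False
```

Every N in the regression region leaves a tail that is a non-zero multiple of 4, so a 4-wide inner split still fills its vectors while a single 16-wide tile cannot.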

Since all of the existing conv2d NHWC hybrid TOPI test cases used output channel counts that were either less than 16 or divisible by 16, this patch also adds a new case that falls within the regression region described above.
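One way to see the coverage gap is to bucket the output channel counts used by the test cases relative to the split factor of 16. The channel counts below are placeholders chosen for illustration, not the actual parameters in the TVM test suite, and `bucket` is a hypothetical helper.

```python
from collections import Counter


def bucket(n):
    """Classify an output channel count relative to the split factor of 16."""
    if n < 16:
        return "less than 16"
    if n % 16 == 0:
        return "multiple of 16"
    if n % 4 == 0:
        return "regression region"
    return "other"


# Placeholder channel counts, not taken from the real test file.
before = [4, 8, 16, 32, 64]
after = before + [20]  # hypothetical extra case inside the regression region

print(Counter(map(bucket, before)))
print(Counter(map(bucket, after)))
```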

cc @lhutton1 @ekalda

ekalda

Thanks for the fix @Anndrey24!

lhutton1 · 2024-04-29T18:34:16Z

Thanks @Anndrey24 @ekalda!

github-actions bot requested review from ekalda and lhutton1 April 29, 2024 09:44

ekalda approved these changes Apr 29, 2024

lhutton1 approved these changes Apr 29, 2024

lhutton1 merged commit 114ad70 into apache:main Apr 29, 2024

Anndrey24 deleted the conv2d-regression branch April 30, 2024 12:38

Anndrey24 mentioned this pull request Jun 14, 2024

[TOPI] Add dense schedule for fp16 and fp32 using gemm #17091

Merged

ysh329 mentioned this pull request Jul 20, 2024

[Release] v0.17.0 Release Candidate Notes #17178

Closed

kurisu6912 mentioned this pull request Sep 5, 2025

kurisu add assume attr patch 1 tile-ai/tvm#8

Closed
