Contributor

@sunwayforever sunwayforever commented Oct 21, 2021

test code:

import tvm
from tvm import relay

x = relay.var("x", shape=(1, 1024), dtype="int8")
y = relay.add(x, x)
func = relay.Function([x], y)
mod = tvm.IRModule.from_expr(func)

with tvm.transform.PassContext(opt_level=3):
    graph, lib, params = relay.build(
        mod,
        target="llvm --device=arm_cpu -mtriple=armv7a-linux-gnueabihf -mattr=+neon",
        params=None,
    )

lib.save("/tmp/a.o")

arm-linux-gnueabihf-objdump -d /tmp/a.o will produce output like:

000002a8 <tvmgen_default_fused_add_compute_>:
2a8: e3a02000 mov r2, #0
2ac: e0813002 add r3, r1, r2
2b0: f4e3083f vld1.32 {d16[0]}, [r3 :32]
2b4: e0803002 add r3, r0, r2
2b8: e2822004 add r2, r2, #4
2bc: e3520b01 cmp r2, #1024 ; 0x400
2c0: f3c80a30 vmovl.u8 q8, d16
2c4: f2d10530 vshl.s16 d16, d16, #1
2c8: f3f2012 vuzp.8 d16, d17
2cc: f4c3083f vst1.32 {d16[0]}, [r3 :32]
2d0: 1afffff5 bne 2ac <tvmgen_default_fused_add_compute_+0x4>
2d4: e12fff1e bx lr

With the PR applied, the output changes to:

000002a8 <tvmgen_default_fused_add_compute_>:
2a8: e3a02000 mov r2, #0
2ac: e0813002 add r3, r1, r2
2b0: f4630aef vld1.64 {d16-d17}, [r3 :128]
2b4: e0803002 add r3, r0, r2
2b8: e2822010 add r2, r2, #16
2bc: f2c90570 vshl.s8 q8, q8, #1
2c0: e3520b01 cmp r2, #1024 ; 0x400
2c4: f4430aef vst1.64 {d16-d17}, [r3 :128]
2c8: 1afffff7 bne 2ac <tvmgen_default_fused_add_compute_+0x4>
2cc: e12fff1e bx lr

This is because NEON supports up to 16 8-bit operations per instruction (one 128-bit register).
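
The lane count behind that statement can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper names `itemsize_bytes` and `neon_split_factor` are hypothetical, and the sketch assumes the schedule splits the innermost axis so that one chunk fills a single 128-bit NEON register.

```python
import re

NEON_REGISTER_BYTES = 16  # one 128-bit NEON Q register


def itemsize_bytes(dtype: str) -> int:
    """Infer the element size in bytes from a dtype string such as "int8".

    Hypothetical helper for illustration; it only handles names of the form
    <letters><bit width>, e.g. "uint8", "int16", "float32".
    """
    match = re.fullmatch(r"[a-z]+(\d+)", dtype)
    if match is None:
        raise ValueError(f"cannot infer bit width from dtype {dtype!r}")
    bits = int(match.group(1))
    return max(1, bits // 8)


def neon_split_factor(dtype: str) -> int:
    """Number of elements of `dtype` that fit in one 128-bit register."""
    return NEON_REGISTER_BYTES // itemsize_bytes(dtype)


for name in ("int8", "int16", "float32"):
    print(name, neon_split_factor(name))
```

For the int8 example above this gives 16 elements per iteration, which matches the loop counter stepping by 16 instead of 4 (64 iterations over 1024 elements instead of 256).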

u99127 commented Oct 21, 2021

Can a testcase be added for this?

@sunwayforever
Contributor Author

> Can a testcase be added for this?

I'd like to, but I don't know how to write the testcase, since the PR targets a performance (sub-optimal codegen) issue, and I didn't find any existing testcases related to schedule_injective and arm_cpu for reference.
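
One possible shape for such a check, offered only as a sketch: inspect the disassembly text and assert that loads and stores use a full D-register pair. The function name `uses_full_width_neon` and the regular expression are assumptions for illustration. The two snippets are copied from the objdump output earlier in this thread, and how a real TVM test would obtain the assembly is left open.

```python
import re

# Matches a NEON vld1/vst1 whose register list is a D-register pair,
# e.g. "vld1.64 {d16-d17}, [r3 :128]", which covers a full 128-bit Q register.
FULL_WIDTH_ACCESS = re.compile(r"\bv(?:ld|st)1\.\d+\s+\{d\d+-d\d+\}")


def uses_full_width_neon(disassembly: str) -> bool:
    """Return True if any vld1/vst1 in the text moves a whole Q register."""
    return FULL_WIDTH_ACCESS.search(disassembly) is not None


# Excerpts from the objdump output shown in this thread.
BEFORE_PR = """
2b0: f4e3083f vld1.32 {d16[0]}, [r3 :32]
2cc: f4c3083f vst1.32 {d16[0]}, [r3 :32]
"""

AFTER_PR = """
2b0: f4630aef vld1.64 {d16-d17}, [r3 :128]
2c4: f4430aef vst1.64 {d16-d17}, [r3 :128]
"""

print("before:", uses_full_width_neon(BEFORE_PR))
print("after:", uses_full_width_neon(AFTER_PR))
```

A text match like this is brittle, since register allocation and instruction selection can change between LLVM versions, so it would suit a smoke test better than a strict assertion.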

@AndrewZhaoLuo
Contributor

Can you fill out the PR description with what you did and why?

Also, a link or comment in the code explaining what is done would be helpful for readers.

@AndrewZhaoLuo AndrewZhaoLuo left a comment

LGTM, thanks for the documentation.

You will need to rebase on main in order to fix a flaky test.

@masahi masahi merged commit 9315113 into apache:main Oct 27, 2021
@sunwayforever sunwayforever deleted the neon branch October 27, 2021 08:22
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…9339)

* schedule_injective of arm_cpu should consider dtype itemsize

* trigger CI
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…9339)

* schedule_injective of arm_cpu should consider dtype itemsize

* trigger CI