New kernel set for Arm SVE using assembly #396

Merged
fgvanzee merged 1 commit into flame:master from docularxu:sve-upstream-staging on Apr 29, 2020

@docularxu

This PR adds two kernels for the Arm SVE vector extension:

  1. a gemm kernel for double precision at size 8x8.
  2. a packm kernel for double precision at dimension 8xk.

To achieve the best performance, vector-length-agnostic programming
is not used. A vector length (VL) of 256 bits is mandated in both kernels.
Kernels to support other VLs can be added later.
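
For readers unfamiliar with what these two kernels compute, here is a minimal plain-C sketch of the operations. It is not the assembly in this PR: the function names `packm_8xk_ref` and `dgemm_8x8_ref`, the packed-panel layouts, and the stride parameters are assumptions chosen for illustration, loosely following the usual BLIS micro-kernel convention.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 8, NR = 8 }; /* micro-tile sizes taken from the PR description */

/* Hypothetical reference packm: copy an MR x k slice of a matrix into a
 * contiguous micro-panel. Element (i, j) of the source is a[i*rs + j*cs];
 * column j of the panel is stored at p[j*MR .. j*MR + MR-1]. */
void packm_8xk_ref(size_t k, const double *a, size_t rs, size_t cs, double *p)
{
    assert(a != NULL && p != NULL);
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < MR; ++i)
            p[j * MR + i] = a[i * rs + j * cs];
}

/* Hypothetical reference gemm micro-kernel: C := beta*C + alpha*A*B on one
 * MR x NR tile. ap is a packed MR x k panel (column l at ap[l*MR]); bp is a
 * packed k x NR panel (row l at bp[l*NR]). Element (i, j) of C is
 * c[i*rs_c + j*cs_c]. */
void dgemm_8x8_ref(size_t k, double alpha, const double *ap, const double *bp,
                   double beta, double *c, size_t rs_c, size_t cs_c)
{
    double ab[MR * NR] = {0.0}; /* accumulator for A*B, column-major */

    assert(ap != NULL && bp != NULL && c != NULL);

    /* k rank-1 updates: ab += ap(:, l) * bp(l, :) */
    for (size_t l = 0; l < k; ++l)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += ap[l * MR + i] * bp[l * NR + j];

    /* Scale and write back; when beta is zero, C is overwritten unread. */
    for (size_t j = 0; j < NR; ++j) {
        for (size_t i = 0; i < MR; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = (beta == 0.0) ? alpha * ab[i + j * MR]
                                 : beta * (*cij) + alpha * ab[i + j * MR];
        }
    }
}
```

A scalar reference like this is mainly useful as an oracle when checking an optimized kernel's output under an emulator.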

"SVE is a vector extension for AArch64 execution mode for the A64
instruction set of the Armv8 architecture. Unlike other SIMD architectures,
SVE does not define the size of the vector registers, but constrains into
a range of possible values, from a minimum of 128 bits up to a maximum of
2048 in 128-bit wide units. Therefore, any CPU vendor can implement the
extension by choosing the vector register size that better suits the
workloads the CPU is targeting. Instructions are provided specifically
to query an implementation for its register size, to guarantee that
the applications can run on different implementations of the ISA without
the need to recompile the code." [1]

[1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning
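
The vector-length arithmetic behind the fixed 256-bit choice can be sketched in portable C. The helper names below are hypothetical and the code only does the bookkeeping; on real SVE hardware the lane count would normally be queried at run time (for example with the ACLE intrinsic `svcntd()` from `arm_sve.h`). The validity rule follows the quoted description: 128 to 2048 bits in 128-bit units.

```c
#include <assert.h>
#include <stdbool.h>

/* Valid SVE vector lengths per the description quoted above:
 * 128..2048 bits, in multiples of 128. */
bool sve_vl_is_valid(unsigned vl_bits)
{
    return vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0;
}

/* Number of 64-bit doubles that fit in one vector register. */
unsigned sve_doubles_per_vector(unsigned vl_bits)
{
    assert(sve_vl_is_valid(vl_bits));
    return vl_bits / 64;
}

/* Vector registers needed to hold an mr x nr tile of doubles.
 * Pure arithmetic; it says nothing about how the PR's assembly
 * actually allocates registers. */
unsigned sve_vectors_for_tile(unsigned mr, unsigned nr, unsigned vl_bits)
{
    unsigned lanes = sve_doubles_per_vector(vl_bits);
    return (mr * nr + lanes - 1) / lanes; /* round up */
}

/* Hypothetical guard: a kernel written for one fixed VL only matches
 * hardware (or an emulator) configured with exactly that VL. */
bool fixed_vl_kernel_matches(unsigned kernel_vl_bits, unsigned hw_vl_bits)
{
    return sve_vl_is_valid(hw_vl_bits) && kernel_vl_bits == hw_vl_bits;
}
```

At 256 bits each register holds 4 doubles, so an 8x8 tile of doubles occupies 16 of the 32 SVE vector registers; at 512 bits the same tile would need only 8, which is one reason a kernel tuned for one VL does not carry over unchanged to another.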

Signed-off-by: Guodong Xu <guodong.xu@linaro.org>

@fgvanzee
Member

@docularxu Sorry for the delayed response. Thank you for this contribution! I checked the new files and it appears that you found, read, and followed the section of the ConfigurationHowTo regarding adding new kernel sets. Thanks for making my review easy!

We don't have access to ARM SVE hardware, so we can't test out the code. Furthermore, we have limited experience on ARM, so please consider yourself nominated as the maintainer of these new kernels going forward. :) But feel free to ask us questions should the need arise.

fgvanzee merged commit f032d5d into flame:master on Apr 29, 2020
@jeffhammond
Member

@fgvanzee https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator, but I completely understand why you would not want to mess with this.

@fgvanzee
Member

fgvanzee commented May 8, 2020

> but I completely understand why you would not want to mess with this.

Yeah, unfortunately I don't have time to take on a new project like this. But it's good to know such resources are available for others and/or for my own use in the future.

@rvdg
Collaborator

rvdg commented May 8, 2020 via email

@docularxu
Author

I am using 'qemu-aarch64' [1] to verify these two kernels, and I will keep optimizing them once I get my hands on real hardware.

So far, the only publicly and commercially available Arm SVE-enabled hardware is Fujitsu's A64FX [2]. The A64FX has a 512-bit vector length, which differs from the 256-bit length that the submitted kernels assume.

[1] https://www.linaro.org/blog/sve-in-qemu-linux-user/
[2] https://github.com/fujitsu/A64FX

pradeeptrgit pushed a commit to amd/blis that referenced this pull request Jun 30, 2020

4 participants