Skip to content

ARMSVE Block SVE-Intrinsic Kernels for GCC 8-9#543

Merged
devinamatthews merged 4 commits intoflame:masterfrom
xrq-phys:armsve-packm-fix
Oct 9, 2021
Merged

ARMSVE Block SVE-Intrinsic Kernels for GCC 8-9#543
devinamatthews merged 4 commits intoflame:masterfrom
xrq-phys:armsve-packm-fix

Conversation

@xrq-phys
Copy link
Copy Markdown
Collaborator

@xrq-phys xrq-phys commented Sep 16, 2021

Trying to address a64fx part of #535 .

Problem:
Directive #if __has_include(<arm_sve.h>) will be inserted into blis.h.
This will break down some compilers. (i.e., when we compile BLIS with GCC but want to link it to some apps compiled with vendor CC, the latter would break down if it's not GCC5 or Clang-compatible.)

For the case of A64FX: __has_include(<arm_sve.h>) works? <arm_sve.h> exists?
GCC 8+ 🟢Works, of course ℹ️Only for 10+
Clang 9+ 🟢Works ℹ️Only for 11+
Arm ACfL, Clang-based 🟢Works Yes
Cray CC, Legacy 🟢Works No
Cray CC, Clang-based 🟢Works No
Fujitsu CC, Legacy Breaks down Yes
Fujitsu CC, Clang-based 🟢Works Yes

Maybe it's better to left all kernels defined and raise runtime errors when <arm_sve.h> is not found?


Above approach deprecated.
See Devin's comments for our more stable way.

RuQing Xu added 2 commits September 16, 2021 04:31
SVE-Intrinsic-based kernels ought not to use asm in their names.
(
1,
BLIS_PACKM_8XK_KER, BLIS_DOUBLE, bli_dpackm_armsve256_asm_8xk,
BLIS_PACKM_8XK_KER, BLIS_DOUBLE, bli_dpackm_armsve256_int_8xk,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the 10xk kernel also be used here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10xk is for VL=512bits.
Packing kernels are VL-specific, I'm afraid.

#include "blis.h"

#ifdef __ARM_FEATURE_SVE
#if __has_include(<arm_sve.h>)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the availability of intrinsics really only affects A64fx, right? Can we branch on an A64fx-specific macro here instead of using __has_include which might not be as portable?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. BLIS_FAMILY_A64FX should work for this case.

Comment thread config/armsve/bli_cntx_init_armsve.c Outdated
3,
BLIS_PACKM_10XK_KER, BLIS_DOUBLE, bli_dpackm_armsve512_asm_10xk,
BLIS_PACKM_12XK_KER, BLIS_DOUBLE, bli_dpackm_armsve512_asm_12xk,
BLIS_PACKM_12XK_KER, BLIS_DOUBLE, bli_dpackm_armsve512_int_12xk,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the 12xk kernel actually used in this config? If not it can be moved to the "old" directory.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused in fact.

@devinamatthews
Copy link
Copy Markdown
Member

OK, so it looks like we should:

  1. Remove references to the 12xk kernel in the context initialization.
  2. Move the 12xk kernel to an "old" subdirectory (optional).
  3. Use #ifndef BLIS_FAMILY_A64FX instead of __has_include(<arm_sve.h>).

Another question: why is the 10xk packing kernel not registered for m_r_d == 8?

@xrq-phys
Copy link
Copy Markdown
Collaborator Author

xrq-phys commented Oct 5, 2021 via email

@devinamatthews
Copy link
Copy Markdown
Member

Packm kernel is not VL-agnostic.
Currently using m_r_d to differentiate SVE VL cases.

Right, but the two expected outcomes are 8x10 (256 bit) and 16x10 (512 bit) right? Then you need a 10xk packing kernel in both cases.

@xrq-phys
Copy link
Copy Markdown
Collaborator Author

xrq-phys commented Oct 6, 2021

@devinamatthews That is... Ugh... Unimplemented yet.

Unlike GEMM, the PACKM kernels are only implemented for some vector length cases. (Note kernel names. bli_dpackm_armsve512_*** in contraction to bli_gemm_armsve_***.)

So I have to fall back to ref PACKM kernels for 8x10 (256bit) cases, though implementing one may be the best choice.

@devinamatthews
Copy link
Copy Markdown
Member

Riiiiight, because the size is the same but your vectors aren't.

@devinamatthews
Copy link
Copy Markdown
Member

SVE isn't great for short vectorization 😢

@xrq-phys
Copy link
Copy Markdown
Collaborator Author

xrq-phys commented Oct 6, 2021

Agreed. It's quite handy when doing inner products though ;)

@xrq-phys
Copy link
Copy Markdown
Collaborator Author

xrq-phys commented Oct 6, 2021

OK, so it looks like we should:

  1. Remove references to the 12xk kernel in the context initialization.
  2. Move the 12xk kernel to an "old" subdirectory (optional).
  3. Use #ifndef BLIS_FAMILY_A64FX instead of __has_include(<arm_sve.h>).

Another question: why is the 10xk packing kernel not registered for m_r_d == 8?

Done :D

Seems that Travis CI is not triggered.
I've heard about them expiring charge-free OSS builds. Perhaps this is the problem?

@devinamatthews
Copy link
Copy Markdown
Member

@fgvanzee we're out of credits again 😭

@xrq-phys
Copy link
Copy Markdown
Collaborator Author

xrq-phys commented Oct 7, 2021

FYI I've made this kind of things in my dev branch :D

https://github.com/xrq-phys/blis/blob/main-dev/.github/workflows/auto-release.yml

@fgvanzee
Copy link
Copy Markdown
Member

fgvanzee commented Oct 7, 2021

Credits are replenished. I also created a Google Calendar reminder (on a 20-day cycle).

@devinamatthews devinamatthews merged commit 32a6d93 into flame:master Oct 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants