Make reference ukernels MR/NR sizes configurable #547
hominhquan wants to merge 1 commit into flame:master
- The MR/NR micro-sizes of the reference ukernels are hardcoded for performance reasons (#pragma omp simd). They work most of the time, except when a user begins tuning the MR/NR sizes.
- This commit exposes them as user-configurable in bli_family_<arch>.h and makes the generation of reference ukernels more robust and consistent. Some explanation is added to put future developers on guard (it should be added to the wiki as well).
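To illustrate the idea, here is a minimal sketch of a reference-style micro-kernel whose tile sizes are compile-time macros with overridable defaults. This is not the code from this PR: the macro names `REF_MR` and `REF_NR`, the function `ref_gemm_ukr`, and the packed layouts are all hypothetical, chosen only to show why the inner loop bounds need to be known at compile time and must match the sizes the user configures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro names (not taken from BLIS): a per-family header
   could define these before this file is compiled to override the
   defaults below. */
#ifndef REF_MR
#define REF_MR 4
#endif
#ifndef REF_NR
#define REF_NR 8
#endif

/* Sketch of a reference micro-kernel: C += A * B on one MR x NR tile.
   a: packed micro-panel, REF_MR values per step of k
   b: packed micro-panel, REF_NR values per step of k
   c: row-major tile with row stride ldc */
static void ref_gemm_ukr(size_t k, const double *a, const double *b,
                         double *c, size_t ldc)
{
    double ab[REF_MR * REF_NR] = {0.0};

    assert(ldc >= REF_NR);

    for (size_t l = 0; l < k; ++l) {
        for (size_t i = 0; i < REF_MR; ++i) {
            /* The trip count is a compile-time constant, which lets the
               compiler vectorize and fully unroll this loop. The pragma
               is ignored when OpenMP SIMD support is not enabled. */
            #pragma omp simd
            for (size_t j = 0; j < REF_NR; ++j)
                ab[i * REF_NR + j] += a[l * REF_MR + i] * b[l * REF_NR + j];
        }
    }

    /* Accumulate the temporary tile into C. */
    for (size_t i = 0; i < REF_MR; ++i)
        for (size_t j = 0; j < REF_NR; ++j)
            c[i * ldc + j] += ab[i * REF_NR + j];
}
```

In this sketch, a build that wants a different tile would pass something like `-DREF_MR=6 -DREF_NR=8`, or define the macros in a configuration header, so that the kernel and the packing code agree on the same sizes.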
7ad556f to 113a1a9
|
Update with added doc in |
|
@hominhquan I'm planning to address this issue within a broader update of the microkernel layer over the next few months. Is having a fix for this issue being merged into |
|
@devinamatthews Yes, this fix is useful to me. It may also prevent other people from running into the issue, at least over the next few months. So yes, I would like to see it merged into master.
|
@fgvanzee what do you think? The other fix, of course, is to disable the vectorized reference kernel.
Is disabling the vectorized reference gemm ukernel, as you propose, really just a matter of changing |
|
@fgvanzee I believe it is that easy.
|
@devinamatthews or @fgvanzee Can you tell me whether this has been addressed in any ongoing development, and whether I should drop this PR?
|
@devinamatthews might be able to comment further, but yes, this is part of ongoing development. We've spoken about Devin's idea as it pertains to this topic and I think it will work for your purposes, too, @hominhquan.
|
@hominhquan after some pending PRs I'm planning to work on overhauling the way kernels are handled. As part of that, I plan to reintroduce the MR/NR macros so that reference kernels can use them to do compile-time optimizations safely.
|
OK, so I'll drop this PR and wait for your upcoming update.