Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration. #573

Merged
fgvanzee merged 1 commit into flame:master from kvaragan:zen_blocksizes on Dec 7, 2021

kvaragan (Contributor) commented Nov 1, 2021

This pull request contains cache block sizes tuned for AMD Naples systems. They were originally added to improve multithreaded DGEMM scalability on Naples when the number of threads is greater than 16. The change was never reflected in the public repository and appears to have been deleted, so this PR adds it back.

Change-Id: I6827b58d2dab1041fe182fef5a007b679ac4bb1f

This is the same as release 1.3. It was originally added to improve multithreaded DGEMM scalability on Naples when the number of threads is greater than 16. It was deleted by mistake among the many changes made for the 2.0 release, so we are adding it back. Also includes code cleanup in bli_gemm_front.c.

Change-Id: I6827b58d2dab1041fe182fef5a007b679ac4bb1f
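
For readers unfamiliar with the pattern, the description above amounts to a compile-time macro guard around a thread-count check that swaps in a second set of cache block sizes. The following is only a rough, self-contained sketch of that idea, not the code in bli_cntx_init_zen.c: the struct `cache_blksz_t`, the function `select_dgemm_blksz`, the constant `NAPLES_THREAD_THRESHOLD`, and all numeric block sizes are hypothetical placeholders. Only the macro name and the threshold of 16 threads come from this PR.

```c
/* Stand-in for the build-time switch named in the PR title. In BLIS it
 * would come from the configuration; it is defined here (an assumption)
 * only so that the sketch compiles on its own. */
#ifndef BLIS_ENABLE_ZEN_BLOCK_SIZES
#define BLIS_ENABLE_ZEN_BLOCK_SIZES
#endif

/* Threshold taken from the PR text: the alternate sizes apply when the
 * number of threads is greater than 16. The macro name is hypothetical. */
#define NAPLES_THREAD_THRESHOLD 16

/* Hypothetical container for cache block sizes; not a BLIS type. */
typedef struct {
    int mc; /* block size along the m dimension */
    int kc; /* block size along the k dimension */
    int nc; /* block size along the n dimension */
} cache_blksz_t;

/* Returns the block sizes to use for a given thread count.
 * All numeric values are illustrative placeholders, NOT the tuned
 * Naples values from the actual commit. */
cache_blksz_t select_dgemm_blksz(int n_threads)
{
    /* Default sizes, used when the macro is off or threads <= 16. */
    cache_blksz_t b = { 64, 256, 4096 };

#ifdef BLIS_ENABLE_ZEN_BLOCK_SIZES
    if (n_threads > NAPLES_THREAD_THRESHOLD) {
        /* Alternate sizes for high thread counts (placeholders). */
        b.mc = 128;
        b.kc = 256;
        b.nc = 2048;
    }
#endif

    return b;
}
```

In this sketch, calling `select_dgemm_blksz(16)` yields the default sizes and `select_dgemm_blksz(17)` yields the alternate ones. Building without the macro defined removes the branch entirely, which mirrors how deleting the macro block would silently disable the tuned sizes.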
@fgvanzee fgvanzee merged commit 961d9d5 into flame:master Dec 7, 2021
dzambare pushed a commit to Meghana-vankadari/blis that referenced this pull request Jan 6, 2022
Details:
- Added previously-deleted cpp macro block to bli_cntx_init_zen.c 
  targeting the Naples microarchitecture that enabled different cache 
  blocksizes when the number of threads exceeds 16. This commit 
  represents PR flame#573.
fgvanzee added a commit that referenced this pull request Nov 3, 2022
Details:
- Added previously-deleted cpp macro block to bli_cntx_init_zen.c
  targeting the Naples microarchitecture that enabled different cache
  blocksizes when the number of threads exceeds 16. This commit
  represents PR #573.
- (cherry picked from commit 961d9d5)
fgvanzee added a commit that referenced this pull request Mar 17, 2023
Details:
- Added previously-deleted cpp macro block to bli_cntx_init_zen.c
  targeting the Naples microarchitecture that enabled different cache
  blocksizes when the number of threads exceeds 16. This commit
  represents PR #573.
- (cherry picked from commit 961d9d5)