Skip to content

Fixed out-of-bounds read bug in sup haswell ukr.#824

Merged
fgvanzee merged 1 commit intomasterfrom
sup_rd_s1x16n_fix
Aug 8, 2024
Merged

Fixed out-of-bounds read bug in sup haswell ukr.#824
fgvanzee merged 1 commit intomasterfrom
sup_rd_s1x16n_fix

Conversation

@fgvanzee
Copy link
Copy Markdown
Member

@fgvanzee fgvanzee commented Aug 8, 2024

Details:

  • Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel. The kernel was erroneously performing an out-of-bounds read whenever the singleton edge case loop executed (that is, whenever the k dimension of the millikernel problem was not a multiple of 8). This OOB error was the result of a copy-paste bug; when developing the s1x16n function, I started from a copy of the s2x16n function, but then failed to delete the instruction that reads the second element of A in the code that handles the PR loop's edge case. Thanks to @j-bm for reporting this bug in Issue LAPACK test segfault on zen/zen2/zen3 at bli_sgemmsup_rd_haswell_asm_1x16n #821 and helping narrow down the cause to the rax register.

Details:
- Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel.
  The kernel was erroneously performing an out-of-bounds read whenever
  the singleton edge case loop executed (that is, whenever the k
  dimension of the millikernel problem was not a multiple of 8). This
  OOB error was the result of a copy-paste bug; when developing the
  s1x16n function, I started from a copy of the s2x16n function, but
  then failed to delete the instruction that reads the second element
  of A in the code that handles the PR loop's edge case. Thanks to
  @j-bm for reporting this bug in Issue #821 and helping narrow down
  the cause to the rax register.
@fgvanzee fgvanzee requested a review from devinamatthews August 8, 2024 00:55
@fgvanzee fgvanzee merged commit a822cb2 into master Aug 8, 2024
@fgvanzee fgvanzee deleted the sup_rd_s1x16n_fix branch August 8, 2024 18:34
fgvanzee added a commit that referenced this pull request Aug 20, 2024
Details:
- Previously, if the user enabled CBLAS via 'configure --enable-cblas'
  and then ran 'make', the flattened blis.h header file would be created
  immediately, but the flattened cblas.h header file would not be
  created until 'make install' was run. This was happening because
  nothing in the BLIS build process (except installation) depended on
  the flattened cblas.h (whereas *everything* depends on the flattened
  blis.h, and therefore it was being created first). This behavior can
  be confusing to application developers who could reasonably expect
  that the flattened cblas.h header would be available (to inspect or
  use) prior to running 'make install'.
- This commit fixes the aforementioned issue by (1) adding cblas.h (if
  CBLAS is enabled) as a dependency to all of the build rules for core
  framework object files, and (2) making the flattened blis.h a
  prerequisite for flattening cblas.h. The upshot is that (1) ensures
  that the flattened cblas.h is created around the the same time that
  the flattened blis.h is created, and (2) ensures that the two headers
  are flattened sequentially (first blis.h and then cblas.h) even when
  using 'make -j[n]', which ensures that the output of the two processes
  do not comingle.
- Thanks to Jeff Diamond for reporting this issue.
- (cherry picked from commit 8d9be87)

Fixed out-of-bounds read bug in sup haswell ukr. (#824)

Details:
- Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel.
  The kernel was erroneously performing an out-of-bounds read whenever
  the singleton edge case loop executed (that is, whenever the k
  dimension of the millikernel problem was not a multiple of 8). This
  OOB error was the result of a copy-paste bug; when developing the
  s1x16n function, I started from a copy of the s2x16n function, but
  then failed to delete the instruction that reads the second element
  of A in the code that handles the PR loop's edge case. Thanks to
  @j-bm for reporting this bug in Issue #821 and helping narrow down
  the cause to the rax register.
- CREDITS file update.
- (cherry picked from commit a822cb2)

Fixed typo in 4158930; variable renames. (#815)

Details:
- Fixed a typo in the "./configure --help" output for the ScaLAPACK
  compatibility option implemented in 4158930.
- Trivial variable renames.
- (cherry picked from commit 8820f8f)

Fix a bug in the piledriver microkernels. (#814)

Details:
- At some point, the piledriver (and bulldozer and excavator)
  microkernel tests via SDE had been removed from Travis CI testing.
  This PR re-enables them.
- A bug in the piledriver complex gemm microkernels has also been
  fixed. The `beta*C` product was not being correctly added to the `A*B`
  product before writing back out to memory.
- Fixes #811.
- (cherry picked from commit 31ecf82)

Add ScaLAPACK compatibility mode. (#813)

Details:
- Add configure options '--enable-scalapack-compat' and
  '--disabled-scalapack-compat' (default disabled).
- Add a macro BLIS_{ENABLE,DISABLE}_SCALAPACK_COMPAT to bli_config.h.
- This option and macro control any changes to the API necessary to
  maintain compatibility with ScaLAPACK. Currently, this only means
  disabling the complex versions of syr, syr2, and symv. In the
  future, other changes could be controlled by the same flag.
- Complex syr2 wasn't enabled at the same time that complex syr and
  symv were. This is now corrected.
- (cherry picked from commit 4158930)

Update CREDITS

- (cherry picked from commit 5cbec65)

Fix SyntaxWarning messages from python 3.12 (#809)

Details:
- When using regexes in Python, certain characters need backslash
  escaping, e.g.:

    regex = re.compile( '^[\s]*#include (["<])([\w\.\-/]*)([">])' )

  However, technically escape sequences like `\s` are not valid and
  should actually be double-escaped: `\\s`. Python 3.12 now warns about
  such escape sequences, and in a later version these warning will be
  promoted to errors. See also:
  https://docs.python.org/dev/whatsnew/3.12.html#other-language-changes
  The fix here is to use Python's "raw strings" to avoid
  double-escaping. This issue can be checked for all files in the
  current directory with the command:

    python -m compileall -d . -f -q .

  Thanks to @AngryLoki for the fix.
- (cherry picked from commit 729c57c)

Updates to README.md section on downloading.

Details:
- Updated the text in README.md in the "How to Download BLIS" section.
  The new text no longer recommends that the reader use the 'master'
  branch over official releases, as the previous text did. The text was
  tweaked since (a) the 'master' branch is now akin to a development
  branch, and (b) the reader will no longer forgo bugfixes by sticking
  to official releases since we will (going forward) publish bugfix
  releases for the most recent version.
- (cherry picked from 6d0ab74)

Updated RELEASING file; fixes to ReleaseNotes.md.

Details:
- Updated RELEASING file to reflect new release protocols, given the
  more sophisticated policy of maintaining release candidate branches
  separate from 'master' (which is now more akin to a development
  branch). Further refinements to this file will likely follow.
- Fixed typos in ReleaseNotes.md. Thanks to Robert van de Geijn for
  reporting these.
- (cherry picked from 01e151a)

ReleaseNotes.md update.

- (cherry picked from 06dddf1)

CHANGELOG update (1.0)

- (cherry picked from a876918)

Version file update (1.0)

- (cherry picked from c2af113)

Added a script to help create new rc branches.

Details:
- Added a new script, build/start-new-rc.sh, which:
  1. Updates the version file with a new version string.
  2. Commits (locally) the version string update.
  3. Updates the CHANGELOG file with the output of 'git log'.
  4. Commits (locally) the CHANGLOG file update.
  5. Creates a new branch whose name is equal to "<vers>-rc0" where
     <vers> is the new version string.
  6. Reminds the user to execute some final steps if everything looks
     good.
  This new script will help in the future when it's time to start a new
  release candidate branch/lineage off of 'master'. Note that this
  script is based on build/bump-version.sh (which itself may change in
  the future due to changes in the way versions/releases will be handled
  going forward).
- (cherry picked from 5ab286f)
devinamatthews pushed a commit that referenced this pull request Nov 3, 2024
Details:
- Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel.
  The kernel was erroneously performing an out-of-bounds read whenever
  the singleton edge case loop executed (that is, whenever the k
  dimension of the millikernel problem was not a multiple of 8). This
  OOB error was the result of a copy-paste bug; when developing the
  s1x16n function, I started from a copy of the s2x16n function, but
  then failed to delete the instruction that reads the second element
  of A in the code that handles the PR loop's edge case. Thanks to
  @j-bm for reporting this bug in Issue #821 and helping narrow down
  the cause to the rax register.
- CREDITS file update.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant