Disable failing FJ test on ARM CI #39

Merged
rapids-bot[bot] merged 2 commits into NVIDIA:branch-25.05 from aliceb-nv:fj-ci-hotfix
May 27, 2025

@aliceb-nv
Contributor

Temporary band-aid for https://github.com/rapidsai/cuopt/issues/2489. The underlying issue is difficult to debug and is possibly a CUB/cuSPARSE bug.

Under heavy load, calls to cuSPARSE's CSR transpose routine appear to return a result in which one of the indices is off by one. The failure only shows up in the ARM CI, in the FJ tests, on one specific instance. This PR disables the tests on that instance when building on ARM, as a workaround until the root bug is fixed.
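
For readers who want a picture of what "disable on this instance when building on ARM" can look like, here is a minimal sketch of an architecture-gated skip predicate. It is not the code from this PR: the instance name `hypothetical-instance.mps` and the helpers `is_arm64_build` and `should_skip_instance` are made up for illustration, and the PR text does not say which instance is affected or how the skip is wired in.

```cpp
#include <string>

// Hypothetical placeholder: the PR does not name the affected instance.
const std::string kAffectedInstance = "hypothetical-instance.mps";

// Returns true when this translation unit is compiled for 64-bit ARM.
// __aarch64__ is defined by GCC/Clang, _M_ARM64 by MSVC.
inline bool is_arm64_build()
{
#if defined(__aarch64__) || defined(_M_ARM64)
  return true;
#else
  return false;
#endif
}

// Returns true only for the affected instance, and only on ARM builds,
// so every other instance and every other architecture keeps running.
inline bool should_skip_instance(const std::string& instance)
{
  return is_arm64_build() && instance == kAffectedInstance;
}
```

In a GoogleTest-based suite, a predicate like this could guard a `GTEST_SKIP()` call at the top of the test body, which reports the test as skipped rather than silently dropping it. Whether the PR uses that mechanism or a preprocessor guard around the test case is not stated here.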

@aliceb-nv aliceb-nv requested a review from a team as a code owner May 27, 2025 16:12
@aliceb-nv aliceb-nv added the bug Something isn't working label May 27, 2025
@aliceb-nv aliceb-nv requested a review from kaatish May 27, 2025 16:12
@aliceb-nv aliceb-nv added the non-breaking Introduces a non-breaking change label May 27, 2025
@aliceb-nv aliceb-nv requested a review from chris-maes May 27, 2025 16:12
@aliceb-nv
Contributor Author

/ok to test

@aliceb-nv
Contributor Author

/ok to test

@aliceb-nv
Contributor Author

/merge

@aliceb-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit dbf1ffe into NVIDIA:branch-25.05 May 27, 2025
56 checks passed
jieyibi pushed a commit to yining043/cuopt that referenced this pull request Mar 26, 2026
Authors:
  - Alice Boucher (https://github.com/aliceb-nv)

Approvers:
  - Rajesh Gandham (https://github.com/rg20)

URL: NVIDIA#39
Labels

bug (Something isn't working), non-breaking (Introduces a non-breaking change)
