
Revert "Temporarily skip CUDA 11 wheel CI"#601

Merged
rapids-bot[bot] merged 1 commit into branch-25.02 from revert-599-skip-cuda-11-wheel-ci
Jan 22, 2025

Conversation

@bdice
Contributor

@bdice bdice commented Jan 22, 2025

Reverts #599 now that rapidsai/raft#2548 has landed.

@bdice bdice requested a review from a team as a code owner January 22, 2025 11:34
@bdice bdice requested a review from jameslamb January 22, 2025 11:34
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.30%. Comparing base (9b7bb97) to head (0702f89).

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-25.02     #601   +/-   ##
=============================================
  Coverage         72.30%   72.30%           
=============================================
  Files                14       14           
  Lines                65       65           
=============================================
  Hits                 47       47           
  Misses               18       18           

☔ View full report in Codecov by Sentry.

@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jan 22, 2025
@jameslamb
Member

good news: the wheel tests that had been failing because of the cuBLAS issues are passing!

bad news: 1 wheel test is failing:

=========================== short test summary info ============================
FAILED python/cuvs/cuvs/test/test_distance.py::test_distance[float16-F-True-euclidean-50-100] - assert False
 +  where False = <function allclose at 0xfffee65add70>(array([[0.        , 2.94198351, 2.11872091, ..., 2.73895706, 2.80186958,\n        2.62724569],\n       [2.94198351, 0.  ...272 ],\n       [2.62724569, 2.84470779, 2.48090272, ..., 2.65241563, 2.7694272 ,\n        0.        ]], shape=(100, 100)), array([[0.       , 2.939494 , 2.1176343, ..., 2.738613 , 2.8034577,\n        2.625    ],\n       [2.939494 , 0.       , ...     [2.625    , 2.8449516, 2.4811792, ..., 2.6516504, 2.769815 ,\n        0.       ]], shape=(100, 100), dtype=float32), atol=0.1, rtol=0.1)
 +    where <function allclose at 0xfffee65add70> = np.allclose
====== 1 failed, 1917 passed, 116 skipped, 2 xfailed in 105.19s (0:01:45) ======

(build link)

That looks like a numerical-precision thing (which can sometimes show up as a flaky test), but I observed it on consecutive runs.
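For context on the failure mode: the assertion above compares a higher-precision reference distance matrix against a float32 result computed from float16 inputs, using `np.allclose` with `atol=0.1, rtol=0.1`. A minimal sketch of that class of comparison, using random data (this is illustrative only, not the cuVS kernel or the actual test):

```python
# Hypothetical sketch of a float16-vs-float64 euclidean distance comparison.
# Not the cuVS test itself; data and shapes are made up to match the test's
# (100, 100) output and its atol=0.1, rtol=0.1 tolerances.
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal((100, 50))

def pairwise_euclidean(a):
    # ||a_i - a_j|| for all pairs, computed in the array's own dtype
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff * diff).sum(axis=-1))

ref = pairwise_euclidean(x)                      # float64 reference
lowp = pairwise_euclidean(x.astype(np.float16))  # float16 computation

# The same style of comparison the failing test performs
print("max abs error:", np.abs(ref - lowp.astype(np.float64)).max())
print("allclose:", np.allclose(ref, lowp.astype(np.float32),
                               atol=0.1, rtol=0.1))
```

Because float16 carries only about three decimal digits of precision, accumulated rounding in the squared-difference sum can push individual entries near (or past) a loose tolerance boundary depending on the data, which is consistent with a failure that looks flaky but reproduces on some inputs.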

@bdice
Contributor Author

bdice commented Jan 22, 2025

#596 looks like it could be related to the precision error. @rhdong Can you confirm if your PR is expected to fix this failure?

@rhdong
Member

rhdong commented Jan 22, 2025

> good news: the wheel tests that had been failing because of the cuBLAS issues are passing!
>
> bad news: 1 wheel test is failing:
>
> =========================== short test summary info ============================
> FAILED python/cuvs/cuvs/test/test_distance.py::test_distance[float16-F-True-euclidean-50-100] - assert False
>  +  where False = <function allclose at 0xfffee65add70>(array([[0.        , 2.94198351, 2.11872091, ..., 2.73895706, 2.80186958,\n        2.62724569],\n       [2.94198351, 0.  ...272 ],\n       [2.62724569, 2.84470779, 2.48090272, ..., 2.65241563, 2.7694272 ,\n        0.        ]], shape=(100, 100)), array([[0.       , 2.939494 , 2.1176343, ..., 2.738613 , 2.8034577,\n        2.625    ],\n       [2.939494 , 0.       , ...     [2.625    , 2.8449516, 2.4811792, ..., 2.6516504, 2.769815 ,\n        0.       ]], shape=(100, 100), dtype=float32), atol=0.1, rtol=0.1)
>  +    where <function allclose at 0xfffee65add70> = np.allclose
> ====== 1 failed, 1917 passed, 116 skipped, 2 xfailed in 105.19s (0:01:45) ======
>
> (build link)
>
> That looks like a numerical-precision thing (which can sometimes show up as a flaky test), but I observed it on consecutive runs.

Hi @jameslamb, this PR will resolve the issue; please rerun your tests to work around it temporarily.

@vyasr
Contributor

vyasr commented Jan 22, 2025

How many times should we try a rerun? Looks like it's failed three times now.

@cjnolet
Member

cjnolet commented Jan 22, 2025

@vyasr @jameslamb cuVS CI started failing when the script that runs the Python tests was fixed. I'm not sure which tests were or weren't running before that fix, because I had verified myself that some Python tests were running in CI prior to it. However, I suspect these tests hadn't been running since around October, and that's why we are now seeing failures.

One failure seems related to CUBLAS, another seems related to precision or a bug in a distance function/computation.

@jameslamb
Member

Oh wow! Thanks for that context.

> One failure seems related to CUBLAS,

Take a look at "cuVS CI failures" in rapidsai/build-planning#137. If what you're referring to is the same as those logs, then that issue is now fixed.

> another seems related to precision or a bug in a distance function/computation

Ok yep, that's the one we're running into here, I think: #601 (comment)

@rhdong
Member

rhdong commented Jan 22, 2025

> How many times should we try a rerun? Looks like it's failed three times now.

Well... it's like drawing consecutive Aces in a poker game. I just reran it; let's see. Also, #596 is close to passing all CI tests, so at the least we can count on merging it soon.

@vyasr
Contributor

vyasr commented Jan 22, 2025

Ha, yes. At this point I think we'll probably wind up waiting for #596 to finish CI, but since the wheel tests are fast there's no harm in attempting a rerun and seeing what happens.

@bdice
Contributor Author

bdice commented Jan 22, 2025

/merge

@rapids-bot rapids-bot bot merged commit 43969ca into branch-25.02 Jan 22, 2025
@bdice
Contributor Author

bdice commented Jan 22, 2025

The last rerun worked!

@jameslamb jameslamb deleted the revert-599-skip-cuda-11-wheel-ci branch January 22, 2025 20:50
