-
-
Notifications
You must be signed in to change notification settings - Fork 17.6k
cudaPackages.nccl: refactor to fix #220340 and #221895 #217619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cudaPackages.nccl: refactor to fix #220340 and #221895 #217619
Conversation
cd68b53 to
27a9a13
Compare
27a9a13 to
10bc11f
Compare
10bc11f to
025b8f9
Compare
|
Waiting for #220402 to be merged prior to proceeding. |
025b8f9 to
f2b39d6
Compare
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-45/26397/1 |
f2b39d6 to
5dc256c
Compare
|
Result of 16 packages built:
|
|
CC @NixOS/cuda-maintainers |
samuela
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with what the runtime dependencies of nccl are, but these changes look reasonable to me. Perhaps @mdaiter or @orivej could comment?
As we move towards more of a "python-packages.nix" model, I think it might be worth collecting all cudaPackages.* packages into a single directory like python does. We have a bunch of random stuff spread out all over the codebase for no good reason except that that's just how things evolved over time. But we don't need to keep living like this. NCCL isn't really math software anyhow.
I like that idea! It would certainly make it easier to keep track of different components. How do you envision stuff like As a separate point, in the same way that we can get |
Good question! The line is a little fuzzy but I envision that the general policy would be that
I believe this may indeed be a tangential point :P Actually, I think we should focus on supporting fewer version combinations, not more. Having to maintain support for all combinations increases testing and maintenance burden. Users who are interested in using older versions can always use custom overlays or pin nixpkgs to previous commits. |
nccl does not have any runtime dependencies. The changes look good to me (although I'm not familiar with the concept of backendStdenv and don't understand why build dependencies have to be moved to nativeBuildInputs). |
|
Thanks @orivej, |
5dc256c to
d86b6a7
Compare
|
Rebased, moved |
d86b6a7 to
15d0b2f
Compare
|
I let it run overnight on my i9 13900K and it still didn't finish -- hopefully this time around it'll be faster :x |
15d0b2f to
cecff7f
Compare
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-46/26872/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's it use which for again? Can we control it w/o which?
|
I don't remember where we stalled, but this LGTM. Nixpkgs-review and merge? |
I believe @ConnorBaker left a TODO item for himself here, but not sure what the status of that is? |
cecff7f to
f351800
Compare
|
Found out this was still open when I tried to update NCCL to 2.18.3. Let me try to pick this back up :) |
87bc76d to
3185b5e
Compare
|
Rebased, updated, and added more notes about what it's fixing. Testing with a super simple test suite built around nix build -L --override-input nixpkgs github:nixos/nixpkgs/pull/217619/head github:ConnorBaker/nix-cuda-test/3074628f4a0ece4928e032f9e2f2f1307b6ed22d#nccl-test-suiteSeemed to work for me, though it's just one device. https://gist.github.com/ConnorBaker/ea3c49e23c1eaf2544fe97ae6fdd67a5 |
3185b5e to
00296a3
Compare
The previous usage of `cudaPackages` ensured that we only ever saw packages from the default version of `cudaPackages` Nixpkgs uses.
00296a3 to
8208602
Compare
Description of changes
autoAddOpenGLRunpathHookinstead of manual invocation ofaddOpenGLRunpathncclwas only built with default version ofcudaPackagesby changing thencclderivation to accept the actual CUDA packages it requires as argumentscudaPackages#221895cudaPackagesin Nixpkgscuda_cudartto prevent breakage when cudaPackages: multiple outputs for redistributables #240498 is mergedremoved unnecessarywhichentry innativeBuildInputswhichis unused, it's used to locate NVCC and extract the CUDA version. This is in turn used for conditional compilation.Things done
sandbox = trueset innix.conf? (See Nix manual)nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/)