Open
Description
Steps to reproduce:
Install NCCL
$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ git checkout v2.28.9-1
$ git checkout -b v2.28.9-1
$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90"
Install nccl-tests
$ git clone https://github.com/NVIDIA/nccl-tests.git
$ cd nccl-tests
$ NCCL_HOME=/fsx/ubuntu/nccl/build/ make
Run Tests
$ LD_LIBRARY_PATH=/fsx/ubuntu/nccl/build/lib:$LD_LIBRARY_PATH
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
# nccl-tests version 2.17.6 nccl-headers=22809 nccl-library=22809
# Collective test starting: all_reduce_perf
# nThread 1 nGpus 8 minBytes 8 maxBytes 268435456 step: 2(factor) warmup iters: 1 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 1476972 on ip-172-31-59-56 device 0 [0000:59:00] NVIDIA H200
# Rank 1 Group 0 Pid 1476972 on ip-172-31-59-56 device 1 [0000:5a:00] NVIDIA H200
# Rank 2 Group 0 Pid 1476972 on ip-172-31-59-56 device 2 [0000:72:00] NVIDIA H200
# Rank 3 Group 0 Pid 1476972 on ip-172-31-59-56 device 3 [0000:73:00] NVIDIA H200
# Rank 4 Group 0 Pid 1476972 on ip-172-31-59-56 device 4 [0000:8b:00] NVIDIA H200
# Rank 5 Group 0 Pid 1476972 on ip-172-31-59-56 device 5 [0000:8c:00] NVIDIA H200
# Rank 6 Group 0 Pid 1476972 on ip-172-31-59-56 device 6 [0000:a4:00] NVIDIA H200
# Rank 7 Group 0 Pid 1476972 on ip-172-31-59-56 device 7 [0000:a5:00] NVIDIA H200
Segmentation fault (core dumped)
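The output above stops at the crash with no backtrace or NCCL log. A minimal sketch of how more detail could be collected is below. The library path and test command are copied from the steps above; the helper name `with_nccl_debug` and the `DRY_RUN` switch are hypothetical, and `gdb` and `ldd` are assumed to be installed on the node.

```shell
# Sketch only. NCCL_LIB is the path used in the steps above; the helper name
# with_nccl_debug and the DRY_RUN switch are hypothetical.
NCCL_LIB=/fsx/ubuntu/nccl/build/lib

# Run a command (or, with DRY_RUN=1, only print it) with the custom NCCL build
# first on the library path and NCCL's INFO-level logging enabled.
with_nccl_debug() {
  lib_path="${NCCL_LIB}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "LD_LIBRARY_PATH=${lib_path} NCCL_DEBUG=INFO $*"
  else
    env "LD_LIBRARY_PATH=${lib_path}" NCCL_DEBUG=INFO "$@"
  fi
}

# Possible uses, run from the nccl-tests directory:
#   1. Show which libnccl.so the test binary resolves to:
#        with_nccl_debug ldd ./build/all_reduce_perf
#   2. Rerun under gdb in batch mode and print a backtrace at the crash:
#        with_nccl_debug gdb -batch -ex run -ex bt --args \
#          ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
```

Using `env` passes the variables to the child process explicitly, so the result does not depend on whether `LD_LIBRARY_PATH` was already exported in the calling shell.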
nvidia-smi output
Fri Dec 12 20:24:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:59:00.0 Off |                    0 |
| N/A   28C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:5A:00.0 Off |                    0 |
| N/A   26C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:72:00.0 Off |                    0 |
| N/A   28C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:73:00.0 Off |                    0 |
| N/A   26C    P0             77W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:8B:00.0 Off |                    0 |
| N/A   29C    P0             77W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:8C:00.0 Off |                    0 |
| N/A   27C    P0             79W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:A4:00.0 Off |                    0 |
| N/A   28C    P0             76W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:A5:00.0 Off |                    0 |
| N/A   26C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
NVCC info
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
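The report mixes several version numbers: nccl-tests prints `nccl-headers=22809 nccl-library=22809`, nvcc is release 12.8, and the driver reports CUDA Version 13.0. The sketch below shows one way to sanity-check them. Both function names are hypothetical, and the decoding assumes the version-code layout NCCL has used since 2.9 (major*10000 + minor*100 + patch). It only checks consistency and says nothing about the cause of the crash.

```shell
# Sketch only; both function names are hypothetical.

# Decode an NCCL version code, as printed by nccl-tests, into major.minor.patch.
# Assumes the layout used since NCCL 2.9: major*10000 + minor*100 + patch.
nccl_version_from_code() {
  code=$1
  echo "$((code / 10000)).$(((code % 10000) / 100)).$((code % 100))"
}

# Succeed if a CUDA toolkit version (e.g. 12.8 from nvcc) does not exceed the
# maximum CUDA version the driver reports in nvidia-smi (e.g. 13.0).
toolkit_within_driver() {
  t_major=${1%%.*}; t_minor=${1#*.}
  d_major=${2%%.*}; d_minor=${2#*.}
  [ "$t_major" -lt "$d_major" ] ||
    { [ "$t_major" -eq "$d_major" ] && [ "$t_minor" -le "$d_minor" ]; }
}

nccl_version_from_code 22809   # prints 2.28.9
toolkit_within_driver 12.8 13.0 && echo "toolkit 12.8 is within driver limit 13.0"
```

With the values from this report, the header and library codes both decode to 2.28.9, matching the checked-out tag, and the 12.8 toolkit is within what the 580.95.05 driver reports as supported.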