Segfault while running benchmarks #363

@access2rohit

Description

Steps to reproduce:

Install NCCL

$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ git checkout v2.28.9-1
$ git checkout -b v2.28.9-1
$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90"

Install nccl-tests

$ git clone https://github.com/NVIDIA/nccl-tests.git
$ cd nccl-tests
$ NCCL_HOME=/fsx/ubuntu/nccl/build/ make

Run Tests

$ export LD_LIBRARY_PATH=/fsx/ubuntu/nccl/build/lib:$LD_LIBRARY_PATH
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
# nccl-tests version 2.17.6 nccl-headers=22809 nccl-library=22809
# Collective test starting: all_reduce_perf
# nThread 1 nGpus 8 minBytes 8 maxBytes 268435456 step: 2(factor) warmup iters: 1 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 1476972 on ip-172-31-59-56 device  0 [0000:59:00] NVIDIA H200
#  Rank  1 Group  0 Pid 1476972 on ip-172-31-59-56 device  1 [0000:5a:00] NVIDIA H200
#  Rank  2 Group  0 Pid 1476972 on ip-172-31-59-56 device  2 [0000:72:00] NVIDIA H200
#  Rank  3 Group  0 Pid 1476972 on ip-172-31-59-56 device  3 [0000:73:00] NVIDIA H200
#  Rank  4 Group  0 Pid 1476972 on ip-172-31-59-56 device  4 [0000:8b:00] NVIDIA H200
#  Rank  5 Group  0 Pid 1476972 on ip-172-31-59-56 device  5 [0000:8c:00] NVIDIA H200
#  Rank  6 Group  0 Pid 1476972 on ip-172-31-59-56 device  6 [0000:a4:00] NVIDIA H200
#  Rank  7 Group  0 Pid 1476972 on ip-172-31-59-56 device  7 [0000:a5:00] NVIDIA H200
Segmentation fault (core dumped)
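
The run above stops before any result rows are printed, so a possible next step is to rerun it with NCCL logging and core dumps enabled. The following is a minimal sketch, not a confirmed diagnosis procedure for this issue: `rerun_with_debug` and the default log name `nccl_debug.log` are hypothetical names chosen for this example, while `NCCL_DEBUG` and `NCCL_DEBUG_FILE` are standard NCCL environment variables.

```shell
#!/usr/bin/env bash
# Sketch: rerun the failing benchmark with NCCL logging and core dumps enabled.
# The function name and the default log file name are assumptions for this example.

rerun_with_debug() {
  local bin="${1:-./build/all_reduce_perf}"   # benchmark binary from the steps above
  local log="${2:-nccl_debug.log}"            # hypothetical log file name

  if [ ! -x "$bin" ]; then
    echo "benchmark binary not found: $bin"
    return 1
  fi

  # Allow a core file on SIGSEGV; this can fail if the hard limit is lower.
  ulimit -c unlimited 2>/dev/null || true

  # Same arguments as the failing run, with NCCL INFO logging written to a file.
  NCCL_DEBUG=INFO NCCL_DEBUG_FILE="$log" "$bin" -b 8 -e 256M -f 2 -g 8
}
```

Calling `rerun_with_debug ./build/all_reduce_perf` from the nccl-tests directory would repeat the failing command. If `gdb` is installed, `gdb -batch -ex run -ex bt --args ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8` is one way to capture a backtrace at the point of the crash.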

Output of nvidia-smi

Fri Dec 12 20:24:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:59:00.0 Off |                    0 |
| N/A   28C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:5A:00.0 Off |                    0 |
| N/A   26C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:72:00.0 Off |                    0 |
| N/A   28C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:73:00.0 Off |                    0 |
| N/A   26C    P0             77W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:8B:00.0 Off |                    0 |
| N/A   29C    P0             77W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:8C:00.0 Off |                    0 |
| N/A   27C    P0             79W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:A4:00.0 Off |                    0 |
| N/A   28C    P0             76W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:A5:00.0 Off |                    0 |
| N/A   26C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

NVCC info

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
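
For reference when reading the environment details, the `nccl-headers=22809 nccl-library=22809` line in the benchmark output uses NCCL's integer version code. A small sketch of decoding it follows, assuming the encoding from the `NCCL_VERSION` macro in `nccl.h` (major*10000 + minor*100 + patch for 2.9 and later, major*1000 + minor*100 + patch before that). `decode_nccl_version` is a hypothetical helper name, not part of NCCL or nccl-tests.

```shell
#!/usr/bin/env bash
# Sketch: turn an NCCL version code (e.g. 22809) into major.minor.patch.
# decode_nccl_version is a hypothetical helper, not an NCCL tool.

decode_nccl_version() {
  local code="$1"
  local major minor patch

  if [ "$code" -ge 10000 ]; then
    # Encoding used from NCCL 2.9 onward: major*10000 + minor*100 + patch
    major=$(( code / 10000 ))
    minor=$(( (code % 10000) / 100 ))
  else
    # Older encoding: major*1000 + minor*100 + patch
    major=$(( code / 1000 ))
    minor=$(( (code % 1000) / 100 ))
  fi
  patch=$(( code % 100 ))

  echo "${major}.${minor}.${patch}"
}
```

Under that assumption, `decode_nccl_version 22809` prints `2.28.9`, which matches the `v2.28.9-1` tag checked out in the build steps and suggests that the headers and the loaded library come from the same build.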
