@Rhai2307

Description

This MR enhances dumpProxyState to improve visibility, correctness, and debug value in multi-group and collective communication scenarios.

  1. Fix dumpProxyState visibility issues with multiple communication groups.
  2. Add support for collective (Coll) operations in proxy state dumps.
  3. Improve proxy state dump with more detailed information.

Related Issues

#1930

Changes & Impact

  1. Replace the process-global static variable ncclLastProxyState with a thread-local static variable, so that dumps remain correct when multiple communication groups are active.
  2. Determine whether a flow is send or recv from sub->connection->send rather than op->pattern, which makes it possible to dump proxy state for collective communications.
  3. Print more detailed chunk information.
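
To illustrate the first change, here is a minimal standalone sketch of why a thread-local pointer avoids the problem with a process-global one. It is not the code from src/proxy.cc: ProxyState, lastProxyState, recordState, dumpCurrentThread, and runDemo are hypothetical names, and the sketch assumes the dump reads the state recorded by the thread it runs on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-communicator proxy state (not NCCL's real struct).
struct ProxyState { std::string commName; };

// One pointer per thread. With a plain process-global static, the last thread
// to write would win and every dump would show that thread's state.
static thread_local ProxyState* lastProxyState = nullptr;

// Hypothetical progress-loop step: remember the state this thread works on.
void recordState(ProxyState* s) { lastProxyState = s; }

// What a dump executed on the calling thread would see.
std::string dumpCurrentThread() {
  return lastProxyState ? lastProxyState->commName : "<none>";
}

// Starts one thread per communicator name and returns what each thread's dump saw.
std::vector<std::string> runDemo(const std::vector<std::string>& names) {
  std::vector<ProxyState> states;
  for (const auto& n : names) states.push_back({n});
  std::vector<std::string> seen(names.size());
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < states.size(); ++i) {
    // Each thread writes only its own slot in `seen`, so no locking is needed.
    threads.emplace_back([&, i] {
      recordState(&states[i]);
      seen[i] = dumpCurrentThread();
    });
  }
  for (auto& t : threads) t.join();
  return seen;
}
```

Calling runDemo({"commA", "commB"}) returns each thread's own name in its slot, while the calling thread, which never recorded a state, still sees "<none>".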

How To Use

  1. Set environment variables:
     export NCCL_PROXY_DUMP_SIGNAL=10
     export NCCL_SET_THREAD_NAME=1
  2. Find the proxy thread ID (TID):
     ps -eL | grep "NCCL Progress"
  3. Send the signal to trigger the dump and view the output:
     kill -10 <tid>
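
The steps above rely on a signal-triggered dump. As a rough sketch of that general mechanism (not NCCL's actual implementation), the following code reads a signal number from an environment variable, installs a handler, and lets a polling loop notice the request. The names DEMO_PROXY_DUMP_SIGNAL, onDumpSignal, installDumpHandler, and consumeDumpRequest are assumptions made for illustration.

```cpp
#include <csignal>
#include <cstdlib>

// Set by the handler; sig_atomic_t is safe to write from a signal handler.
static volatile std::sig_atomic_t dumpRequested = 0;

// Handler does the minimum: it only raises a flag.
extern "C" void onDumpSignal(int) { dumpRequested = 1; }

// Reads the signal number from the given environment variable and installs
// the handler. Returns false if the variable is unset or not a valid signal.
bool installDumpHandler(const char* envVar) {
  const char* value = std::getenv(envVar);
  if (value == nullptr) return false;
  int signo = std::atoi(value);
  if (signo <= 0) return false;
  return std::signal(signo, onDumpSignal) != SIG_ERR;
}

// Called from a polling loop: returns true once per received request.
bool consumeDumpRequest() {
  if (dumpRequested == 0) return false;
  dumpRequested = 0;
  return true;
}
```

In this sketch the handler defers the actual printing to the polling loop, which is one common way to keep the handler async-signal-safe. Whether NCCL prints from the handler or from the progress loop is not stated in this description.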

Example Output

ACTIVE OPS
[0-0|0| Reduce | recv channel NET/00: 0 <- 2 / status R (nsteps 6144, posted 677, received 669, transmitted 669, done 669)]
| `-> [0-2|2| AllGather | recv channel NET/00]
| `-> [0-6|4| ReduceScatter | recv channel NET/00]
v
[0-1|0| Reduce | recv channel NET/02: 0 <- 2 / status R (nsteps 6144, posted 676, received 668, transmitted 668, done 668)]
| `-> [0-4|2| AllGather | recv channel NET/02]
| `-> [0-8|4| ReduceScatter | recv channel NET/02]
v
[0-3|2| AllGather | send channel NET/01: 0 -> 2 / status G (nsteps 12, posted 8, transmitted 0, done 0)]
| `-> [0-7|4| ReduceScatter | send channel NET/01]
v
[0-5|2| AllGather | send channel NET/03: 0 -> 2 / status G (nsteps 12, posted 8, transmitted 0, done 0)]
| `-> [0-9|4| ReduceScatter | send channel NET/03]
v
[X]
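
When comparing many dump lines, it can help to pull the counters out programmatically. The sketch below extracts the named counters from one status line of the format shown above. parseCounters is a hypothetical helper, not part of NCCL, and it assumes the "name value" layout inside the parentheses stays as in this example.

```cpp
#include <map>
#include <regex>
#include <string>

// Extracts the "name value" counters (nsteps, posted, received, transmitted,
// done) from one dump line. Counters absent from the line are simply omitted.
std::map<std::string, long> parseCounters(const std::string& line) {
  static const std::regex counterRe(
      R"((nsteps|posted|received|transmitted|done) (\d+))");
  std::map<std::string, long> counters;
  for (auto it = std::sregex_iterator(line.begin(), line.end(), counterRe);
       it != std::sregex_iterator(); ++it) {
    counters[(*it)[1].str()] = std::stol((*it)[2].str());
  }
  return counters;
}
```

For the first recv line above, this yields posted 677 and done 669, a difference of 8. Reading that difference as the number of steps still in flight is an assumption about the counters' meaning, not something the dump states.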

Performance Impact

No impact on execution performance; changes affect only proxy state dumping for debugging purposes.

Rhai2307 commented Dec 18, 2025

I noticed that #1942 has also reworked the dumpProxyState function. Unfortunately, I don’t think it addresses all of the concerns I raised in #1930, so I’ve submitted this change as well.

# Conflicts:
#	src/proxy.cc

Co-authored-by: glaxy <glaxy@glaxy>

Rhai2307 commented Dec 19, 2025

To make it easier to identify the communication group, I added the commHash to each entry in the dump output.

Example Output

[0x6dd16f1f60ed8e7a|0-0|0| Reduce | recv channel NET/00: 0 <- 2 / status R (nsteps 6144, posted 210, received 202, transmitted 202, done 202)]
| `-> [0x6dd16f1f60ed8e7a|0-2|2| AllGather | recv channel NET/00]
| `-> [0x6dd16f1f60ed8e7a|0-6|4| ReduceScatter | recv channel NET/00]
v
[0x6dd16f1f60ed8e7a|0-1|0| Reduce | recv channel NET/02: 0 <- 2 / status R (nsteps 6144, posted 210, received 202, transmitted 202, done 202)]
| `-> [0x6dd16f1f60ed8e7a|0-4|2| AllGather | recv channel NET/02]
| `-> [0x6dd16f1f60ed8e7a|0-8|4| ReduceScatter | recv channel NET/02]
v
[0x6dd16f1f60ed8e7a|0-3|2| AllGather | send channel NET/01: 0 -> 2 / status G (nsteps 12, posted 8, transmitted 0, done 0)]
| `-> [0x6dd16f1f60ed8e7a|0-7|4| ReduceScatter | send channel NET/01]
v
[0x6dd16f1f60ed8e7a|0-5|2| AllGather | send channel NET/03: 0 -> 2 / status G (nsteps 12, posted 8, transmitted 0, done 0)]
| `-> [0x6dd16f1f60ed8e7a|0-9|4| ReduceScatter | send channel NET/03]
