@xlliu-scitix

No description provided.

zqan9 and others added 28 commits November 14, 2024 11:30
* add '/usr/local/sihpc/lib' to rpath.

* print 'NVCC_GENCODE' in the Makefile, and by default generate binaries for
  Volta, Ampere, Ada, and Hopper.

* add test run wrapper scripts "nccl_perf" and "nccl_test".
* check results for AllReduce only.

* disable option '-t', so nThreads is always 1.

* message size, min and max bytes, timeouts, etc. are supplied automatically.

* support checking results when running in comm split mode.

other changes:

* try to get physical hostname via env 'NODE_NAME'.

* check the IB port state and print a log message if it is neither up nor active.

* change the default stepFactor to '2' and the default datatype to 'bf16'.
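
For readers who want a concrete picture of the hostname and IB port items above, here is a minimal sketch in C. It is not the code from this pull request. The helper names (`get_node_name`, `ib_port_usable`, `check_ib_port`) are hypothetical. The sketch assumes that 'NODE_NAME' carries the physical host name, that the port is inspected through the Linux sysfs files `state` and `phys_state` under `/sys/class/infiniband/<dev>/ports/<port>/`, and that "up" and "active" correspond to the strings `LinkUp` and `ACTIVE` found there.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: prefer the NODE_NAME environment variable (assumed to
 * hold the physical host name), otherwise fall back to gethostname(). */
static void get_node_name(char *buf, size_t len) {
  const char *env = getenv("NODE_NAME");
  if (env != NULL && env[0] != '\0') {
    snprintf(buf, len, "%s", env);
    return;
  }
  if (gethostname(buf, len) != 0) {
    snprintf(buf, len, "unknown");
  }
  buf[len - 1] = '\0'; /* gethostname() may not terminate on truncation */
}

/* Hypothetical helper: returns 1 when both sysfs strings describe a usable
 * port. Assumes the usual format, e.g. state "4: ACTIVE" and
 * phys_state "5: LinkUp". Substring matching also accepts "ACTIVE_DEFER". */
static int ib_port_usable(const char *state, const char *phys_state) {
  return strstr(state, "ACTIVE") != NULL &&
         strstr(phys_state, "LinkUp") != NULL;
}

/* Reads the first line of a file; returns 0 on success, -1 on failure. */
static int read_line(const char *path, char *buf, size_t len) {
  FILE *f = fopen(path, "r");
  if (f == NULL) return -1;
  if (fgets(buf, (int)len, f) == NULL) {
    fclose(f);
    return -1;
  }
  fclose(f);
  buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
  return 0;
}

/* Returns 1 if the port is usable, 0 if it is not (and logs a warning),
 * or -1 if the sysfs files cannot be read. */
static int check_ib_port(const char *dev, int port) {
  char path[256], state[64], phys[64];
  snprintf(path, sizeof(path),
           "/sys/class/infiniband/%s/ports/%d/state", dev, port);
  if (read_line(path, state, sizeof(state)) != 0) return -1;
  snprintf(path, sizeof(path),
           "/sys/class/infiniband/%s/ports/%d/phys_state", dev, port);
  if (read_line(path, phys, sizeof(phys)) != 0) return -1;
  if (!ib_port_usable(state, phys)) {
    fprintf(stderr, "WARN: %s port %d: state='%s', phys_state='%s'\n",
            dev, port, state, phys);
    return 0;
  }
  return 1;
}
```

A caller could invoke `get_node_name(buf, sizeof(buf))` once at startup and `check_ib_port(dev, port)` for each device it intends to use. Separating the string check (`ib_port_usable`) from the file access keeps the decision logic testable on machines without InfiniBand hardware.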

2 participants