@valarLip
Collaborator

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings October 26, 2025 13:12
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the custom all-reduce API by consolidating separate "registered" and "unregistered" variants into a unified interface. The main changes include renaming full_nvlink to fully_connected for better clarity and merging all_reduce_reg and all_reduce_unreg into a single all_reduce function that accepts an optional registration buffer parameter.

Key changes:

  • Unified all-reduce API with optional registration buffer parameter
  • Renamed parameter from full_nvlink to fully_connected for improved clarity
  • Import statement reorganization for consistent code style
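The unified interface described above can be illustrated with a minimal pure-Python sketch. It is not the code from this PR: the name `all_reduce_sketch`, its parameters, and the list-based "buffers" are hypothetical stand-ins. It also assumes that supplying the optional buffer corresponds to the former unregistered path (input is first staged into an already registered buffer), while omitting it corresponds to the former registered path.

```python
from typing import List, Optional, Sequence


def all_reduce_sketch(
    rank_inputs: Sequence[List[float]],
    reg_buffer: Optional[List[List[float]]] = None,
) -> List[float]:
    """Element-wise sum across ranks; a stand-in for the real kernel.

    rank_inputs: one input list per rank (hypothetical representation).
    reg_buffer:  optional per-rank staging buffers (hypothetical). When
                 given, each input is copied into its buffer before the
                 reduction, mimicking the old "unregistered" variant.
    """
    if reg_buffer is None:
        # Assumed "registered" path: inputs are read in place.
        staged = list(rank_inputs)
    else:
        # Assumed "unregistered" path: stage inputs into the buffers first.
        if len(reg_buffer) != len(rank_inputs):
            raise ValueError("need one staging buffer per rank")
        staged = []
        for slot, data in zip(reg_buffer, rank_inputs):
            if len(slot) < len(data):
                raise ValueError("staging buffer is smaller than the input")
            slot[: len(data)] = data          # copy into the staging buffer
            staged.append(slot[: len(data)])  # reduce over the staged copy
    # Sum the i-th element of every rank's data.
    return [sum(values) for values in zip(*staged)]


inputs = [[1.0, 2.0], [3.0, 4.0]]
direct = all_reduce_sketch(inputs)
staged_result = all_reduce_sketch(inputs, reg_buffer=[[0.0] * 4, [0.0] * 4])
```

Both calls go through the same entry point, so a caller no longer has to choose between two function names; the presence of the buffer selects the path.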

Reviewed Changes

Copilot reviewed 9 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| op_tests/multigpu_tests/test_custom_allreduce.py | Reorganized imports to follow standard conventions (standard library, third-party, local imports) |
| csrc/kernels/custom_all_reduce.cu | Merged all_reduce_reg and all_reduce_unreg into single all_reduce function with optional reg_buffer parameter; renamed full_nvlink to fully_connected |
| csrc/include/rocm_ops.hpp | Updated Python bindings to reflect unified all_reduce API and renamed parameter |
| csrc/include/custom_all_reduce.h | Updated function signatures for unified API and improved formatting |
| csrc/include/custom_all_reduce.cuh | Renamed constructor parameter from full_nvlink to fully_connected |
| aiter/ops/sample.py | Reorganized imports and renamed temperatures parameter to temperature (singular) |
| aiter/ops/custom_all_reduce.py | Updated Python interface to match unified C++ API |
| aiter/ops/communication.py | Updated references to access custom all-reduce communicator through device_communicator |
| aiter/jit/core.py | Removed unused operation from export list |

@junhaha666 junhaha666 merged commit 22c847a into main Oct 27, 2025
16 checks passed
@junhaha666 junhaha666 deleted the refine_ca branch October 27, 2025 05:56
ganyi1996ppo pushed a commit that referenced this pull request Nov 19, 2025
* refine CA

* fix

* fix type hint
3 participants