Conversation

@zhou-pz (Contributor) commented Dec 31, 2025

This pull request introduces two new modules: a general-purpose merge operation in the hal kernel and a Brent-Kung based aggregation (Logstar) in the hlo kernel. Both modules include their implementations, headers, Bazel build rules, and tests. The merge operation supports custom comparators and payloads, while the Logstar aggregation provides efficient, vectorized aggregation routines. Note: the Brent-Kung based aggregation (Logstar) is not yet optimized!

New merge operations in HAL:

  • Added merge and merge_with_payloads functions to hal, enabling general-purpose merging of tensors along a specified dimension with support for custom comparators and optional payloads (a plaintext reference sketch follows this list). [1] [2]
  • Declared and exposed merge1d and merge1d_with_payloads in permute.h for 1D merging with comparator support.
  • Added Bazel targets merge and merge_test in src/libspu/kernel/hal/BUILD.bazel for the new functionality and its tests.
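
To make the intended semantics concrete, here is a minimal plaintext reference sketch. It is not the SPU API: the function name, the split_idx convention, and the bool-returning comparator are illustrative assumptions. It merges two pre-sorted runs stored back-to-back and moves each key's payload along with it; the actual kernel presumably performs the equivalent data movement obliviously over secret-shared values via an odd-even merge network (see the reviewer summaries below).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Plaintext reference (illustrative only): merge two sorted runs stored
// back-to-back in `keys` (split at `split_idx`), carrying each key's payload
// along with it. `less` is the custom comparator.
void merge_with_payloads_ref(std::vector<int64_t>& keys,
                             std::vector<int64_t>& payloads, size_t split_idx,
                             const std::function<bool(int64_t, int64_t)>& less) {
  std::vector<size_t> order(keys.size());
  for (size_t i = 0; i < order.size(); ++i) order[i] = i;
  // Merge the index ranges [0, split_idx) and [split_idx, n) by key.
  std::inplace_merge(order.begin(), order.begin() + split_idx, order.end(),
                     [&](size_t a, size_t b) { return less(keys[a], keys[b]); });
  // Apply the merged order to keys and payloads alike.
  std::vector<int64_t> k(keys.size()), p(payloads.size());
  for (size_t i = 0; i < order.size(); ++i) {
    k[i] = keys[order[i]];
    p[i] = payloads[order[i]];
  }
  keys = std::move(k);
  payloads = std::move(p);
}
```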

New Logstar aggregation in HLO:

  • Implemented AggregateBrentKung and related vectorized aggregation routines in Logstar.cc, providing efficient parallel aggregation (Brent-Kung scan) with and without valid bits; a plaintext scan sketch follows this list.
  • Declared AggregateBrentKung, AggregateBrentKung_without_valids, and utility functions in Logstar.h.
  • Added Bazel targets Logstar and Logstar_test in src/libspu/kernel/hlo/BUILD.bazel for the new Logstar aggregation and its tests.
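
For readers unfamiliar with the Brent-Kung structure, the sketch below is a plaintext illustration only; it is not the PR's code, assumes a power-of-two input length, and uses made-up names. The point is the shape of the computation: an up-sweep that builds a reduction tree, then a down-sweep that fills in the remaining prefixes, giving roughly 2*log2(n) combine layers with only about 2n combines in total (versus roughly n*log2(n) combines for a Kogge-Stone style network).

```cpp
#include <cstddef>
#include <vector>

// Plaintext Brent-Kung inclusive scan (prefix aggregation) over `x`
// (size assumed to be a power of two) with an associative combine `op`.
template <typename T, typename Op>
void brent_kung_scan(std::vector<T>& x, Op op) {
  const size_t n = x.size();
  // Up-sweep (reduce): combine pairs at doubling strides.
  for (size_t stride = 1; stride < n; stride *= 2) {
    for (size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      x[i] = op(x[i - stride], x[i]);
    }
  }
  // Down-sweep: propagate partial results to the positions skipped above.
  for (size_t stride = n / 4; stride >= 1; stride /= 2) {
    for (size_t i = 2 * stride - 1; i + stride < n; i += 2 * stride) {
      x[i + stride] = op(x[i], x[i + stride]);
    }
  }
}

// Example: {1, 1, 1, 1, 1, 1, 1, 1} scanned with
//   brent_kung_scan(v, [](int a, int b) { return a + b; });
// yields {1, 2, 3, 4, 5, 6, 7, 8}.
```

In the vectorized MPC setting this PR targets, each inner loop would presumably be one batched secure combine, so the layer count rather than the element count drives the number of interactive rounds.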

# Pull Request

What problem does this PR solve?

Issue Number: Fixed #

Possible side effects?

  • Performance:

  • Backward compatibility:

Copilot AI review requested due to automatic review settings December 31, 2025 07:34
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @zhou-pz, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the SPU kernel by introducing advanced secure computation primitives. It provides new general-purpose merge operations within the hal kernel, allowing for flexible merging of tensors with custom comparison logic and payload handling. Complementing this, a Brent-Kung based aggregation (Logstar) is integrated into the hlo kernel, offering efficient vectorized routines for parallel prefix-sum-like operations, including support for validity masks. These additions expand the SPU's capabilities for complex data manipulation in a secure multi-party computation environment.

Highlights

  • HAL Merge Operations: Introduced merge and merge_with_payloads in the hal kernel for general-purpose tensor merging along a specified dimension, supporting custom comparators and optional payloads.
  • Odd-Even Merge Network Implementation: Implemented odd_even_merge and odd_even_merge_with_payloads in hal/permute.cc using an iterative approach to generate comparison layers, forming the core of the new merge functionality.
  • HAL 1D Merge Functions: Exposed merge1d and merge1d_with_payloads in hal/permute.h and implemented them in hal/permute.cc as wrappers around the odd-even merge networks for 1D merging.
  • HLO Logstar Aggregation: Added AggregateBrentKung and related vectorized aggregation routines to the hlo kernel, enabling efficient parallel aggregation (Brent-Kung scan) with and without valid bits.
  • Comprehensive Testing: New test files (merge_test.cc, Logstar_test.cc) were added to validate the correctness and performance of the new merge and aggregation functionalities, including large-scale scenarios.
  • Build System Integration: Updated Bazel build rules (hal/BUILD.bazel, hlo/BUILD.bazel) to include the new merge and Logstar libraries and their respective tests.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a general-purpose merge operation in the HAL kernel and a Brent-Kung based aggregation (Logstar) in the HLO kernel, along with their respective build rules and tests. The changes are well-structured. My review focuses on improving code clarity by removing commented-out code, translating comments to English for consistency, and refining error messages. I've also included a suggestion for future performance optimization in the Brent-Kung implementation, which aligns with the note in the pull request description.

Comment on lines 33 to 34
SPU_ENFORCE_EQ(inputs.size(), payloads.size(),
"Number of input tensors and valid tensors must match");

Severity: medium

The error message refers to "valid tensors", but the parameter is named payloads. To avoid confusion, it would be clearer to use "payload tensors" in the message.

Suggested change
- SPU_ENFORCE_EQ(inputs.size(), payloads.size(),
- "Number of input tensors and valid tensors must match");
+ SPU_ENFORCE_EQ(inputs.size(), payloads.size(),
+ "Number of input tensors and payload tensors must match");


ret = internal::odd_even_merge(ctx, cmp, inputs, split_idx);
} else {
SPU_THROW("Should not reach here");

Severity: medium

Using SPU_THROW here for a condition that should not be reached makes it harder to debug. It's better to use SPU_ENFORCE with a descriptive message to clearly state the violated assumption. For example, you could enforce that comparator_ret_vis must be VIS_SECRET.

Suggested change
- SPU_THROW("Should not reach here");
+ SPU_ENFORCE(comparator_ret_vis == VIS_SECRET, "merge1d currently only supports secret comparator return visibility");

Comment on lines 2001 to 2004
SPU_ENFORCE(
inputs.size() == 2,
"merge1d_with_payloads expects exactly 2 inputs (Value tensor and "
"Valid tensor)");

Severity: medium

The error message refers to "Valid tensor", but the function is generic and works with any payload. To make it more general and less confusing, please consider changing "Valid tensor" to "Payload tensor".

Suggested change
- SPU_ENFORCE(
- inputs.size() == 2,
- "merge1d_with_payloads expects exactly 2 inputs (Value tensor and "
- "Valid tensor)");
+ SPU_ENFORCE(
+ inputs.size() == 2,
+ "merge1d_with_payloads expects exactly 2 inputs (Value tensor and "
+ "Payload tensor)");


namespace spu::kernel::hlo {

// Vectorized NoteFunc

Severity: medium

There are several comments in Chinese within this file (e.g., lines 28, 36, 51, 63, 75, 83, 195). Please translate them to English to maintain consistency across the codebase.


Copilot AI left a comment


Pull request overview

This pull request implements an odd-even merge network in the HAL kernel and a Brent-Kung based aggregation (Logstar) in the HLO kernel. Both modules support secure multi-party computation with custom comparators and optional payloads, enabling efficient parallel merging and aggregation operations.

Key Changes:

  • Added general-purpose merge operations (merge, merge_with_payloads) to the HAL kernel with support for custom comparators and payloads
  • Implemented Brent-Kung based aggregation (AggregateBrentKung) in the HLO kernel for efficient vectorized parallel prefix operations
  • Introduced an odd-even merge network topology generator using an iterative, stack-based approach to avoid recursion (see the sketch after this list)
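
To illustrate what generating comparison layers for an odd-even merge can look like, here is a plaintext sketch. It is not the PR's stack-based code: it uses the closed-form iterative formulation of Batcher's odd-even merge for a power-of-two length, and the function name is made up. Each layer is a set of disjoint index pairs, so a vectorized backend can execute a whole layer as one batched compare-and-swap.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Layers of Batcher's odd-even merge network for an array of length n
// (n a power of two) whose halves [0, n/2) and [n/2, n) are each sorted.
std::vector<std::vector<std::pair<size_t, size_t>>> odd_even_merge_layers(
    size_t n) {
  std::vector<std::vector<std::pair<size_t, size_t>>> layers;
  for (size_t k = n / 2; k >= 1; k /= 2) {
    std::vector<std::pair<size_t, size_t>> layer;
    const size_t j0 = (k == n / 2) ? 0 : k;  // first layer spans the two halves
    for (size_t j = j0; j + k <= n - 1; j += 2 * k) {
      for (size_t i = 0; i <= std::min(k - 1, n - j - k - 1); ++i) {
        layer.emplace_back(i + j, i + j + k);  // compare-exchange (lo, hi)
      }
    }
    layers.push_back(std::move(layer));
  }
  return layers;
}

// Example for n = 8:
//   layer 0: (0,4) (1,5) (2,6) (3,7)
//   layer 1: (2,4) (3,5)
//   layer 2: (1,2) (3,4) (5,6)
```

The merge network has log2(n) such layers, which is presumably why the topology is generated up front: every pair in a layer can be handled by a single vectorized secure comparison.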

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 37 comments.

Summary per file

| File | Description |
| --- | --- |
| src/libspu/kernel/hal/merge.h | Declares public API for merge operations |
| src/libspu/kernel/hal/merge.cc | Implements merge and merge_with_payloads by concatenating inputs and delegating to 1D merge functions |
| src/libspu/kernel/hal/permute.h | Adds declarations for merge1d and merge1d_with_payloads functions |
| src/libspu/kernel/hal/permute.cc | Implements odd-even merge network topology generation and merge operations |
| src/libspu/kernel/hal/merge_test.cc | Provides comprehensive test coverage for merge operations including basic correctness and large-scale tests |
| src/libspu/kernel/hal/BUILD.bazel | Adds Bazel build targets for merge module and tests |
| src/libspu/kernel/hlo/Logstar.h | Declares public API for Brent-Kung aggregation operations |
| src/libspu/kernel/hlo/Logstar.cc | Implements vectorized Brent-Kung parallel prefix scan with up-sweep and down-sweep phases |
| src/libspu/kernel/hlo/Logstar_test.cc | Provides test coverage for Brent-Kung aggregation with basic correctness and large-scale benchmarks |
| src/libspu/kernel/hlo/BUILD.bazel | Adds Bazel build targets for Logstar module and tests |


stack.push_back(std::move(odd_frame));

} else if (frame.phase == 2) {
// odd 完成,开始 even

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "odd completed, start even".

Comment on lines 29 to 30
// 输入的 p1, p2 维度为 [Batch, BlockSize]
// 输入的 g1, g2 维度为 [Batch, 1]

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "the inputs p1, p2 have dimensions [Batch, BlockSize]; the inputs g1, g2 have dimensions [Batch, 1]".

n |= n >> 32;
return n + 1;
}
// // 最优:非 padding 方案,额外开销低

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Optimal: non-padding scheme, low extra overhead".

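For context, the quoted lines look like the tail of the standard round-up-to-the-next-power-of-two bit trick; a self-contained version (illustrative only, the name next_pow2 is assumed) is:

```cpp
#include <cstdint>

// Round n up to the next power of two (returns n unchanged if it already
// is one). Smearing the highest set bit of n - 1 into every lower position
// produces a mask of ones, so adding 1 gives the next power of two.
uint64_t next_pow2(uint64_t n) {
  if (n <= 1) return 1;
  n -= 1;
  n |= n >> 1;
  n |= n >> 2;
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  n |= n >> 32;
  return n + 1;
}
```

Padding to a power of two is a common way to feed a fixed network topology; the translated comment above suggests the author considers a non-padding scheme the lower-overhead option.
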
Comment on lines 683 to 688
// 定义比较器包装函数:
//
// _cmp_swap 会 gather 出:
// gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]
//
// 根据 Values 进行比较,取前两个元素传给原始比较器

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Define comparator wrapper function: _cmp_swap will gather: gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]; compare based on Values, pass first two elements to original comparator".

Suggested change
- // 定义比较器包装函数:
- //
- // _cmp_swap 会 gather 出:
- // gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]
- //
- // 根据 Values 进行比较,取前两个元素传给原始比较器
+ // Define comparator wrapper function:
+ //
+ // _cmp_swap will gather:
+ // gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]
+ //
+ // Compare based on Values, and pass the first two elements to the original comparator.

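As an aside on what that wrapper does, here is a generic plaintext sketch of the same idea; the names and the bool-returning comparator signature are illustrative, not the SPU types. The payload-aware network gathers [Value_L, Value_R, Payload_L, Payload_R] at each compare-exchange, but the swap decision must depend only on the two values, so the wrapper slices off the first two entries and forwards them to the original comparator.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// The user's comparator sees only the two values being compared.
using CompareFn = std::function<bool(const std::vector<int64_t>&)>;

// Wrapper used by a payload-carrying merge: it receives
// [Value_L, Value_R, Payload_L, Payload_R] but decides from the values only;
// payloads just ride along with whatever swap that decision implies.
CompareFn wrap_for_payloads(CompareFn value_only_cmp) {
  return [cmp = std::move(value_only_cmp)](
             const std::vector<int64_t>& gathered) -> bool {
    std::vector<int64_t> values(gathered.begin(), gathered.begin() + 2);
    return cmp(values);
  };
}
```
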
});
}

TEST_F(BrentKungTest, LargeScaleIntergers) {

Copilot AI Dec 31, 2025

The test name "LargeScaleIntergers" contains a typo. It should be "Integers" instead of "Intergers".

Suggested change
- TEST_F(BrentKungTest, LargeScaleIntergers) {
+ TEST_F(BrentKungTest, LargeScaleIntegers) {

auto g3 = hal::mul(ctx, g1, g2);
// 2. diff = p2 - p1
auto diff = hal::sub(ctx, p2, p1);
// 3. 广播: g1 是 [Batch, 1], diff 是 [Batch, BlockSize]

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Broadcast: g1 is [Batch, 1], diff is [Batch, BlockSize]".

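Reading the quoted snippet as arithmetic: with diff = p2 - p1 and g1 broadcast from [Batch, 1] to [Batch, BlockSize], the expression p1 + g1 * (p2 - p1) equals (1 - g1) * p1 + g1 * p2, i.e. an element-wise select(g1, p2, p1), while g3 = g1 * g2 is the logical AND of the two flag columns. For example, with p1 = 5 and p2 = 9: g1 = 0 gives 5 + 0 * 4 = 5, and g1 = 1 gives 5 + 1 * 4 = 9.
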
const int64_t block_size = 2;
std::mt19937 rng(std::random_device{}());
std::uniform_real_distribution<float> dist_x(0, 1000);
std::uniform_int_distribution<int> dist_binary(0, 1); // 改为 int 类型

Copilot AI Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "changed to int type".

Suggested change
- std::uniform_int_distribution<int> dist_binary(0, 1); // 改为 int 类型
+ std::uniform_int_distribution<int> dist_binary(0, 1); // changed to int type
