Conversation

@zhou-pz (Contributor) commented Dec 31, 2025

This pull request introduces two new modules: a general-purpose merge operation in the hal kernel and a Brent-Kung based aggregation (Logstar) in the hlo kernel. Both modules include their implementations, headers, Bazel build rules, and tests. The merge operation supports custom comparators and payloads, while the Logstar aggregation provides efficient, vectorized aggregation routines. Note: the Brent-Kung based aggregation (Logstar) is not yet optimized!

New merge operations in HAL:

  • Added merge and merge_with_payloads functions to hal, enabling general-purpose merging of tensors along a specified dimension with support for custom comparators and optional payloads (a plaintext reference sketch follows this list). [1] [2]
  • Declared and exposed merge1d and merge1d_with_payloads in permute.h for 1D merging with comparator support.
  • Added Bazel targets merge and merge_test in src/libspu/kernel/hal/BUILD.bazel for the new functionality and its tests.
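
To make the intended semantics concrete, here is a minimal plaintext reference sketch. It is not the SPU API: the function name, the split_idx convention, and the bool-returning comparator are illustrative assumptions. It merges two pre-sorted runs stored back-to-back and moves each key's payload along with it; the actual kernel presumably performs the equivalent data movement obliviously over secret-shared values via an odd-even merge network (see the reviewer summaries below).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Plaintext reference (illustrative only): merge two sorted runs stored
// back-to-back in `keys` (split at `split_idx`), carrying each key's payload
// along with it. `less` is the custom comparator.
void merge_with_payloads_ref(std::vector<int64_t>& keys,
                             std::vector<int64_t>& payloads, size_t split_idx,
                             const std::function<bool(int64_t, int64_t)>& less) {
  std::vector<size_t> order(keys.size());
  for (size_t i = 0; i < order.size(); ++i) order[i] = i;
  // Merge the index ranges [0, split_idx) and [split_idx, n) by key.
  std::inplace_merge(order.begin(), order.begin() + split_idx, order.end(),
                     [&](size_t a, size_t b) { return less(keys[a], keys[b]); });
  // Apply the merged order to keys and payloads alike.
  std::vector<int64_t> k(keys.size()), p(payloads.size());
  for (size_t i = 0; i < order.size(); ++i) {
    k[i] = keys[order[i]];
    p[i] = payloads[order[i]];
  }
  keys = std::move(k);
  payloads = std::move(p);
}
```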

New Logstar aggregation in HLO:

  • Implemented AggregateBrentKung and related vectorized aggregation routines in Logstar.cc, providing efficient parallel aggregation (Brent-Kung scan) with and without valid bits; a plaintext scan sketch follows this list.
  • Declared AggregateBrentKung, AggregateBrentKung_without_valids, and utility functions in Logstar.h.
  • Added Bazel targets Logstar and Logstar_test in src/libspu/kernel/hlo/BUILD.bazel for the new Logstar aggregation and its tests.
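
For readers unfamiliar with the Brent-Kung structure, the sketch below is a plaintext illustration only; it is not the PR's code, assumes a power-of-two input length, and uses made-up names. The point is the shape of the computation: an up-sweep that builds a reduction tree, then a down-sweep that fills in the remaining prefixes, giving roughly 2*log2(n) combine layers with only about 2n combines in total (versus roughly n*log2(n) combines for a Kogge-Stone style network).

```cpp
#include <cstddef>
#include <vector>

// Plaintext Brent-Kung inclusive scan (prefix aggregation) over `x`
// (size assumed to be a power of two) with an associative combine `op`.
template <typename T, typename Op>
void brent_kung_scan(std::vector<T>& x, Op op) {
  const size_t n = x.size();
  // Up-sweep (reduce): combine pairs at doubling strides.
  for (size_t stride = 1; stride < n; stride *= 2) {
    for (size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      x[i] = op(x[i - stride], x[i]);
    }
  }
  // Down-sweep: propagate partial results to the positions skipped above.
  for (size_t stride = n / 4; stride >= 1; stride /= 2) {
    for (size_t i = 2 * stride - 1; i + stride < n; i += 2 * stride) {
      x[i + stride] = op(x[i], x[i + stride]);
    }
  }
}

// Example: {1, 1, 1, 1, 1, 1, 1, 1} scanned with
//   brent_kung_scan(v, [](int a, int b) { return a + b; });
// yields {1, 2, 3, 4, 5, 6, 7, 8}.
```

In the vectorized MPC setting this PR targets, each inner loop would presumably be one batched secure combine, so the layer count rather than the element count drives the number of interactive rounds.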

# Pull Request

What problem does this PR solve?

Issue Number: Fixed #

Possible side effects?

  • Performance:

  • Backward compatibility:

Copilot AI review requested due to automatic review settings December 31, 2025 07:34
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @zhou-pz, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the SPU kernel by introducing advanced secure computation primitives. It provides new general-purpose merge operations within the hal kernel, allowing for flexible merging of tensors with custom comparison logic and payload handling. Complementing this, a Brent-Kung based aggregation (Logstar) is integrated into the hlo kernel, offering efficient vectorized routines for parallel prefix-sum-like operations, including support for validity masks. These additions expand the SPU's capabilities for complex data manipulation in a secure multi-party computation environment.

Highlights

  • HAL Merge Operations: Introduced merge and merge_with_payloads in the hal kernel for general-purpose tensor merging along a specified dimension, supporting custom comparators and optional payloads.
  • Odd-Even Merge Network Implementation: Implemented odd_even_merge and odd_even_merge_with_payloads in hal/permute.cc using an iterative approach to generate comparison layers, forming the core of the new merge functionality.
  • HAL 1D Merge Functions: Exposed merge1d and merge1d_with_payloads in hal/permute.h and implemented them in hal/permute.cc as wrappers around the odd-even merge networks for 1D merging.
  • HLO Logstar Aggregation: Added AggregateBrentKung and related vectorized aggregation routines to the hlo kernel, enabling efficient parallel aggregation (Brent-Kung scan) with and without valid bits.
  • Comprehensive Testing: New test files (merge_test.cc, Logstar_test.cc) were added to validate the correctness and performance of the new merge and aggregation functionalities, including large-scale scenarios.
  • Build System Integration: Updated Bazel build rules (hal/BUILD.bazel, hlo/BUILD.bazel) to include the new merge and Logstar libraries and their respective tests.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a general-purpose merge operation in the HAL kernel and a Brent-Kung based aggregation (Logstar) in the HLO kernel, along with their respective build rules and tests. The changes are well-structured. My review focuses on improving code clarity by removing commented-out code, translating comments to English for consistency, and refining error messages. I've also included a suggestion for future performance optimization in the Brent-Kung implementation, which aligns with the note in the pull request description.

Comment on lines 33 to 34
SPU_ENFORCE_EQ(inputs.size(), payloads.size(),
"Number of input tensors and valid tensors must match");

Severity: medium

The error message refers to "valid tensors", but the parameter is named payloads. To avoid confusion, it would be clearer to use "payload tensors" in the message.

Suggested change
- SPU_ENFORCE_EQ(inputs.size(), payloads.size(),
- "Number of input tensors and valid tensors must match");
+ SPU_ENFORCE_EQ(inputs.size(), payloads.size(),
+ "Number of input tensors and payload tensors must match");


ret = internal::odd_even_merge(ctx, cmp, inputs, split_idx);
} else {
SPU_THROW("Should not reach here");

Severity: medium

Using SPU_THROW here for a condition that should not be reached makes it harder to debug. It's better to use SPU_ENFORCE with a descriptive message to clearly state the violated assumption. For example, you could enforce that comparator_ret_vis must be VIS_SECRET.

Suggested change
- SPU_THROW("Should not reach here");
+ SPU_ENFORCE(comparator_ret_vis == VIS_SECRET, "merge1d currently only supports secret comparator return visibility");

Comment on lines 2001 to 2004
SPU_ENFORCE(
inputs.size() == 2,
"merge1d_with_payloads expects exactly 2 inputs (Value tensor and "
"Valid tensor)");

Severity: medium

The error message refers to "Valid tensor", but the function is generic and works with any payload. To make it more general and less confusing, please consider changing "Valid tensor" to "Payload tensor".

Suggested change
- SPU_ENFORCE(
- inputs.size() == 2,
- "merge1d_with_payloads expects exactly 2 inputs (Value tensor and "
- "Valid tensor)");
+ SPU_ENFORCE(
+ inputs.size() == 2,
+ "merge1d_with_payloads expects exactly 2 inputs (Value tensor and "
+ "Payload tensor)");


namespace spu::kernel::hlo {

// Vectorized NoteFunc

Severity: medium

There are several comments in Chinese within this file (e.g., lines 28, 36, 51, 63, 75, 83, 195). Please translate them to English to maintain consistency across the codebase.


Copilot AI left a comment


Pull request overview

This pull request implements an odd-even merge network in the HAL kernel and a Brent-Kung based aggregation (Logstar) in the HLO kernel. Both modules support secure multi-party computation with custom comparators and optional payloads, enabling efficient parallel merging and aggregation operations.

Key Changes:

  • Added general-purpose merge operations (merge, merge_with_payloads) to the HAL kernel with support for custom comparators and payloads
  • Implemented Brent-Kung based aggregation (AggregateBrentKung) in the HLO kernel for efficient vectorized parallel prefix operations
  • Introduced an odd-even merge network topology generator using an iterative, stack-based approach to avoid recursion (see the sketch after this list)
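
To illustrate what generating comparison layers for an odd-even merge can look like, here is a plaintext sketch. It is not the PR's stack-based code: it uses the closed-form iterative formulation of Batcher's odd-even merge for a power-of-two length, and the function name is made up. Each layer is a set of disjoint index pairs, so a vectorized backend can execute a whole layer as one batched compare-and-swap.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Layers of Batcher's odd-even merge network for an array of length n
// (n a power of two) whose halves [0, n/2) and [n/2, n) are each sorted.
std::vector<std::vector<std::pair<size_t, size_t>>> odd_even_merge_layers(
    size_t n) {
  std::vector<std::vector<std::pair<size_t, size_t>>> layers;
  for (size_t k = n / 2; k >= 1; k /= 2) {
    std::vector<std::pair<size_t, size_t>> layer;
    const size_t j0 = (k == n / 2) ? 0 : k;  // first layer spans the two halves
    for (size_t j = j0; j + k <= n - 1; j += 2 * k) {
      for (size_t i = 0; i <= std::min(k - 1, n - j - k - 1); ++i) {
        layer.emplace_back(i + j, i + j + k);  // compare-exchange (lo, hi)
      }
    }
    layers.push_back(std::move(layer));
  }
  return layers;
}

// Example for n = 8:
//   layer 0: (0,4) (1,5) (2,6) (3,7)
//   layer 1: (2,4) (3,5)
//   layer 2: (1,2) (3,4) (5,6)
```

The merge network has log2(n) such layers, which is presumably why the topology is generated up front: every pair in a layer can be handled by a single vectorized secure comparison.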

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 37 comments.

Summary per file

| File | Description |
| --- | --- |
| src/libspu/kernel/hal/merge.h | Declares public API for merge operations |
| src/libspu/kernel/hal/merge.cc | Implements merge and merge_with_payloads by concatenating inputs and delegating to 1D merge functions |
| src/libspu/kernel/hal/permute.h | Adds declarations for merge1d and merge1d_with_payloads functions |
| src/libspu/kernel/hal/permute.cc | Implements odd-even merge network topology generation and merge operations |
| src/libspu/kernel/hal/merge_test.cc | Provides comprehensive test coverage for merge operations including basic correctness and large-scale tests |
| src/libspu/kernel/hal/BUILD.bazel | Adds Bazel build targets for merge module and tests |
| src/libspu/kernel/hlo/Logstar.h | Declares public API for Brent-Kung aggregation operations |
| src/libspu/kernel/hlo/Logstar.cc | Implements vectorized Brent-Kung parallel prefix scan with up-sweep and down-sweep phases |
| src/libspu/kernel/hlo/Logstar_test.cc | Provides test coverage for Brent-Kung aggregation with basic correctness and large-scale benchmarks |
| src/libspu/kernel/hlo/BUILD.bazel | Adds Bazel build targets for Logstar module and tests |


stack.push_back(std::move(odd_frame));

} else if (frame.phase == 2) {
// odd 完成,开始 even

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "odd completed, start even".

Comment on lines 29 to 30
// 输入的 p1, p2 维度为 [Batch, BlockSize]
// 输入的 g1, g2 维度为 [Batch, 1]

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "the inputs p1, p2 have dimensions [Batch, BlockSize]; the inputs g1, g2 have dimensions [Batch, 1]".

n |= n >> 32;
return n + 1;
}
// // 最优:非 padding 方案,额外开销低

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Optimal: non-padding scheme, low extra overhead".

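For context, the quoted lines look like the tail of the standard round-up-to-the-next-power-of-two bit trick; a self-contained version (illustrative only, the name next_pow2 is assumed) is:

```cpp
#include <cstdint>

// Round n up to the next power of two (returns n unchanged if it already
// is one). Smearing the highest set bit of n - 1 into every lower position
// produces a mask of ones, so adding 1 gives the next power of two.
uint64_t next_pow2(uint64_t n) {
  if (n <= 1) return 1;
  n -= 1;
  n |= n >> 1;
  n |= n >> 2;
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  n |= n >> 32;
  return n + 1;
}
```

Padding to a power of two is a common way to feed a fixed network topology; the translated comment above suggests the author considers a non-padding scheme the lower-overhead option.
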
Comment on lines 683 to 688
// 定义比较器包装函数:
//
// _cmp_swap 会 gather 出:
// gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]
//
// 根据 Values 进行比较,取前两个元素传给原始比较器

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Define comparator wrapper function: _cmp_swap will gather: gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]; compare based on Values, pass first two elements to original comparator".

Suggested change
- // 定义比较器包装函数:
- //
- // _cmp_swap 会 gather 出:
- // gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]
- //
- // 根据 Values 进行比较,取前两个元素传给原始比较器
+ // Define comparator wrapper function:
+ //
+ // _cmp_swap will gather:
+ // gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]
+ //
+ // Compare based on Values, and pass the first two elements to the original comparator.

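As an aside on what that wrapper does, here is a generic plaintext sketch of the same idea; the names and the bool-returning comparator signature are illustrative, not the SPU types. The payload-aware network gathers [Value_L, Value_R, Payload_L, Payload_R] at each compare-exchange, but the swap decision must depend only on the two values, so the wrapper slices off the first two entries and forwards them to the original comparator.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// The user's comparator sees only the two values being compared.
using CompareFn = std::function<bool(const std::vector<int64_t>&)>;

// Wrapper used by a payload-carrying merge: it receives
// [Value_L, Value_R, Payload_L, Payload_R] but decides from the values only;
// payloads just ride along with whatever swap that decision implies.
CompareFn wrap_for_payloads(CompareFn value_only_cmp) {
  return [cmp = std::move(value_only_cmp)](
             const std::vector<int64_t>& gathered) -> bool {
    std::vector<int64_t> values(gathered.begin(), gathered.begin() + 2);
    return cmp(values);
  };
}
```
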
});
}

TEST_F(BrentKungTest, LargeScaleIntergers) {

Copilot AI Dec 31, 2025

The test name "LargeScaleIntergers" contains a typo. It should be "Integers" instead of "Intergers".

Suggested change
- TEST_F(BrentKungTest, LargeScaleIntergers) {
+ TEST_F(BrentKungTest, LargeScaleIntegers) {

auto g3 = hal::mul(ctx, g1, g2);
// 2. diff = p2 - p1
auto diff = hal::sub(ctx, p2, p1);
// 3. 广播: g1 是 [Batch, 1], diff 是 [Batch, BlockSize]

Copilot AI Dec 31, 2025

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Broadcast: g1 is [Batch, 1], diff is [Batch, BlockSize]".

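Reading the quoted snippet as arithmetic: with diff = p2 - p1 and g1 broadcast from [Batch, 1] to [Batch, BlockSize], the expression p1 + g1 * (p2 - p1) equals (1 - g1) * p1 + g1 * p2, i.e. an element-wise select(g1, p2, p1), while g3 = g1 * g2 is the logical AND of the two flag columns. For example, with p1 = 5 and p2 = 9: g1 = 0 gives 5 + 0 * 4 = 5, and g1 = 1 gives 5 + 1 * 4 = 9.
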
const int64_t block_size = 2;
std::mt19937 rng(std::random_device{}());
std::uniform_real_distribution<float> dist_x(0, 1000);
std::uniform_int_distribution<int> dist_binary(0, 1); // 改为 int 类型

Copilot AI Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "changed to int type".

Suggested change
- std::uniform_int_distribution<int> dist_binary(0, 1); // 改为 int 类型
+ std::uniform_int_distribution<int> dist_binary(0, 1); // changed to int type
