-
Notifications
You must be signed in to change notification settings - Fork 140
Implement odd-even-merge network #1370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: zjj/wk25_merge_join
Are you sure you want to change the base?
Implement odd-even-merge network #1370
Conversation
Summary of ChangesHello @zhou-pz, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the SPU kernel by introducing advanced secure computation primitives. It provides new general-purpose merge operations within the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a general-purpose merge operation in the HAL kernel and a Brent-Kung based aggregation (Logstar) in the HLO kernel, along with their respective build rules and tests. The changes are well-structured. My review focuses on improving code clarity by removing commented-out code, translating comments to English for consistency, and refining error messages. I've also included a suggestion for future performance optimization in the Brent-Kung implementation, which aligns with the note in the pull request description.
src/libspu/kernel/hal/merge.cc
Outdated
| SPU_ENFORCE_EQ(inputs.size(), payloads.size(), | ||
| "Number of input tensors and valid tensors must match"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message refers to "valid tensors", but the parameter is named payloads. To avoid confusion, it would be clearer to use "payload tensors" in the message.
| SPU_ENFORCE_EQ(inputs.size(), payloads.size(), | |
| "Number of input tensors and valid tensors must match"); | |
| SPU_ENFORCE_EQ(inputs.size(), payloads.size(), | |
| "Number of input tensors and payload tensors must match"); |
|
|
||
| ret = internal::odd_even_merge(ctx, cmp, inputs, split_idx); | ||
| } else { | ||
| SPU_THROW("Should not reach here"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SPU_THROW here for a condition that should not be reached makes it harder to debug. It's better to use SPU_ENFORCE with a descriptive message to clearly state the violated assumption. For example, you could enforce that comparator_ret_vis must be VIS_SECRET.
| SPU_THROW("Should not reach here"); | |
| SPU_ENFORCE(comparator_ret_vis == VIS_SECRET, "merge1d currently only supports secret comparator return visibility"); |
src/libspu/kernel/hal/permute.cc
Outdated
| SPU_ENFORCE( | ||
| inputs.size() == 2, | ||
| "merge1d_with_payloads expects exactly 2 inputs (Value tensor and " | ||
| "Valid tensor)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message refers to "Valid tensor", but the function is generic and works with any payload. To make it more general and less confusing, please consider changing "Valid tensor" to "Payload tensor".
| SPU_ENFORCE( | |
| inputs.size() == 2, | |
| "merge1d_with_payloads expects exactly 2 inputs (Value tensor and " | |
| "Valid tensor)"); | |
| SPU_ENFORCE( | |
| inputs.size() == 2, | |
| "merge1d_with_payloads expects exactly 2 inputs (Value tensor and " | |
| "Payload tensor)"); |
src/libspu/kernel/hlo/Logstar.cc
Outdated
|
|
||
| namespace spu::kernel::hlo { | ||
|
|
||
| // Vectorized NoteFunc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request implements an odd-even merge network in the HAL kernel and a Brent-Kung based aggregation (Logstar) in the HLO kernel. Both modules support secure multi-party computation with custom comparators and optional payloads, enabling efficient parallel merging and aggregation operations.
Key Changes:
- Added general-purpose merge operations (
merge,merge_with_payloads) to the HAL kernel with support for custom comparators and payloads - Implemented Brent-Kung based aggregation (
AggregateBrentKung) in the HLO kernel for efficient vectorized parallel prefix operations - Introduced odd-even merge network topology generator using iterative stack-based approach to avoid recursion
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 37 comments.
Show a summary per file
| File | Description |
|---|---|
src/libspu/kernel/hal/merge.h |
Declares public API for merge operations |
src/libspu/kernel/hal/merge.cc |
Implements merge and merge_with_payloads by concatenating inputs and delegating to 1D merge functions |
src/libspu/kernel/hal/permute.h |
Adds declarations for merge1d and merge1d_with_payloads functions |
src/libspu/kernel/hal/permute.cc |
Implements odd-even merge network topology generation and merge operations |
src/libspu/kernel/hal/merge_test.cc |
Provides comprehensive test coverage for merge operations including basic correctness and large-scale tests |
src/libspu/kernel/hal/BUILD.bazel |
Adds Bazel build targets for merge module and tests |
src/libspu/kernel/hlo/Logstar.h |
Declares public API for Brent-Kung aggregation operations |
src/libspu/kernel/hlo/Logstar.cc |
Implements vectorized Brent-Kung parallel prefix scan with up-sweep and down-sweep phases |
src/libspu/kernel/hlo/Logstar_test.cc |
Provides test coverage for Brent-Kung aggregation with basic correctness and large-scale benchmarks |
src/libspu/kernel/hlo/BUILD.bazel |
Adds Bazel build targets for Logstar module and tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/libspu/kernel/hal/permute.cc
Outdated
| stack.push_back(std::move(odd_frame)); | ||
|
|
||
| } else if (frame.phase == 2) { | ||
| // odd 完成,开始 even |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "odd completed, start even".
src/libspu/kernel/hlo/Logstar.cc
Outdated
| // 输入的 p1, p2 维度为 [Batch, BlockSize] | ||
| // 输入的 g1, g2 维度为 [Batch, 1] |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "inputs are p1, p2 dimensions are [Batch, BlockSize]; inputs g1, g2 dimensions are [Batch, 1]".
src/libspu/kernel/hlo/Logstar.cc
Outdated
| n |= n >> 32; | ||
| return n + 1; | ||
| } | ||
| // // 最优:非 padding 方案,额外开销低 |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Optimal: non-padding scheme, low extra overhead".
src/libspu/kernel/hal/permute.cc
Outdated
| // 定义比较器包装函数: | ||
| // | ||
| // _cmp_swap 会 gather 出: | ||
| // gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R] | ||
| // | ||
| // 根据 Values 进行比较,取前两个元素传给原始比较器 |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Define comparator wrapper function: _cmp_swap will gather: gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R]; compare based on Values, pass first two elements to original comparator".
| // 定义比较器包装函数: | |
| // | |
| // _cmp_swap 会 gather 出: | |
| // gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R] | |
| // | |
| // 根据 Values 进行比较,取前两个元素传给原始比较器 | |
| // Define comparator wrapper function: | |
| // | |
| // _cmp_swap will gather: | |
| // gathered_inputs = [Value_L, Value_R, Payload_L, Payload_R] | |
| // | |
| // Compare based on Values, and pass the first two elements to the original comparator. |
| }); | ||
| } | ||
|
|
||
| TEST_F(BrentKungTest, LargeScaleIntergers) { |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name "LargeScaleIntergers" contains a typo. It should be "Integers" instead of "Intergers".
| TEST_F(BrentKungTest, LargeScaleIntergers) { | |
| TEST_F(BrentKungTest, LargeScaleIntegers) { |
src/libspu/kernel/hlo/Logstar.cc
Outdated
| auto g3 = hal::mul(ctx, g1, g2); | ||
| // 2. diff = p2 - p1 | ||
| auto diff = hal::sub(ctx, p2, p1); | ||
| // 3. 广播: g1 是 [Batch, 1], diff 是 [Batch, BlockSize] |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "Broadcast: g1 is [Batch, 1], diff is [Batch, BlockSize]".
| const int64_t block_size = 2; | ||
| std::mt19937 rng(std::random_device{}()); | ||
| std::uniform_real_distribution<float> dist_x(0, 1000); | ||
| std::uniform_int_distribution<int> dist_binary(0, 1); // 改为 int 类型 |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment contains Chinese characters which should be replaced with English for consistency and accessibility. The comment translates to "changed to int type".
| std::uniform_int_distribution<int> dist_binary(0, 1); // 改为 int 类型 | |
| std::uniform_int_distribution<int> dist_binary(0, 1); // changed to int type |
This pull request introduces two new modules: a general-purpose merge operation in the
halkernel and a Brent-Kung based aggregation (Logstar) in thehlokernel. Both modules include their implementations, headers, Bazel build rules, and tests. The merge operation supports custom comparators and payloads, while the Logstar aggregation provides efficient, vectorized aggregation routines. Note: the Brent-Kung based aggregation (Logstar) is not yet optimized!New merge operations in HAL:
mergeandmerge_with_payloadsfunctions tohal, enabling general-purpose merging of tensors along a specified dimension with support for custom comparators and optional payloads. [1] [2]merge1dandmerge1d_with_payloadsinpermute.hfor 1D merging with comparator support.mergeandmerge_testinsrc/libspu/kernel/hal/BUILD.bazelfor the new functionality and its tests.New Logstar aggregation in HLO:
AggregateBrentKungand related vectorized aggregation routines inLogstar.cc, providing efficient parallel aggregation (Brent-Kung scan) with and without valid bits.AggregateBrentKung,AggregateBrentKung_without_valids, and utility functions inLogstar.h.LogstarandLogstar_testinsrc/libspu/kernel/hlo/BUILD.bazelfor the new Logstar aggregation and its tests.# Pull RequestWhat problem does this PR solve?
Issue Number: Fixed #
Possible side effects?
Performance:
Backward compatibility: