
Conversation


@cpz2024 cpz2024 commented Dec 27, 2025

Pull Request

What problem does this PR solve?

Issue Number: Fixed #

Possible side effects?

  • Performance:

  • Backward compatibility:

@gemini-code-assist

Summary of Changes

Hello @cpz2024, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new secure join primitive, "join_uu", designed for two-party computation within the SPU framework. The primary goal is to enable two distinct parties to perform a join operation on their respective private datasets while maintaining data confidentiality. The implementation utilizes a Cuckoo Hash-based approach, incorporating secure pseudo-random function (SoPrf) outputs and cryptographic hashing (Blake3) to generate permutations that facilitate the secure comparison and merging of records. This enhancement significantly expands SPU's capabilities for privacy-preserving data analysis.

Highlights

  • New Secure Join Functionality: Introduced "join_uu", a secure two-party join operation based on the Cuckoo Hash algorithm, allowing private datasets to be joined without revealing individual data.
  • Cuckoo Hash Integration: Implemented the "_cuckoo_hash_to_perm_v" kernel and its underlying logic in "pv2k.cc", leveraging "yacl::CuckooIndex" and "Blake3" hashing for efficient and secure permutation generation.
  • Comprehensive Testing: Added dedicated unit tests ("join_test.cc") for the "join_uu" function, covering various field types, protocol kinds, and both single and multi-key join scenarios.
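The Cuckoo Hash approach named in the highlights places each key into one of a small number of candidate bins, evicting and relocating the resident key on collision. The toy table below is an illustrative standalone sketch only (it is not `yacl::CuckooIndex`, and it uses `std::hash` with a salt as a stand-in for the Blake3-derived hash functions) of that insert-with-eviction loop:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Toy two-function cuckoo hash table over uint64 keys. Each key has two
// candidate bins; if both are occupied, the resident of the first bin is
// evicted and reinserted via its other hash function.
class ToyCuckoo {
 public:
  explicit ToyCuckoo(size_t num_bins) : bins_(num_bins) {}

  bool Insert(uint64_t key, size_t max_kicks = 64) {
    for (size_t kick = 0; kick < max_kicks; ++kick) {
      for (int h = 0; h < 2; ++h) {
        size_t b = Hash(key, h);
        if (!bins_[b].has_value()) {
          bins_[b] = key;
          return true;
        }
      }
      // Both candidate bins occupied: evict from the first bin and
      // continue with the victim as the key to place.
      size_t b = Hash(key, 0);
      std::swap(key, *bins_[b]);
    }
    return false;  // table too full; a real table would rehash or stash
  }

  std::optional<size_t> Find(uint64_t key) const {
    for (int h = 0; h < 2; ++h) {
      size_t b = Hash(key, h);
      if (bins_[b] == key) return b;
    }
    return std::nullopt;
  }

 private:
  size_t Hash(uint64_t key, int h) const {
    // Per-function salt; stand-in for independent cryptographic hashes.
    return std::hash<uint64_t>{}(key ^ (h ? 0x9e3779b97f4a7c15ULL : 0)) %
           bins_.size();
  }
  std::vector<std::optional<uint64_t>> bins_;
};
```

Production implementations typically add a stash and cryptographic hashing; the `max_kicks` bound plays the same role here as a rehash or stash fallback.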

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a secure two-party join implementation (join_uu) based on the "Private Join and Compute from PIR with Default" paper. The changes are comprehensive, including the core join logic, a new MPC kernel for Cuckoo Hashing, build file updates, and corresponding tests.

I've identified a critical correctness issue in the join logic where it selects matching rows, which could lead to incorrect results. Additionally, there are several opportunities for improvement in code maintainability, such as removing test dependencies from production code, improving comments, and reducing code duplication. I've also noted some minor issues in the test files that should be addressed.

Comment on lines +212 to +227
    spu::Value col_result =
        hal::constant(ctx, 0, table_2[col_idx].dtype(), table_1[0].shape());
    spu::Value control_bit = hal::constant(ctx, 0, join_result_cols[0].dtype(),
                                           join_result_cols[0].shape());
    for (size_t hash_idx = 0; hash_idx < num_hash; ++hash_idx) {
      // Not operation on control_bit.
      control_bit = hal::bitwise_not(ctx, control_bit);
      // And control_bit and join_result_cols[hash_idx].
      control_bit =
          hal::bitwise_and(ctx, control_bit, join_result_cols[hash_idx]);
      // Multiply the corresponding columns in control_bit and table_t_1.
      spu::Value table_t_2_i_col =
          table_t_1[(hash_idx * (table_2.size() + 1)) + col_idx];
      spu::Value mul_result = hal::mul(ctx, table_t_2_i_col, control_bit);
      col_result = hal::add(ctx, col_result, mul_result);
    }

critical

The logic for calculating control_bit to select the first match among multiple hash locations is incorrect. The current implementation fails after the second iteration, which will lead to incorrect join results when an item from table_1 matches items in table_2 at multiple hash locations. The logic should ensure that for each row, only the first match across all hash functions is considered.

Suggested change

Before (current implementation):

    spu::Value col_result =
        hal::constant(ctx, 0, table_2[col_idx].dtype(), table_1[0].shape());
    spu::Value control_bit = hal::constant(ctx, 0, join_result_cols[0].dtype(),
                                           join_result_cols[0].shape());
    for (size_t hash_idx = 0; hash_idx < num_hash; ++hash_idx) {
      // Not operation on control_bit.
      control_bit = hal::bitwise_not(ctx, control_bit);
      // And control_bit and join_result_cols[hash_idx].
      control_bit =
          hal::bitwise_and(ctx, control_bit, join_result_cols[hash_idx]);
      // Multiply the corresponding columns in control_bit and table_t_1.
      spu::Value table_t_2_i_col =
          table_t_1[(hash_idx * (table_2.size() + 1)) + col_idx];
      spu::Value mul_result = hal::mul(ctx, table_t_2_i_col, control_bit);
      col_result = hal::add(ctx, col_result, mul_result);
    }

After (first-match selection via a processed mask):

    spu::Value col_result =
        hal::constant(ctx, 0, table_2[col_idx].dtype(), table_1[0].shape());
    spu::Value processed_mask = hal::constant(
        ctx, 0, join_result_cols[0].dtype(), join_result_cols[0].shape());
    for (size_t hash_idx = 0; hash_idx < num_hash; ++hash_idx) {
      // Get a mask for the first match only.
      spu::Value control_bit = hal::bitwise_and(
          ctx, join_result_cols[hash_idx], hal::bitwise_not(ctx, processed_mask));
      // Multiply the corresponding columns in control_bit and table_t_1.
      spu::Value table_t_2_i_col =
          table_t_1[(hash_idx * (table_2.size() + 1)) + col_idx];
      spu::Value mul_result = hal::mul(ctx, table_t_2_i_col, control_bit);
      col_result = hal::add(ctx, col_result, mul_result);
      // Update the processed mask.
      processed_mask =
          hal::bitwise_or(ctx, processed_mask, join_result_cols[hash_idx]);
    }
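The mask-based selection has a simple plaintext analogue. The sketch below is a hypothetical standalone helper, not SPU code (the names `match`, `candidate`, and `SelectFirstMatch` are illustrative); it shows why carrying a cumulative `processed` mask selects the candidate from the first matching hash function only:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Plaintext analogue of first-match selection across N hash functions.
// match[h] says whether hash function h produced a join hit for this row;
// candidate[h] is the value that hit would contribute.
template <size_t N>
uint64_t SelectFirstMatch(const std::array<bool, N>& match,
                          const std::array<uint64_t, N>& candidate) {
  uint64_t result = 0;
  bool processed = false;  // becomes true once any earlier hash matched
  for (size_t h = 0; h < N; ++h) {
    // Contribute only if this hash matched AND no earlier hash did.
    bool control = match[h] && !processed;
    result += control ? candidate[h] : 0;
    // Remember that some hash up to h has matched.
    processed = processed || match[h];
  }
  return result;
}
```

With two hits, only the earlier one contributes; with no hits the result stays zero, mirroring the oblivious multiply-and-add in the secure version.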

":shape_ops",
":utils",
"@yacl//yacl/utils:cuckoo_index",
"//libspu/kernel:test_util",

medium

Production libraries like join should not depend on test utilities such as //libspu/kernel:test_util. This creates an undesirable coupling between production and test code. Please remove this dependency.

#include "libspu/kernel/hal/utils.h"
#include "libspu/kernel/hlo/permute.h"
#include "libspu/kernel/hlo/soprf.h"
#include "libspu/kernel/test_util.h"

medium

Including test_util.h in a non-test source file is not a good practice as it couples production code with test code. This header should be removed. The corresponding dependency in BUILD.bazel should also be removed.

Comment on lines +102 to +103
// 当field == FieldType::FM64 && num_join_keys ==
// 1时,后面需要使用FM64,否则结果不对

medium

This comment is in Chinese, while the rest of the codebase is in English. To ensure consistency and maintainability for all contributors, please translate this comment to English.

Suggested change
// 当field == FieldType::FM64 && num_join_keys ==
// 1时,后面需要使用FM64,否则结果不对
// When field is FM64 and there is only one join key, we must use FM64 for
// cuckoo hashing as well, otherwise the result is incorrect.

    table_t_2_i_keys.reserve(num_join_keys);
    for (size_t j = 0; j < num_join_keys; ++j) {
      table_t_2_i_keys.push_back(
          table_t_1[(i * (table_2.size() + 1)) +

medium

The expression (i * (table_2.size() + 1)) is used multiple times (here and on line 201). To improve readability and maintainability, consider introducing a named constant for table_2.size() + 1 before the loop at line 178.
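The hoisting the reviewer asks for can be sketched in isolation. The snippet below is a minimal, self-contained illustration (the names `FlatIndex`, `RowIndices`, and `row_stride` are hypothetical, not the actual SPU code) of naming the repeated stride expression once and indexing through it:

```cpp
#include <cstddef>
#include <vector>

// Flat-buffer addressing: a row of width `row_stride` starting at
// row * row_stride, mirroring the `(i * (table_2.size() + 1)) + j`
// pattern quoted in the review.
size_t FlatIndex(size_t row, size_t col, size_t row_stride) {
  return row * row_stride + col;
}

// Before: `(i * (cols + 1)) + j` repeated at every use site.
// After: hoist the stride into one named constant, then reuse it.
std::vector<size_t> RowIndices(size_t i, size_t num_cols) {
  const size_t row_stride = num_cols + 1;  // named once, used below
  std::vector<size_t> idx;
  idx.reserve(row_stride);
  for (size_t j = 0; j < row_stride; ++j) {
    idx.push_back(FlatIndex(i, j, row_stride));
  }
  return idx;
}
```

Beyond readability, a named constant ensures every use site stays in sync if the row layout ever changes.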

Comment on lines +88 to +91
      if (lctx->Rank() == 0) {
        std::cout << "Join communication sent bytes: " << (b1 - b0)
                  << ", sent actions: " << (r1 - r0) << std::endl;
      }

medium

Using std::cout for logging in tests can clutter the output. If this is for debugging, it should be removed before merging. For permanent test logging, consider using a dedicated logging framework or gtest's built-in mechanisms.

Comment on lines +189 to +246
// TEST(BigDataJoinTest, Work) {
// FieldType field = FieldType::FM64;
// ProtocolKind prot = ProtocolKind::SEMI2K;
// size_t num_join_keys = 1;
// const size_t num_hash = 3;
// const size_t scale_factor = 15;

// const Shape shape_1 = {2, 1000};
// const Shape shape_2 = {2, 1000};
// xt::xarray<uint64_t> data_1 = xt::random::randint<uint64_t>(shape_1, 0);
// xt::xarray<uint64_t> data_2 = xt::random::randint<uint64_t>(shape_2, 0);
// for (auto i = 0; i < shape_1[1]; ++i) {
// data_1(0, i) = i;
// data_1(1, i) = i + 100;
// }
// for (auto i = shape_2[1] - 1; i >= 0; --i) {
// data_2(0, i) = i;
// data_2(1, i) = i + 200;
// }

// mpc::utils::simulate(
// 2, [&](const std::shared_ptr<yacl::link::Context>& lctx) {
// SPUContext sctx = test::makeSPUContext(prot, field, lctx);

// std::vector<Value> table1_columns;
// for (int64_t i = 0; i < shape_1[0]; ++i) {
// xt::xarray<uint64_t> col_data = xt::row(data_1, i);
// Value col = test::makeValue(&sctx, col_data, VIS_SECRET);
// table1_columns.push_back(col);
// }

// std::vector<Value> table2_columns;
// for (int64_t i = 0; i < shape_2[0]; ++i) {
// xt::xarray<uint64_t> col_data = xt::row(data_2, i);
// Value col = test::makeValue(&sctx, col_data, VIS_SECRET);
// table2_columns.push_back(col);
// }

// absl::Span<const Value> table1_span =
// absl::MakeConstSpan(table1_columns);
// absl::Span<const Value> table2_span =
// absl::MakeConstSpan(table2_columns);

// size_t b0 = lctx->GetStats()->sent_bytes;
// size_t r0 = lctx->GetStats()->sent_actions;

// auto ret = join_uu(&sctx, table1_span, table2_span, num_join_keys,
// num_hash, scale_factor, field);
// size_t b1 = lctx->GetStats()->sent_bytes;
// size_t r1 = lctx->GetStats()->sent_actions;

// if (lctx->Rank() == 0) {
// std::cout << "Join communication sent bytes: " << (b1 - b0)
// << ", sent actions: " << (r1 - r0) << std::endl;
// }
// });
// }


medium

This large block of commented-out code (BigDataJoinTest) should be removed if it's no longer needed. Keeping dead code reduces readability and maintainability.

Comment on lines +1070 to +1100
if (isOwner(ctx, e_1.eltype())) {
DISPATCH_ALL_FIELDS(field, [&]() {
NdArrayView<ring2k_t> e_1_view(e_1);
for (int64_t i = 0; i < n_1; ++i) {
blake3.Reset();
element = e_1_view[i];
blake3.Update(yacl::ByteContainerView(
reinterpret_cast<const char*>(&element), sizeof(element)));
hash_output = blake3.CumulativeHash();
result_tmp = 0;
memcpy(&result_tmp, hash_output.data(),
std::min(hash_output.size(), sizeof(uint128_t)));
result.push_back(result_tmp);
}
});
} else if (isOwner(ctx, e_2.eltype())) {
DISPATCH_ALL_FIELDS(field, [&]() {
NdArrayView<ring2k_t> e_2_view(e_2);
for (int64_t i = 0; i < n_2; ++i) {
blake3.Reset();
element = e_2_view[i];
blake3.Update(yacl::ByteContainerView(
reinterpret_cast<const char*>(&element), sizeof(element)));
hash_output = blake3.CumulativeHash();
result_tmp = 0;
memcpy(&result_tmp, hash_output.data(),
std::min(hash_output.size(), sizeof(uint128_t)));
result.push_back(result_tmp);
}
});
}

medium

The code blocks for hashing e_1 (lines 1070-1084) and e_2 (lines 1085-1100) are nearly identical. This duplication can be avoided by extracting the hashing logic into a helper lambda or function. This would make the code more concise and easier to maintain.

    auto hash_elements = [&](const NdArrayRef& e) {
      DISPATCH_ALL_FIELDS(field, [&]() {
        NdArrayView<ring2k_t> e_view(e);
        for (int64_t i = 0; i < e.numel(); ++i) {
          blake3.Reset();
          element = e_view[i];
          blake3.Update(yacl::ByteContainerView(
              reinterpret_cast<const char*>(&element), sizeof(element)));
          hash_output = blake3.CumulativeHash();
          result_tmp = 0;
          memcpy(&result_tmp, hash_output.data(),
                 std::min(hash_output.size(), sizeof(uint128_t)));
          result.push_back(result_tmp);
        }
      });
    };

    if (isOwner(ctx, e_1.eltype())) {
      hash_elements(e_1);
    } else if (isOwner(ctx, e_2.eltype())) {
      hash_elements(e_2);
    }
