Description
Describe the bug, including details regarding any error messages, version, and platform.
This issue is similar to #40007, but it is a different problem.
I want to use the Hashing32::HashBatch API to produce a hash array for a batch. Although Hashing32 and Hashing64 are used in join-related code, they can also be used independently.
For example:
auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
const int64_t batch_len = arr->length();
arrow::compute::ExecBatch exec_batch({arr}, batch_len);
auto ctx = arrow::compute::default_exec_context();
// Allocate only as much stack as the hashes need: batch_len uint32_t values.
arrow::util::TempVectorStack stack;
ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)));
std::vector<uint32_t> hashes(batch_len);
std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
// Hash rows [0, batch_len). This call crashes inside HashBatch.
ASSERT_OK(arrow::compute::Hashing32::HashBatch(
    exec_batch, hashes.data(), temp_column_arrays,
    ctx->cpu_info()->hardware_flags(), &stack, /*start_row=*/0, batch_len));
The crash stack in HashBatch is:
arrow::compute::Hashing32::HashBatch
arrow::compute::Hashing32::HashMultiColumn
arrow::util::TempVectorHolder<unsigned int>::TempVectorHolder
arrow::util::TempVectorStack::alloc
ARROW_DCHECK(top_ <= buffer_size_); // top_=4176, buffer_size_=160
The cause is the following code:
arrow/cpp/src/arrow/compute/key_hash.cc, lines 385 to 387 in 7e286dd:
constexpr uint32_t max_batch_size = util::MiniBatch::kMiniBatchLength;
auto hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, max_batch_size);
The holder uses max_batch_size (1024) as its num_elements, which is far more than the stack's initial buffer_size.
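To make the failure mode concrete, here is a simplified, hypothetical model of a bump-pointer temporary stack. ToyTempStack, TryAlloc and TempVectorFits are illustrative names, not Arrow APIs, and the model ignores the padding and guard words that Arrow's TempVectorStack adds (which is why Arrow reports buffer_size_=160 rather than 12). It only shows that a request sized by kMiniBatchLength cannot fit on a stack initialized for num_rows uint32_t values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a bump-pointer temporary stack.
// NOT arrow::util::TempVectorStack: padding and guard words are ignored.
class ToyTempStack {
 public:
  explicit ToyTempStack(int64_t buffer_size)
      : buffer_(static_cast<std::size_t>(buffer_size)),
        buffer_size_(buffer_size) {}

  // Reserves num_bytes on top of the stack. Where Arrow checks
  // ARROW_DCHECK(top_ <= buffer_size_), this model reports failure instead.
  bool TryAlloc(int64_t num_bytes, uint8_t** out) {
    assert(num_bytes >= 0);
    if (top_ + num_bytes > buffer_size_) return false;
    *out = buffer_.data() + top_;
    top_ += num_bytes;
    return true;
  }

  // Releases the most recent allocation (last in, first out).
  void Release(int64_t num_bytes) {
    assert(num_bytes <= top_);
    top_ -= num_bytes;
  }

 private:
  std::vector<uint8_t> buffer_;
  int64_t buffer_size_;
  int64_t top_ = 0;
};

// Value of util::MiniBatch::kMiniBatchLength quoted in this issue.
constexpr int64_t kMiniBatchLength = 1024;

// True if a temp vector of num_elements uint32_t values fits on a stack
// initialized with room for exactly num_rows uint32_t values.
bool TempVectorFits(int64_t num_rows, int64_t num_elements) {
  const int64_t elem = static_cast<int64_t>(sizeof(uint32_t));
  ToyTempStack stack(num_rows * elem);
  uint8_t* buf = nullptr;
  if (!stack.TryAlloc(num_elements * elem, &buf)) return false;
  stack.Release(num_elements * elem);
  return true;
}
```

For the 3-row repro, TempVectorFits(3, kMiniBatchLength) is false while TempVectorFits(3, 3) is true.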
I know that HashBatch is currently only used in hash join and related code. For joins, the input has already been split at a higher level, so each batch has at most kMiniBatchLength rows and the stack is large enough.
But HashBatch can also be used independently. Could the HashBatch-related APIs use num_rows rather than util::MiniBatch::kMiniBatchLength to size this buffer?
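A minimal sketch of the sizing rule this proposal implies, assuming rows are still hashed in minibatches of at most kMiniBatchLength (as max_batch_size suggests). Under that assumption, capping the temp-vector length at min(num_rows, kMiniBatchLength) fixes small inputs without over-allocating for large ones. TempVectorLength and CountMiniBatches are hypothetical helpers, not Arrow functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Value of util::MiniBatch::kMiniBatchLength quoted in this issue.
constexpr int64_t kMiniBatchLength = 1024;

// Hypothetical helper: number of uint32_t elements a per-minibatch temp
// vector needs when hashing num_rows rows. It never needs more than one
// minibatch, and never more than num_rows.
int64_t TempVectorLength(int64_t num_rows) {
  assert(num_rows >= 0);
  return std::min(num_rows, kMiniBatchLength);
}

// Hypothetical model of a minibatch loop: returns how many minibatches are
// visited, checking that each one fits in the temp vector sized above.
int64_t CountMiniBatches(int64_t num_rows) {
  const int64_t temp_len = TempVectorLength(num_rows);
  int64_t batches = 0;
  for (int64_t first_row = 0; first_row < num_rows;) {
    const int64_t batch_size = std::min(num_rows - first_row, kMiniBatchLength);
    assert(batch_size <= temp_len);  // this minibatch fits in the temp vector
    first_row += batch_size;
    ++batches;
  }
  return batches;
}
```

With this rule the 3-row repro would request only 3 elements, while a 5000-row input would still be capped at 1024.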
Component(s)
C++