cpp/src/arrow/compute/exec.cc (34 changes: 24 additions & 10 deletions)

@@ -308,16 +308,30 @@ bool ExecBatchIterator::Next(ExecBatch* batch) {
   // Now, fill the batch
   batch->values.resize(args_.size());
   batch->length = iteration_size;
-  for (size_t i = 0; i < args_.size(); ++i) {
-    if (args_[i].is_scalar()) {
-      batch->values[i] = args_[i].scalar();
-    } else if (args_[i].is_array()) {
-      batch->values[i] = args_[i].array()->Slice(position_, iteration_size);
-    } else {
-      const ChunkedArray& carr = *args_[i].chunked_array();
-      const auto& chunk = carr.chunk(chunk_indexes_[i]);
-      batch->values[i] = chunk->data()->Slice(chunk_positions_[i], iteration_size);
-      chunk_positions_[i] += iteration_size;
+
+  if (iteration_size == length_) {
+    ARROW_DCHECK_EQ(position_, 0);
+    for (size_t i = 0; i < args_.size(); ++i) {
+      if (args_[i].kind() == Datum::CHUNKED_ARRAY) {
+        const ChunkedArray& carr = *args_[i].chunked_array();
+        batch->values[i] = Datum(carr.chunk(chunk_indexes_[i])->data());
+        chunk_positions_[i] += iteration_size;
+      } else {
+        batch->values[i] = std::move(args_[i]);
+      }
Comment on lines +315 to +321

Member:

Suggested change:

-      if (args_[i].kind() == Datum::CHUNKED_ARRAY) {
-        const ChunkedArray& carr = *args_[i].chunked_array();
-        batch->values[i] = Datum(carr.chunk(chunk_indexes_[i])->data());
-        chunk_positions_[i] += iteration_size;
-      } else {
-        batch->values[i] = std::move(args_[i]);
-      }
+      if (args_[i].kind() == Datum::CHUNKED_ARRAY) {
+        chunk_positions_[i] += iteration_size;
+      }
+      batch->values[i] = std::move(args_[i]);

I'm not entirely certain this is correct, but if iteration_size == length_, I think that means any chunked arrays are guaranteed to be a single chunk (or at least, only one chunk has non-zero size), so you are consuming it all at once. I'm not even sure you need to update chunk_positions_[i].

Contributor Author:

I think you're right about not having to update chunk_positions_[i]. But the suggested change wouldn't work: moving the chunked-array Datum directly into batch->values means it ends up being passed to the kernel, which usually expects an ARRAY-kind Datum. The reason we need a branch here is to extract the ArrayData from the chunked array.

+    }
+  } else {
+    for (size_t i = 0; i < args_.size(); ++i) {
+      if (args_[i].is_scalar()) {
+        batch->values[i] = args_[i].scalar();
+      } else if (args_[i].is_array()) {
+        batch->values[i] = args_[i].array()->Slice(position_, iteration_size);
+      } else {
+        const ChunkedArray& carr = *args_[i].chunked_array();
+        const auto& chunk = carr.chunk(chunk_indexes_[i]);
+        batch->values[i] = chunk->data()->Slice(chunk_positions_[i], iteration_size);
+        chunk_positions_[i] += iteration_size;
+      }
     }
   }
   position_ += iteration_size;
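
To make the point from the comment thread above concrete, here is a minimal standalone sketch. It is not part of the PR: the helper ToKernelValue and the sample data are invented for illustration, and it assumes only Arrow's public C++ API. It shows that a Datum wrapping a ChunkedArray reports kind() == Datum::CHUNKED_ARRAY, whereas a kernel usually expects an ARRAY-kind Datum built from ArrayData, which is what the branch in the new code extracts from the current chunk.

#include <memory>

#include <arrow/api.h>
#include <arrow/datum.h>

// Hypothetical helper mirroring the branch in ExecBatchIterator::Next():
// a CHUNKED_ARRAY Datum is unwrapped to one chunk's ArrayData, because a
// kernel usually expects an ARRAY-kind value rather than a ChunkedArray.
arrow::Datum ToKernelValue(const arrow::Datum& arg, int chunk_index) {
  if (arg.kind() == arrow::Datum::CHUNKED_ARRAY) {
    return arrow::Datum(arg.chunked_array()->chunk(chunk_index)->data());
  }
  // Scalars and plain arrays can be forwarded unchanged.
  return arg;
}

int main() {
  // Build a single-chunk ChunkedArray of three null int64 values.
  std::shared_ptr<arrow::Array> array =
      arrow::MakeArrayOfNull(arrow::int64(), 3).ValueOrDie();
  auto chunked =
      std::make_shared<arrow::ChunkedArray>(arrow::ArrayVector{array});

  arrow::Datum as_chunked(chunked);  // kind() == Datum::CHUNKED_ARRAY
  arrow::Datum unwrapped = ToKernelValue(as_chunked, /*chunk_index=*/0);
  // unwrapped now holds ArrayData, i.e. kind() == Datum::ARRAY.
  return unwrapped.kind() == arrow::Datum::ARRAY ? 0 : 1;
}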