Skip to content

Conversation

@zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented Jan 12, 2024

Rationale for this change

#39583 is a subsequent issue of #32570 (fixed by #39234). The last issue and fixed only resolved var length types. It turns out fixed size types have the same issue.

What changes are included in this PR?

Do the same fix of #39234 for fixed size types.

Are these changes tested?

UT included.

Are there any user-facing changes?

@zanmato1984
Copy link
Contributor Author

Hi @pitrou , I wonder if you could help to review this PR as this is very similar to its predecessor #39234 reviewed by you. Much appreciate it!

@mapleFU mapleFU requested a review from pitrou January 12, 2024 17:23
Comment on lines -391 to -393
if (column_metadata.fixed_length == 0) {
num_rows_left = std::max(num_rows_left, 8) - 8;
++num_bytes_skipped;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was for boolean data, right? Is it ok to remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed size types of length 0/1/2/4/8 are using element-wise copying, i.e., no "word-wise then tail-bytes" copying. They don't go to this method:

case 0:
CollectBits(source->buffers[1]->data(), source->offset, target->mutable_data(1),
num_rows_before, num_rows_to_append, row_ids);
break;
case 1:
Visit(source, num_rows_to_append, row_ids,
[&](int i, const uint8_t* ptr, uint32_t num_bytes) {
target->mutable_data(1)[num_rows_before + i] = *ptr;
});
break;
case 2:
Visit(
source, num_rows_to_append, row_ids,
[&](int i, const uint8_t* ptr, uint32_t num_bytes) {
reinterpret_cast<uint16_t*>(target->mutable_data(1))[num_rows_before + i] =
*reinterpret_cast<const uint16_t*>(ptr);
});
break;
case 4:
Visit(
source, num_rows_to_append, row_ids,
[&](int i, const uint8_t* ptr, uint32_t num_bytes) {
reinterpret_cast<uint32_t*>(target->mutable_data(1))[num_rows_before + i] =
*reinterpret_cast<const uint32_t*>(ptr);
});
break;
case 8:
Visit(
source, num_rows_to_append, row_ids,
[&](int i, const uint8_t* ptr, uint32_t num_bytes) {
reinterpret_cast<uint64_t*>(target->mutable_data(1))[num_rows_before + i] =
*reinterpret_cast<const uint64_t*>(ptr);
});
break;

So I think we can simplify this method by not specially dealing with boolean type, thus the added ARROW_DCHECK several lines above.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 17, 2024
@pitrou
Copy link
Member

pitrou commented Jan 17, 2024

@zanmato1984 Can you rebase on git main?

@zanmato1984
Copy link
Contributor Author

@zanmato1984 Can you rebase on git main?

Sure. Done.

@pitrou
Copy link
Member

pitrou commented Jan 17, 2024

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 20d9a38

Submitted crossbow builds: ursacomputing/crossbow @ actions-56dd6fe473

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp GitHub Actions
test-debian-11-cpp-amd64 GitHub Actions
test-debian-11-cpp-i386 GitHub Actions
test-fedora-38-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @zanmato1984 . The original code quality is quite bad but this improves things slightly.

@pitrou pitrou merged commit 1dc3b81 into apache:main Jan 17, 2024
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jan 17, 2024
@pitrou
Copy link
Member

pitrou commented Jan 17, 2024

@raulcd If you do another RC, this would be a good candidate fix to add.

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 1dc3b81.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…g consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (apache#39585)

### Rationale for this change

apache#39583 is a subsequent issue of apache#32570 (fixed by apache#39234). The last issue and fixed only resolved var length types. It turns out fixed size types have the same issue.

### What changes are included in this PR?

Do the same fix of apache#39234 for fixed size types.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

* Closes: apache#39583

Authored-by: zanmato1984 <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
raulcd pushed a commit that referenced this pull request Feb 20, 2024
…ecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585)

### Rationale for this change

#39583 is a subsequent issue of #32570 (fixed by #39234). The last issue and fixed only resolved var length types. It turns out fixed size types have the same issue.

### What changes are included in this PR?

Do the same fix of #39234 for fixed size types.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

* Closes: #39583

Authored-by: zanmato1984 <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Segmentation fault in ExecBatchBuilder

2 participants