-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-35498: [C++] Fix source node batch realignment #35541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or In the case of PARQUET issues on JIRA the title also supports: See also: |
|
|
|
@ursabot please benchmark |
|
Benchmark runs are scheduled for baseline = 1a038ad and contender = 892e400. Results will be available as each benchmark for each run completes. |
|
I see a bunch of CI job (here, here, here and here) are failing due to OOM, presumably on realignment, but the allocation sizes are not large. More concerning is this failure, which shows an invalid alignment of 3. Presumably, the data type of the value being realigned is some kind of struct. @westonpace, what are your thoughts? Perhaps there is an API (or idiom) to get the byte width of the largest field of the data type, which I think would be the correct alignment in such a case? |
|
There is one CI job with unexpected results, which appear to be unrelated to this PR, but are worth noting. There is a second CI job which segfaults, and it's not clear whether they are related. @westonpace, have you seen these segfaults before? |
|
Structs do not have any buffers themselves beyond the validity buffer. The validity buffer will have a bit width of 1. I don't think we need to worry about aligning any buffers with a width of 8 or less (e.g. only worry about it once the width is 16) One problem with this PR is that it doesn't address offsets buffers. For example, in a string / list / binary array there is a 32-bit offsets buffer which should be aligned to 32. These arrays won't show up as fixed width arrays. I was working on this a bit yesterday as well. I ended up with something like this... It's not pretty. I also tried using a visitor pattern but it was ending up to be more complicated (though arguably a bit more future proof were we to add more fixed-width types). |
|
I was thinking along the lines of your work; you're clearly ahead of me on this.
We do. In failures I observed internally, the cause was misalignment of buffer addresses (which were even numerically odd) that were expected to be 4- or 8-byte aligned. I think the switch-statement should be refactored into I think the correct idea here is that alignment applies to buffers, not arrays. That is, the format of the buffer is what determines the required alignment, if any, and I guess this format is determined by the buffer index within the array and the array's storage type. @westonpace, should you or I take this forward? |
Sorry, I meant 8bits or less (e.g. we can safely assume that everything has at least 1 byte of alignment)
Yes, unfortunately.
I think I should have time to make a PR today. Sorry to steal a task, I should've marked yesterday that I was working on this. |
Agreed. I included this condition in my recent commit here.
I'd suggest considering adding something like
OK, you owe me one :) |
|
Also, consider using |
|
['Python', 'R'] benchmarks have high level of regressions. |
|
Is this obsoleted by #35565? |
Yes. Closing this. |
Rationale for this change
Currently, source node uses too high a value for realignment, which caused a performance degradation.
What changes are included in this PR?
The source node batch realigns an array to its byte width (if it is more than 1).
Are these changes tested?
The changes are tested for correctness by the existing tests. The performance is checked by regression jobs.
Are there any user-facing changes?
Only performance.
This PR contains a "Critical Fix".