-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-5908: [C#] ArrowStreamWriter doesn't align buffers to 8 bytes #4851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| public readonly ArrowBuffer DataBuffer; | ||
| public readonly int Offset; | ||
| public readonly int Length; | ||
| public readonly int Padding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually need to be stored in Buffer? Seems like the call site in WriteRecordBatchInternalAsync could just compute this value instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be computed in WriteRecordBatchInternalAsync , but then computing the padding (i.e. calling RoundUpToMultipleOf8) would need to be in both places.
It is needed when creating Buffer because we need to increment the TotalLength to include the padding. That way the next Buffer's Offset is always aligned to 8 bytes.
In reply to: 302357499 [](ancestors = 302357499)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but that computation is cheap, and it looks like this increases the size of the buffer, which could be more expensive overall. Something to think about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a quick benchmark for ArrowStreamWriter, and noticed that I can remove ~10% of the allocated memory during Write by making this struct smaller - I could also remove the Length field. The running time is not affected. So I went forward with the change, thanks for the suggestion.
Original proposal
| Method | BatchLength | ColumnSetCount | Mean | Error | StdDev | Gen 0/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| WriteBatch | 10000 | 10 | 832.5 us | 52.00 us | 151.7 us | - | 44.38 KB |
| WriteBatch | 10000 | 25 | 1,776.1 us | 61.54 us | 178.5 us | - | 121.09 KB |
| WriteBatch | 1000000 | 10 | 61,126.2 us | 1,162.39 us | 1,141.6 us | - | 44.38 KB |
| WriteBatch | 1000000 | 25 | 150,407.0 us | 2,938.05 us | 3,383.5 us | - | 121.09 KB |
With new changes
| Method | BatchLength | ColumnSetCount | Mean | Error | StdDev | Gen 0/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| WriteBatch | 10000 | 10 | 827.4 us | 47.40 us | 138.3 us | - | 40.41 KB |
| WriteBatch | 10000 | 25 | 1,758.5 us | 60.13 us | 176.3 us | - | 105.13 KB |
| WriteBatch | 1000000 | 10 | 61,222.3 us | 1,391.92 us | 1,429.4 us | - | 40.41 KB |
| WriteBatch | 1000000 | 25 | 149,516.4 us | 1,198.86 us | 936.0 us | - | 105.13 KB |
|
@ursabot --help |
|
|
@ursabot build |
Yes, sounds good |
Recalculate padding instead of allocating more memory.
|
I'm not sure why the Rust leg is failing. I am pretty sure it isn't caused by my change. |
imback82
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @eerhardt!
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Ensure 8-byte alignment on each buffer in a RecordBatch as specified in https://arrow.apache.org/docs/format/Layout.html#requirements-goals-and-non-goals >It is required to have all the contiguous memory buffers in an IPC payload aligned at 8-byte boundaries. In other words, each buffer must start at an aligned 8-byte offset. Additionally, each buffer should be padded to a multiple of 8 bytes. /cc @pgovind @stephentoub @imback82 @wesm - If possible, can we also include this patch in the next release (0.14.1 or 0.15.0)? We hit this issue trying to update .NET for Apache Spark to the latest Arrow release - dotnet/spark#167. Author: Eric Erhardt <eric.erhardt@microsoft.com> Closes #4851 from eerhardt/FixWriterPadding and squashes the following commits: 76807e9 <Eric Erhardt> PR feedback 7ecda78 <Eric Erhardt> Ensure 8-byte alignment on each buffer in a RecordBatch.
Ensure 8-byte alignment on each buffer in a RecordBatch as specified in https://arrow.apache.org/docs/format/Layout.html#requirements-goals-and-non-goals
/cc @pgovind @stephentoub @imback82
@wesm - If possible, can we also include this patch in the next release (0.14.1 or 0.15.0)? We hit this issue trying to update .NET for Apache Spark to the latest Arrow release - dotnet/spark#167.