ARROW-6314: [C#] Implement IPC message format alignment changes, provide backwards compatibility and "legacy" option to emit old message format #5280
Closed. eerhardt wants to merge 2 commits into apache:ARROW-6313-flatbuffer-alignment from eerhardt:ARROW-6313-csharp
This appears to perform an unaligned read from the specified buffer assuming a native byte ordering; shouldn't this be using BinaryPrimitives.ReadInt32LittleEndian?
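For comparison, here is a minimal C++ sketch of an explicit little-endian read, i.e. the semantics of `BinaryPrimitives.ReadInt32LittleEndian`. The function name is illustrative; this is not code from the PR or from Arrow, and it assumes four readable bytes at `p`.

```cpp
#include <cstdint>

// Decode a 32-bit signed integer stored in little-endian byte order,
// independent of the host's native byte order. Illustrative sketch only.
inline std::int32_t ReadInt32LittleEndian(const std::uint8_t* p) {
    // Assemble the value one byte at a time; single-byte loads need no
    // particular alignment, so p may point anywhere in the buffer.
    const std::uint32_t v = static_cast<std::uint32_t>(p[0]) |
                            (static_cast<std::uint32_t>(p[1]) << 8) |
                            (static_cast<std::uint32_t>(p[2]) << 16) |
                            (static_cast<std::uint32_t>(p[3]) << 24);
    return static_cast<std::int32_t>(v);
}
```

Because the value is built from individual bytes, the result is the same on little- and big-endian hosts.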
This code was refactored from the original code, which you can see here:
arrow/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs
Lines 62 to 66 in 044b418
Originally, we always had a `byte[]` and would call `BitConverter.ToInt32`. However, with the changes to allow for Memory and Span, I needed to make the same call, only with a Span instead of `byte[]`. This API exists in .NET, but it is not available in `netstandard`. So I needed to copy the little bit of code out into the `BitUtility` class. https://source.dot.net/#System.Private.CoreLib/shared/System/BitConverter.cs,269
You can see that `BitConverter.ToInt32(byte[])` does the same operation.

From what I can tell, the C++ implementation does the same thing:
(master branch)
arrow/cpp/src/arrow/ipc/message.cc
Line 240 in c3a6878
(ARROW-6313-flatbuffer-alignment branch)
arrow/cpp/src/arrow/util/ubsan.h
Lines 54 to 58 in 2d63975
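For contrast, a native-order unaligned read (roughly what `BitConverter.ToInt32` does in .NET: check the length, then reinterpret the bytes in host order) can be sketched in C++ with `std::memcpy`. `ReadInt32Native` is a hypothetical name, and this is not a transcription of the Arrow C++ code linked above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical helper: read an int32 in the host's native byte order from a
// possibly unaligned offset in a byte buffer, after a bounds check. The bytes
// are reinterpreted as-is, so the result depends on the host's endianness.
inline std::int32_t ReadInt32Native(const std::uint8_t* data, std::size_t length,
                                    std::size_t offset) {
    if (offset > length || length - offset < sizeof(std::int32_t)) {
        throw std::out_of_range("need at least 4 bytes at offset");
    }
    std::int32_t value;
    // memcpy is the portable way to perform an unaligned load without
    // undefined behavior; compilers typically lower it to a single load.
    std::memcpy(&value, data + offset, sizeof(value));
    return value;
}
```

On a little-endian host this returns the same value as an explicit little-endian decode; on a big-endian host it returns the byte-swapped value, which is the portability concern raised in the review.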
I was never sure about this, and the spec doesn't fully specify whether these length numbers are big-endian, little-endian, or machine-dependent. That's why I never changed this code and left it doing what it has always done.
https://arrow.apache.org/docs/format/Layout.html#byte-order-endianness
Having the endianness inside the schema doesn't help when you need to know the byte order of the schema length in order to read the schema itself.
I see we always write these lengths as little-endian numbers, so changing the read here could perhaps be justified on that basis.
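If the writer always emits these lengths as little-endian, the reader can match it explicitly. This C++ sketch frames a payload with a 4-byte little-endian length prefix and reads the length back. `Frame` and `ReadFrameLength` are hypothetical helpers, and the sketch leaves out the rest of the Arrow IPC message framing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value to `out` in little-endian byte order.
inline void WriteInt32LittleEndian(std::vector<std::uint8_t>& out, std::int32_t value) {
    const std::uint32_t v = static_cast<std::uint32_t>(value);
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>((v >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((v >> 16) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((v >> 24) & 0xFF));
}

// Decode a little-endian 32-bit value starting at p.
inline std::int32_t ReadInt32LittleEndian(const std::uint8_t* p) {
    const std::uint32_t v = static_cast<std::uint32_t>(p[0]) |
                            (static_cast<std::uint32_t>(p[1]) << 8) |
                            (static_cast<std::uint32_t>(p[2]) << 16) |
                            (static_cast<std::uint32_t>(p[3]) << 24);
    return static_cast<std::int32_t>(v);
}

// Hypothetical framing: [4-byte little-endian length][payload bytes].
inline std::vector<std::uint8_t> Frame(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out;
    WriteInt32LittleEndian(out, static_cast<std::int32_t>(payload.size()));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Read the length prefix back; returns -1 if the buffer cannot hold one.
inline std::int32_t ReadFrameLength(const std::vector<std::uint8_t>& framed) {
    if (framed.size() < 4) return -1;
    return ReadInt32LittleEndian(framed.data());
}
```

Since both sides fix the byte order, the round trip holds regardless of the host's native endianness.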
Thoughts?
Since this issue has existed in this code since its inception, it may be best to open a separate JIRA issue for it.
https://issues.apache.org/jira/browse/ARROW-6553 - "[C#] Decide how to read message lengths - little-endian or machine dependent"
@eerhardt I'll continue the discussion in that JIRA issue. I interpreted the "little-endian by default" section to mean that the IPC protocol itself is always little-endian, while array primitives have a byte order given by the (optional) schema metadata value. If the protocol specification does not specify a byte order, or a mechanism for determining it, I would view that as an oversight; however, it could also just mean that the C++ code is presently non-compliant or does not support such endian-awareness.
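The interpretation in this comment can be sketched as two read paths: the message length, which must be decoded before the schema (and therefore its endianness value) is known, is always read as little-endian, while primitive array values use the byte order declared in the schema. This C++ sketch assumes that interpretation; the `Endianness` enum is a local stand-in for the schema metadata value, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Local stand-in for the schema-level byte-order value.
enum class Endianness { Little, Big };

// Decode an int32 using an explicitly specified byte order.
inline std::int32_t ReadInt32(const std::uint8_t* p, Endianness order) {
    std::uint32_t v;
    if (order == Endianness::Little) {
        v = static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
            (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
    } else {
        v = (static_cast<std::uint32_t>(p[0]) << 24) | (static_cast<std::uint32_t>(p[1]) << 16) |
            (static_cast<std::uint32_t>(p[2]) << 8) | static_cast<std::uint32_t>(p[3]);
    }
    return static_cast<std::int32_t>(v);
}

// Protocol-level field: always little-endian, because it has to be read
// before any schema metadata is available.
inline std::int32_t ReadMessageLength(const std::uint8_t* p) {
    return ReadInt32(p, Endianness::Little);
}

// Array data: byte order taken from the schema's declared endianness.
inline std::int32_t ReadArrayValue(const std::uint8_t* values, std::size_t index,
                                   Endianness schema_order) {
    return ReadInt32(values + index * sizeof(std::int32_t), schema_order);
}
```

Separating the two paths makes the "chicken-and-egg" point explicit: only the data whose byte order can be learned from the schema is allowed to depend on it.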
The C++ implementation is not big-endian compliant. Even finding environments to do big-endian testing nowadays is a major challenge.