ARROW-6519: [Java] Use IPC continuation prefix as part of 8-byte EOS #5345
out.writeIntLittleEndian(0);
} else {
out.writeLongLittleEndian(0);
public static void writeEndOfStream(WriteChannel out, IpcOption option) throws IOException {
I'd prefer to keep this static because it is being used here to hide the EOS identifier https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L67
out.writeLongLittleEndian(0);
public static void writeEndOfStream(WriteChannel out, IpcOption option) throws IOException {
if (!option.write_legacy_ipc_format) {
out.writeIntLittleEndian(MessageSerializer.IPC_CONTINUATION_TOKEN);
Also apply this change to ArrowFileWriter#endInternal, and probably remove MessageSerializer#writeLongLittleEndian since it is no longer used?
Good catch, I forgot that it also writes EOS. Done.
tianchen92
left a comment
LGTM, thanks @BryanCutler
Merging this. https://github.com/apache/arrow/tree/ARROW-6313-flatbuffer-alignment has a rebase conflict on apache/master; I'm going to try to fix it.
Write IPC continuation token to file format EOS
This changes the 8-byte EOS for the non-legacy stream format to use {0xFFFFFFFF, 0x00000000} instead of all zeros. With all zeros, the reader does not know to read the last 4 bytes; with the 4-byte continuation token, all bytes written to a channel can be read.
Closes #5345 from BryanCutler/java-ipc-cont-for-EOS-ARROW-6519 and squashes the following commits:
8fb38a0 <Bryan Cutler> Use IPC continuation token to write 8-byte EOS
Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
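To illustrate the byte layout described in the commit message, here is a minimal, self-contained sketch that does not use the Arrow classes. The class `EosSketch`, its method names, and the reader logic are hypothetical stand-ins; only the {0xFFFFFFFF, 0x00000000} layout and the little-endian int writes come from the change itself. It assumes a reader that treats a zero length as end of stream and stops after reading it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; not the Arrow MessageSerializer or WriteChannel API.
class EosSketch {
  // Assumed stand-in for MessageSerializer.IPC_CONTINUATION_TOKEN (0xFFFFFFFF).
  static final int IPC_CONTINUATION_TOKEN = 0xFFFFFFFF;

  // Builds the EOS marker: 4 zero bytes for the legacy format,
  // or the continuation token followed by a zero length otherwise.
  static byte[] endOfStream(boolean legacyFormat) {
    ByteBuffer buf = ByteBuffer.allocate(legacyFormat ? 4 : 8).order(ByteOrder.LITTLE_ENDIAN);
    if (!legacyFormat) {
      buf.putInt(IPC_CONTINUATION_TOKEN);
    }
    buf.putInt(0);
    return buf.array();
  }

  // Returns how many bytes a simple reader consumes when it meets an EOS marker.
  // A leading continuation token tells the reader that a 4-byte length follows.
  static int bytesConsumedAtEos(byte[] data) {
    ByteBuffer in = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    int first = in.getInt();
    if (first == IPC_CONTINUATION_TOKEN) {
      int length = in.getInt();
      if (length != 0) {
        throw new IllegalArgumentException("not an EOS marker");
      }
    } else if (first != 0) {
      throw new IllegalArgumentException("not an EOS marker");
    }
    return in.position();
  }

  public static void main(String[] args) {
    // Eight zero bytes: the reader stops after the first four.
    System.out.println(bytesConsumedAtEos(new byte[8]));
    // Token plus zero length: the reader consumes all eight bytes.
    System.out.println(bytesConsumedAtEos(endOfStream(false)));
  }
}
```

Under these assumptions, an all-zero 8-byte marker leaves 4 bytes unread on the channel, while the token-prefixed marker is read in full.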
Thanks @wesm and @tianchen92!