Skip to content

Conversation

@nevi-me
Copy link
Contributor

@nevi-me nevi-me commented Sep 12, 2020

Changes introduced in 0.15.0 changed the buffer alignment by adding a continuation marker to messages.
The previous behaviour was then marked as legacy, while both worked under V4 of the IPC metadata version.

This change catches the Rust implementation up to other languages, and has the consequence that more integration tests now pass.

The change is applied on top of clippy changes, and can be reviewed/merged after the relevant PRs.

@nevi-me
Copy link
Contributor Author

nevi-me commented Sep 12, 2020

@maxburke this will likely affect you, please have a look at it before the next release if you can. The Rust writer wasn't writing continuation markers consistently, and this is fixed. Readers should be able to continue reading, but it's worth testing out with your workloads.

@github-actions
Copy link

pub use self::gen::SparseTensor::*;
pub use self::gen::Tensor::*;

static ARROW_MAGIC: [u8; 6] = [b'A', b'R', b'R', b'O', b'W', b'1'];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is const better than static here? IIRC I used static for ARROW_MAGIC when const either wasn't a thing yet, or I was unfamiliar with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think this can be const.

@nevi-me
Copy link
Contributor Author

nevi-me commented Sep 13, 2020

I'll look into the integration failures during the week, but it's C++ null arrays. I noticed similar failures when I was trying to read a null array from pyarrow. Seems I'm likely not understanding the spec correctly (odd that C++ is the only failure when Java and others pass).

@wesm
Copy link
Member

wesm commented Sep 13, 2020

In C++ we are more hardened against metadata integrity issues from all the IPC fuzzing, e.g.

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/metadata_internal.cc#L747

According to the Flatbuffers spec, this field should be length-0, not null.

@nevi-me
Copy link
Contributor Author

nevi-me commented Sep 13, 2020

Thanks Wes, from looking at the cpp source where the error occurs, the 0-length case makes me wonder if this might be a fbs issue in Rust. I'll have a look at whether the Rust impl distinguishes between a null and empty array value.
Might have to set a value then remove it (if possible) to ensure that there's an empty array.

Children should not be None/null, but should be an empty list
@nevi-me
Copy link
Contributor Author

nevi-me commented Sep 13, 2020

@wesm I found the problem, I was setting children as null instead of an empty list, so not a library issue. Thanks again

@nevi-me
Copy link
Contributor Author

nevi-me commented Sep 15, 2020

@andygrove @paddyhoran @sunchao the changes here still work with the 0.14.x golden files, and 0.15+ as integration tests pass. Would you like to perform an in-depth review, or can I merge the changes? I have a follow-up that enables more tests, which I'd like to try wrap up over/before the weekend

Copy link
Contributor

@paddyhoran paddyhoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do an in-depth review but from a high level this looks good. Let's merge this and keep the momentum going. Great job @nevi-me

pub use self::gen::SparseTensor::*;
pub use self::gen::Tensor::*;

static ARROW_MAGIC: [u8; 6] = [b'A', b'R', b'R', b'O', b'W', b'1'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think this can be const.

@maxburke
Copy link
Contributor

@maxburke this will likely affect you, please have a look at it before the next release if you can. The Rust writer wasn't writing continuation markers consistently, and this is fixed. Readers should be able to continue reading, but it's worth testing out with your workloads.

Thanks for the heads-up, @nevi-me, we'll give it a solid runthrough the next time we pull.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants