Skip to content

Conversation

@Marwes
Copy link

@Marwes Marwes commented Feb 10, 2020

This removes or corrects many instances of unsafe code in the rust crates. This is by no means a complete fix and some of the fixes do not entirely fix the issues with the particular unsafe (see comments), but in all instances it should put the code in a better place than before.

Based on #6256

@github-actions
Copy link

@Marwes
Copy link
Author

Marwes commented Feb 13, 2020

Removed the commits that changed the simd implemented operators. The code is definitely doing undefined behaviour though as the padding won't be initialized ( #6397 (comment) ).

So either the code could do non-simd operations for the last partial block (if any). Or the padding should be initialized before reading it.

@paddyhoran
Copy link
Contributor

So either the code could do non-simd operations for the last partial block (if any). Or the padding should be initialized before reading it.

I plan to initialize the padded region, ARROW-7836 if no-one beats me to it.

@paddyhoran
Copy link
Contributor

I'll try and review this tomorrow so we can get it merged.

@sunchao
Copy link
Member

sunchao commented Feb 15, 2020

Will take a look too. This needs rebase BTW. And cc @sadikovi .

@paddyhoran
Copy link
Contributor

Ping @Marwes can you re-base this so we can review if you get a chance please?

@andygrove
Copy link
Member

@Marwes Could you rebase this ?

@Marwes
Copy link
Author

Marwes commented Apr 1, 2020

Rebased, CI is failing though. Not sure what to do to fix.

@paddyhoran
Copy link
Contributor

This was fixed by #6800. Let me see if I can re-trigger CI.

@paddyhoran
Copy link
Contributor

@kszucs how can I re-trigger CI? I tried to force-push to this branch but it has no effect.

@wesm
Copy link
Member

wesm commented Apr 4, 2020

Not sure what happened with GitHub (they have been having outage issues?) but I rebased and CI is running again

@nevi-me
Copy link
Contributor

nevi-me commented Apr 4, 2020

CI is failing with unnecessary unsafe blocks on parquet\src\arrow\array_reader.rs

@paddyhoran
Copy link
Contributor

@Marwes looks like just some basic issues with this, want to take a quick look? I'd like to get this one merged.

@bluss
Copy link

bluss commented Apr 12, 2020

(I was "invited" to look at this PR, since I noticed some issues that needed fixing. All my comments will appear a bit random and semi-related to this PR, because of this. I will only comment on important issues.)

Nice, I had named 3 things I saw in arrow 0.16.0 code (not a full review)

Issue (1) about Buffer.ptr being null is being fixed here in this PR. Another common way to solve this in Rust is to use https://doc.rust-lang.org/std/ptr/struct.NonNull.html#method.dangling - an aligned non-null dummy pointer, but I won't review this solution further, Marwes can probably judge it all as well as I can. Using NonNull<T> is certainly an idea, too.
Issue (3) about making Buffer::from_raw_parts unsafe has been merged as a fix in an older PR.

Issue (2) so remains, no check for allocation failure. This produces dangerous mismatches - null pointer and nonzero length - in various places.

This doesn't have to be fixed in this PR, just to note that the issue is not fixed here.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix has disappeared, I can't see it in the diff anymore 😕

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened ARROW-8479

Markus Westerlind added 7 commits April 14, 2020 09:42
Might be other platforms this works on as well, but those could be added
as they are found
Fixes undefined behaviour that could occur from safe code calling with
`T == Box<i32>` etc.
This may cause panics in code using ByteArray or Int96 but no tests
currently test these paths. Still better than the status quo which would
be undefined behaviour (if the replaced code paths were hit).
@Marwes
Copy link
Author

Marwes commented Apr 14, 2020

Issue (1) about Buffer.ptr being null is being fixed here in this PR. Another common way to solve this in Rust is to use https://doc.rust-lang.org/std/ptr/struct.NonNull.html#method.dangling - an aligned non-null dummy pointer, but I won't review this solution further, Marwes can probably judge it all as well as I can. Using NonNull is certainly an idea, too.

That's the better solution. I just spotted the null pointer issue late in this PR and didn't want to complicate it further.

Fixed the review comments.

@bluss
Copy link

bluss commented Apr 14, 2020

Thank you @Marwes for doing big work like this for better and safer code

Copy link
Contributor

@paddyhoran paddyhoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Marwes for doing big work like this for better and safer code

Absolutely, thanks very much @Marwes! LGTM from Arrow (crate) perspective. @sunchao can you take a quick look at the changes to the Parquet crate please?

@wesm
Copy link
Member

wesm commented Apr 14, 2020

Would be great to get this into 0.17.0, probably needs to be merged today

cc @kszucs

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I only had time to skim through this but it'd be great if we have some perf numbers to show there is no regression.

@kszucs
Copy link
Member

kszucs commented Apr 15, 2020

@sunchao Included in the 0.17 release. If you find a significant regression, then we can still fix it by cutting a new release candidate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants