Replace ArrayData::new() with ArrayData::try_new() and unsafe ArrayData::new_unchecked
#822
arrow/src/array/data.rs
#810 has an initial set of checks
arrow/src/array/data.rs
The API changes in data.rs are the core changes in this PR -- everything else is a mechanical change to use the new APIs.
Codecov Report
@@ Coverage Diff @@
## master #822 +/- ##
==========================================
+ Coverage 82.54% 82.56% +0.02%
==========================================
Files 168 168
Lines 47910 47988 +78
==========================================
+ Hits 39545 39622 +77
- Misses 8365 8366 +1
Dandandan
left a comment
I think the changes look good, thanks a lot for investing the time in this @alamb
I think we should have some notes around the usages of unsafe: why is it sound in this place?
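For illustration, a safety note of the kind requested here is commonly written as a `// SAFETY:` comment directly above the `unsafe` block, stating which invariant makes the operation sound. The sketch below is not code from this PR; `sum_prefix` is a hypothetical helper used only to show the convention.

```rust
/// Sums the first `n` values of `values` (hypothetical helper, not from arrow-rs).
pub fn sum_prefix(values: &[i32], n: usize) -> i32 {
    // Establish the invariant once, up front.
    assert!(n <= values.len(), "n exceeds slice length");
    let mut total = 0i32;
    for i in 0..n {
        // SAFETY: `i < n` by the loop bound and `n <= values.len()` was
        // asserted above, so `i` is always in bounds.
        total = total.wrapping_add(unsafe { *values.get_unchecked(i) });
    }
    total
}
```

A reviewer can then check the comment against the code instead of re-deriving the argument at every `unsafe` site.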
arrow/benches/array_from_vec.rs
We should add a safety note here?
It might be helpful to note that this is a (very old) benchmark crate seemingly last modified by @andygrove and @sunchao: https://github.com/apache/arrow-rs/blame/master/arrow/benches/array_from_vec.rs#L29-L38
To be honest it does not look safe to me. I will try and rewrite it
It is a good idea @Dandandan -- I will attempt to do so. To be honest I am not sure why all the references are legitimate uses of unsafe. I don't think this PR has made the code any more or less safe / unsafe than it was before. However, it is now clearer where assumptions are made (they are all annotated with unsafe).
houqp
left a comment
Changes look good to me as well and I agree with @Dandandan that we should add safety notes.
@Dandandan and @houqp -- First I want to emphasize that this PR does not change the safety of the arrow-rs implementation -- the code is as safe/unsafe before this PR as it is after this PR. I agree that all However, I propose not requiring such annotations for this PR because:
Thus, I propose a multi-pronged approach:
I will admit that part of my reason for not wanting to try and annotate all uses of
I agree with you @alamb 👍
Tried adding some safety notes; I have to agree that the task is quite intense :D I also suggest we focus on filling gaps in arrow2 instead of retrofitting arrow-rs. It's a huge undertaking that @jorgecarleitao has already tried and decided was not worth the effort.
alamb
left a comment
@Dandandan -- are you OK with merging this PR given the rationale listed in #822 (comment)? (You have marked the PR as changes requested.)
@jhorstmann are you OK with this PR?
I would like to merge this in and then create an arrow 6.0.0 release candidate (and hopefully unblock the next downstream release of DataFusion)
I plan to make ArrayData::try_new() safer with additional validation (released as part of 6.1.0)
I agree with your points. The PR already improves on the current state (
Hehe 😃 I was just looking at it. Yeah merging as is would be great.
nevi-me
left a comment
I also looked at it, and agree with merging as is
Given the feedback I plan to rebase this and merge it in
d54bd7d to f85fff6
Thanks @alamb! I was away from a computer for a few days, this looks good!
Which issue does this PR close?
Part of #817
Rationale for this change
This PR is a step towards making arrow-rs Rust
safe and resolving open RUSTSEC issues. ArrayData::new() is fundamentally unsafe (in the Rust sense) as it relies on the user to pass in valid data or else allows undefined behavior. The API is easy to misuse and should be marked as unsafe to reflect this. See Validate arguments to ArrayData::try_new() #817 for more background.
Builds on @jhorstmann's work in #813
What changes are included in this PR?
* Replace ArrayData::new() with unsafe ArrayData::new_unchecked() and ArrayData::try_new()
* Make ArrayDataBuilder::build() fallible
* Add unsafe ArrayDataBuilder::build_unchecked()
Note:
** I think splitting the changes into several PRs will help with reviews
** I would like to ensure the API changes are included in arrow-rs 6.0 (planning to make a release candidate in the next week or so). We can then add additional validation in 6.1, 6.2, etc., as those will be non-breaking API changes.
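The checked/unchecked constructor split described above can be sketched on a toy type. This is a minimal illustration under stated assumptions, not the arrow-rs implementation: `ToyData` and `InvalidData` are hypothetical names, and the only invariant modelled is "the buffer holds at least `len` values".

```rust
/// Hypothetical stand-in for `ArrayData` (not the arrow-rs type): a declared
/// element count plus raw bytes expected to hold that many `i32` values.
#[derive(Debug)]
pub struct ToyData {
    len: usize,
    bytes: Vec<u8>,
}

/// Error returned when the arguments are inconsistent.
#[derive(Debug, PartialEq)]
pub struct InvalidData {
    pub needed_bytes: usize,
    pub actual_bytes: usize,
}

impl ToyData {
    /// Checked constructor: verifies the invariant that `value()` relies on.
    pub fn try_new(len: usize, bytes: Vec<u8>) -> Result<Self, InvalidData> {
        let needed_bytes = len.saturating_mul(std::mem::size_of::<i32>());
        if bytes.len() < needed_bytes {
            return Err(InvalidData { needed_bytes, actual_bytes: bytes.len() });
        }
        Ok(Self { len, bytes })
    }

    /// Unchecked constructor: skips validation.
    ///
    /// # Safety
    /// The caller must guarantee `bytes.len() >= len * size_of::<i32>()`.
    pub unsafe fn new_unchecked(len: usize, bytes: Vec<u8>) -> Self {
        Self { len, bytes }
    }

    /// Reads element `i` without re-checking the buffer size.
    pub fn value(&self, i: usize) -> i32 {
        assert!(i < self.len, "index out of bounds");
        // SAFETY: both constructors require `bytes` to hold at least `len`
        // i32 values, and `i < len` was asserted above.
        unsafe { (self.bytes.as_ptr() as *const i32).add(i).read_unaligned() }
    }
}
```

The design point is that the proof obligation moves to the caller only when they explicitly opt in with `unsafe`; everyone else gets an error instead of undefined behavior.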
Are there any user-facing changes?
Yes -- the APIs for creating ArrayData are different. This should not affect any users who create Arrays directly, only those using the lower-level APIs.
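For users of the lower-level APIs, the visible effect of a fallible build() is that call sites must handle a Result. The sketch below shows that shape on hypothetical stand-ins (`ToyBuilder`, `ToyData`, `BuildError` and `make_data` are illustrative names, not arrow-rs types or signatures).

```rust
/// Hypothetical stand-ins for illustration only; these are not arrow-rs types.
#[derive(Debug)]
pub struct ToyData {
    len: usize,
    values: Vec<i32>,
}

impl ToyData {
    /// The first `len` values; in bounds because `build()` validated it.
    pub fn values(&self) -> &[i32] {
        &self.values[..self.len]
    }
}

#[derive(Debug, PartialEq)]
pub struct BuildError(pub String);

#[derive(Default)]
pub struct ToyBuilder {
    len: usize,
    values: Vec<i32>,
}

impl ToyBuilder {
    pub fn len(mut self, len: usize) -> Self {
        self.len = len;
        self
    }

    pub fn values(mut self, values: Vec<i32>) -> Self {
        self.values = values;
        self
    }

    /// Fallible build: inconsistent arguments become an error value.
    pub fn build(self) -> Result<ToyData, BuildError> {
        if self.values.len() < self.len {
            return Err(BuildError(format!(
                "need {} values, got {}",
                self.len,
                self.values.len()
            )));
        }
        Ok(ToyData { len: self.len, values: self.values })
    }
}

/// A call site that previously received the value directly now
/// propagates the error with `?` (or calls `.unwrap()` in tests).
pub fn make_data(values: Vec<i32>) -> Result<ToyData, BuildError> {
    let len = values.len();
    let data = ToyBuilder::default().len(len).values(values).build()?;
    Ok(data)
}
```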