Skip to content

Conversation

@jorgecarleitao
Copy link
Member

@jorgecarleitao jorgecarleitao commented Jan 22, 2021

Problem

Currently, a lot of our code relies on PrimitiveArray::value(usize), however, that method is unsafe (but not marked as such!): any usize larger than the length of the array allows to read arbitrary memory regions due to the lack of bound checks.

This PR:

This PR changes some of our kernels to not rely on it for this operation, replacing by a safe alternative. This PR is expected to affect the performance of the touched kernels, either improving as we already have alternatives to efficiently create an array out of an iterator, or decreasing (when the index bound is indeed unknown).

todo: benchmark

@github-actions
Copy link

@jorgecarleitao
Copy link
Member Author

cc @tyrelr .

Comment on lines +71 to +73
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be slower, but at this point we have no guarantee that i and j are within bounds.

@codecov-io
Copy link

codecov-io commented Jan 22, 2021

Codecov Report

Merging #9291 (032b2ba) into master (437c8c9) will increase coverage by 0.00%.
The diff coverage is 97.67%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #9291   +/-   ##
=======================================
  Coverage   81.88%   81.89%           
=======================================
  Files         215      215           
  Lines       52988    52972   -16     
=======================================
- Hits        43391    43379   -12     
+ Misses       9597     9593    -4     
Impacted Files Coverage Δ
rust/arrow/src/array/equal_json.rs 92.16% <85.71%> (-0.26%) ⬇️
rust/arrow/src/array/array_primitive.rs 94.48% <100.00%> (ø)
rust/arrow/src/array/ord.rs 62.50% <100.00%> (+1.07%) ⬆️
rust/arrow/src/compute/kernels/cast.rs 97.20% <100.00%> (+0.21%) ⬆️
rust/parquet/src/encodings/encoding.rs 95.43% <0.00%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 437c8c9...032b2ba. Read the comment docs.

@nevi-me nevi-me changed the title ARROW-11345: Made most ops not rely on value(i) ARROW-11345: [Rust] Made most ops not rely on value(i) Jan 22, 2021
@nevi-me nevi-me self-requested a review February 1, 2021 15:05
Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgecarleitao this needs a rebase

}

Ok(Arc::new(b.finish()) as ArrayRef)
// todo: can be optimized by computing this for the whole values buffer and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can open a JIRA to track this. I think 'unary' can also work (or was this before the cast improvement?)

@nevi-me nevi-me added the needs-rebase A PR that needs to be rebased by the author label Feb 13, 2021
@alamb
Copy link
Contributor

alamb commented Mar 29, 2021

@jorgecarleitao is this PR something that you plan to clean up and merge? Or should we close this PR as you have switched to working on arrow 2?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Rust needs-rebase A PR that needs to be rebased by the author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants