Skip to content

Conversation

@tyrelr
Copy link
Contributor

@tyrelr tyrelr commented Jan 16, 2021

Using an approach similar to ARROW-10989, migrate typed array API's to use slices where they can.

This impacts the API of GenericBinaryArray<>, GenericListArray<>, GenericStringArray<>

This also enables bounds checking in every value() function on each of the above arrays, with an unsafe value_unchecked() provided for callers willing to do their own bounds-checking.

@github-actions
Copy link

@codecov-io
Copy link

codecov-io commented Jan 16, 2021

Codecov Report

Merging #9215 (dda0545) into master (ab5fc97) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9215      +/-   ##
==========================================
+ Coverage   81.89%   81.92%   +0.03%     
==========================================
  Files         215      215              
  Lines       52988    53070      +82     
==========================================
+ Hits        43392    43478      +86     
+ Misses       9596     9592       -4     
Impacted Files Coverage Δ
rust/arrow/src/array/array.rs 88.61% <100.00%> (ø)
rust/arrow/src/array/array_binary.rs 91.75% <100.00%> (+0.76%) ⬆️
rust/arrow/src/array/array_list.rs 93.84% <100.00%> (+0.78%) ⬆️
rust/arrow/src/array/array_primitive.rs 94.72% <100.00%> (+0.24%) ⬆️
rust/arrow/src/array/array_string.rs 95.37% <100.00%> (+1.25%) ⬆️
rust/arrow/src/array/builder.rs 85.10% <100.00%> (ø)
rust/arrow/src/compute/kernels/cast.rs 96.96% <100.00%> (-0.03%) ⬇️
rust/arrow/src/compute/kernels/limit.rs 100.00% <100.00%> (ø)
rust/arrow/src/compute/util.rs 98.92% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab5fc97...dda0545. Read the comment docs.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great simplification and really called for :)

With this change, I think that we may be able to remove an unsafe in compute::kernels::length where we re-interpret the offset buffer as type Offset.

@alamb
Copy link
Contributor

alamb commented Jan 17, 2021

I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now and are planning to minimize entropy by postponing merging changes not critical for the release until the process was complete. I hope the process is complete in the next few days. There is more discussion in the mailing list

@alamb
Copy link
Contributor

alamb commented Jan 20, 2021

@jorgecarleitao do you think this PR is ready to merge? Does it need any additional review or rebasing?

@jorgecarleitao
Copy link
Member

Ready to merge. Note that it has some backward incompatible changes, so PRs on top need to be merged carefully.

@alamb
Copy link
Contributor

alamb commented Jan 20, 2021

Roger -- I'll plan to merge it later today (after the current round of merges have gotten through CI)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged this branch to apache/master locally and I got a compilation error. I think this is a logical conflict introduced with a bunch of the other PRs that have been merged over the last few days.

@tyrelr can you possibly rebase and fix the conflicts?

Thanks again for your help,
Andrew

(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust$ cargo test --all
   Compiling parquet v3.0.0-SNAPSHOT (/Users/alamb/Software/arrow2/rust/parquet)
   Compiling arrow v3.0.0-SNAPSHOT (/Users/alamb/Software/arrow2/rust/arrow)
error[E0599]: no method named `to_byte_slice` found for array `[{integer}; 10]` in the current scope
   --> arrow/src/array/array_list.rs:710:49
    |
710 |                 &[0, 1, 2, 3, 4, 5, 6, 7, 8, 9].to_byte_slice(),
    |                                                 ^^^^^^^^^^^^^ method not found in `[{integer}; 10]`
    |
    = help: items from traits can only be used if the trait is in scope
    = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
            `use crate::datatypes::ToByteSlice;`

error[E0599]: no method named `to_byte_slice` found for array `[i64; 10]` in the current scope
   --> arrow/src/array/array_list.rs:717:62
    |
717 |             Buffer::from(&[0i64, 2, 2, 2, 4, 6, 6, 9, 9, 10].to_byte_slice());
    |                                                              ^^^^^^^^^^^^^ method not found in `[i64; 10]`
    |
    = help: items from traits can only be used if the trait is in scope
    = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
            `use crate::datatypes::ToByteSlice;`

error[E0599]: no method named `to_byte_slice` found for array `[{integer}; 10]` in the current scope
   --> arrow/src/array/array_list.rs:807:49
    |
807 |                 &[0, 1, 2, 3, 4, 5, 6, 7, 8, 9].to_byte_slice(),
    |                                                 ^^^^^^^^^^^^^ method not found in `[{integer}; 10]`
    |
    = help: items from traits can only be used if the trait is in scope
    = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
            `use crate::datatypes::ToByteSlice;`

   Compiling arrow-flight v3.0.0-SNAPSHOT (/Users/alamb/Software/arrow2/rust/arrow-flight)
error: aborting due to 3 previous errors

For more information about this error, try `rustc --explain E0599`.
error: could not compile `arrow`

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed

@alamb alamb added the needs-rebase A PR that needs to be rebased by the author label Jan 20, 2021
@tyrelr tyrelr force-pushed the array_slice_accessors branch from 47086b8 to 5cd8075 Compare January 21, 2021 01:27
@tyrelr
Copy link
Contributor Author

tyrelr commented Jan 21, 2021

Rebased. But I didn't hit the merge conflicts during... I can see that a number of tests were swapped to use Buffer::from_slice_ref()... so I'll add another commit to switch to that function just to be safe.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a performance concern.

Comment on lines 89 to 91
Copy link
Member

@jorgecarleitao jorgecarleitao Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of value(). Could you run the benches before and after this PR?

My suggestion is that we mark this method as unsafe, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.

I think that we would like to instruct users of this API to switch to values(). This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.

Think

for i in 0..array.len() {
    if array.value(i) > 2 {
        // do x
    };
}

this loop will suffer a lot from this change.

we would like users to change it to

array.values().for_each(|value| {
    if value > 2 {
        // do x
    };
});

For that, we need to mark array.value(i) as unsafe to indicate that yes, you can use that method, yes, it is fast, but you need to be careful about i.

Copy link
Contributor Author

@tyrelr tyrelr Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that we want to guide API users to switch to values(). My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit). That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)". Making macro style invocation sometimes-need-unsafe is a bit unfortunate.

(There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with... most of which somehow relate to removing direct calls to __Array.value(i) )

I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len(). The pretty printing logic probably would do additional bounds checks now, as it iterates based on len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.

I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we may need to park this change until we migrate our code base to use values() whenever possible.

If we merge this one, we get a major performance degradation. If we add unsafe, we need to add unsafe in a lot of places. Neither are great options :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll rebase dropping that commit.

This comment was marked as outdated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I covered all of those in #9291 . Good that there is no impact on sort, which was the only case where I could not optimize (i.e. we really needed the bound check).

The larger task atm is to convert all unit-tests that use value(i) to values()[i].

Fyi, the reason I propose keeping value(i) as unsafe instead of un-performant is that there is no way to efficiently access the value i without knowing the internals of the primitiveArray (i.e. pointer and offset). imo this method does serve the purpose of abstracting out those details. It just happens to be unsafe and we currently not marking it as such. However, please say if you disagree here.

wrt to the comparison kernel, we can create a generic for primitives alone for that use-case, no?

Let me know what do you think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#9291 is good progress towards eliminating use of the function.

And certainly, we could 'split' the macro for primitives as a quick fix to get rid of the call to the function. I've been experimenting with an alternative approach that might be a bit more flexible to multiple use cases, described at the bottom of this comment.

I am quite torn about whether I think value should or should not be in the interface.

Reasons to drop value(i) -> T::Native

I think that even if value(i) was dropped from the PrimitiveArray impl's, efficient random access to items without a bounds check can still be achieved through unsafe{*primitive_array.values().get_unchecked(i)} (the extra * because get_unchecked() returns a ref to the value).

I'm not sure I have any example code or measurements to demonstrate it on hand, but I am certain I saw the silently-unsafe implementationx.values().iter().zip(y.values().iter()) did (slightly) outperform (0..x.len()).map(|i|{x.value(i),y.value(i)}. I believe it was when I was playing with non-simd arithmetic kernels.... So that is the root of my hesitancy, is I'm worried it doesn't actually escape any overhead, and unintentionally lead people away from a more reliable/performant way. IF there is a context where unsafe{x.value(i)} beats the performance of unsafe{*x.values().get_unchecked(i)}

Reasons to keep value(i) -> T::Native

All other array implementations have value functions as far as I recall, so it is a nice 'consistency'.

In the back of my mind, the biggest argument to keep value(i) is for api consistency... so long term, a 'trait' may be the place where it might fit best? Very roughly, I'm thinking:

trait TypedArrowArray : ArrowArray {
   type RefType;
   fn is_valid(i:usize) -> bool; //bounds check
   unsafe fn is_valid(i:usize) -> bool; //no bounds check
   fn value(i:usize) -> RefType;  //bounds check
   unsafe fn value_unchecked(i:usize) -> RefType; //no bounds checked
   fn iter() -> impl Iterator<Option<RefType>>;
   fn iter_values() -> impl Iterator<RefType>;
}
impl <T: ArrowPrimitiveType> TypedArrowArray<&T::Native> for PrimitiveArray<T> { ... }
impl TypedArrowArray<ArrayRef> for GenericListArray<T> { ... }
//and similar for string/binary. ... I am not sure whether struct arrays could fit... Dictionary would not give access to 'keys', only to the values referenced by each key?  Union would require some kind of RefType that can downcast into the actual value?

Of course, I am uncertain how much overhead the 'standarization' such a trait impl implies would bring... would any kernels actually benefit from using generic implementations against such an api, or will they always go down to the concrete type to squeeze little short-cuts out that don't fit in the generic interface? I'm unsure, so very (very, very) slowly experimenting...

Summary

So in short, my thoughts are:

  • I think that leaving value(i) safety consideration out of this PR makes sense. I've rebased to drop that out - although I did leave the additional values() test code.
  • Marking it unsafe in the near future is absolutely better than being silently-unsafe. The argument that adding bounds-checks could silently impact external users is reasonable, taking unsafe has the larger 'warning' so that the change isn't missed.
  • Longer term, the options of deprecating it, or explicitly moving it into an trait impl are both contenders in my mind... but neither option is directly relevant to this PR.

Let me know if that seems reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking over the benchmarks again, one of the biggest changes is actually a performance improvement while doing safe access... which makes very little sense... so I think I'll run the benchmarks again overnight.

@tyrelr tyrelr force-pushed the array_slice_accessors branch from 8181d2b to 41c29bc Compare January 22, 2021 06:36
@jorgecarleitao jorgecarleitao self-requested a review January 23, 2021 13:36
Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @tyrelr , I think that this is a major improvement to the API, reduces unsoundness, and makes it clear.

I left an optional comment, since we are touching the API.

@sunchao @nevi-me @alamb would you like to voice your opinion here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think of naming it simply offsets? I can't see the value of value_ on the name (same for the other types). Since we are touching the API in backward incompatible ways... xD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had considered that, but was concerned that it may be confused with the buffer.offset... my hope was that it'd make it clear that it's an "offset into values()" (not into the buffer, which has the additional array-slicing offset).

I could go either way with it though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with removing the value part - we should do it in another PR though.

@tyrelr

This comment has been minimized.

@tyrelr
Copy link
Contributor Author

tyrelr commented Jan 23, 2021

The 10% performance impact looks like mostly improvements:

critcmp master-499b6d0c array-slice-78f360b7c -t 10
group                      array-slice-78f360b7c                  master-499b6d0c
-----                      ---------------------                  ---------------
add 512                    1.00    282.9±0.41ns        ? B/sec    1.37    387.7±0.71ns        ? B/sec
add_nulls_512              1.00    344.7±0.54ns        ? B/sec    1.30    449.3±1.08ns        ? B/sec
bench_primitive            1.42      3.1±0.00ms  1279.3 MB/sec    1.00      2.2±0.01ms  1815.1 MB/sec
buffer_bit_ops or          1.00    590.9±0.95ns        ? B/sec    1.14    673.4±0.93ns        ? B/sec
cast int64 to int32 512    1.15      2.5±0.02µs        ? B/sec    1.00      2.2±0.01µs        ? B/sec
equal_bool_512             1.11     43.1±0.08ns        ? B/sec    1.00     38.8±0.06ns        ? B/sec
like_utf8 scalar equals    1.00     72.5±0.88µs        ? B/sec    1.22     88.4±0.18µs        ? B/sec
max nulls 512              1.00   1758.9±6.23ns        ? B/sec    1.31      2.3±0.01µs        ? B/sec
min string 512             1.00      3.4±0.00µs        ? B/sec    1.32      4.5±0.07µs        ? B/sec
multiply 512               1.00    335.6±0.67ns        ? B/sec    1.19    398.6±5.74ns        ? B/sec
mutable str 1024           1.00      2.7±0.00ms        ? B/sec    1.14      3.0±0.01ms        ? B/sec
subtract 512               1.00    345.3±1.26ns        ? B/sec    1.12    385.0±0.77ns        ? B/sec
take bool nulls 1024       1.15      4.2±0.03µs        ? B/sec    1.00      3.7±0.02µs        ? B/sec

I don't believe that the list/binary/string array would be involved in most of the benchmarks listed here? (multiply, subtract, take bool, add, add_nulls, bench_primitive should be operations on boolean/primitive arrays, which have no functional changes)...

@jorgecarleitao
Copy link
Member

@tyrelr , I believe that you may have to run the benches against the latest master vs your branch rebased, for consistency: 499b6d0 is before a significant change to the mutable buffer's performance. E.g. this code does not touch add, but there is a 40% difference in the benches (consistent with the PR that optimizes that kernel.

@tyrelr
Copy link
Contributor Author

tyrelr commented Jan 24, 2021

I've rebased locally and will be running benchmarks overnight.

@jorgecarleitao jorgecarleitao removed the needs-rebase A PR that needs to be rebased by the author label Jan 24, 2021
@tyrelr
Copy link
Contributor Author

tyrelr commented Jan 24, 2021

I'm still seeing a mix of inconsistent performance hits/bumps after the rebase.

critcmp master-67d0c2e38 array-slice-83b8938af -t 10
group                        array-slice-83b8938af                  master-67d0c2e38
-----                        ---------------------                  ----------------
array_slice 512              1.11    127.8±0.36ns        ? B/sec    1.00    115.4±0.15ns        ? B/sec
cast int64 to int32 512      1.00      2.3±0.00µs        ? B/sec    1.11      2.5±0.01µs        ? B/sec
concat i32 1024              1.00      2.5±0.00µs        ? B/sec    1.19      3.0±0.01µs        ? B/sec
equal_512                    1.14     46.7±0.18ns        ? B/sec    1.00     41.1±0.05ns        ? B/sec
like_utf8 scalar complex     1.00   1075.9±2.54µs        ? B/sec    1.16   1249.6±2.29µs        ? B/sec
like_utf8 scalar equals      1.00     70.8±0.08µs        ? B/sec    1.25     88.2±0.12µs        ? B/sec
min nulls string 512         1.14      6.5±0.04µs        ? B/sec    1.00      5.7±0.03µs        ? B/sec
min string 512               1.00      3.4±0.00µs        ? B/sec    1.28      4.3±0.01µs        ? B/sec
multiply 512                 1.38    346.6±0.37ns        ? B/sec    1.00    251.2±0.49ns        ? B/sec
nlike_utf8 scalar complex    1.00   1162.1±1.30µs        ? B/sec    1.12   1300.7±1.45µs        ? B/sec
take bool nulls 1024         1.00      3.7±0.03µs        ? B/sec    1.38      5.1±0.03µs        ? B/sec
take bool nulls 512          1.00   1720.3±8.69ns        ? B/sec    1.42      2.5±0.02µs        ? B/sec
take i32 512                 1.11   1023.5±1.61ns        ? B/sec    1.00    918.6±1.07ns        ? B/sec
take i32 nulls 512           1.00    989.6±1.31ns        ? B/sec    1.10   1089.6±2.06ns        ? B/sec

I won't push my rebase up unless we decide on some further tweaks to make, as it doesn't seem worth forcing a re-review since there were no conflicts.

@jorgecarleitao
Copy link
Member

@tyrelr , could you rebase against master?

@kszucs , something may have happened to the force push: the commits seem funny.

@kszucs
Copy link
Member

kszucs commented Jan 26, 2021

Could you please rebase on top of master?

…nctions providing a slice of the offsets.

Make value() functions check bounds to make them safe, add unsafe variants for callers willing to check bounds ahead of time.
@tyrelr tyrelr force-pushed the array_slice_accessors branch from e18e319 to dda0545 Compare January 27, 2021 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants