chore: update datafusion to 42.0 and arrow to 53.2#3176
chore: update datafusion to 42.0 and arrow to 53.2#3176jleibs wants to merge 17 commits intolance-format:mainfrom
Conversation
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
|
|
||
| [dev-dependencies] | ||
| substrait-expr = { version = "0.2.1" } | ||
| # TODO: This is too old |
There was a problem hiding this comment.
I believe somebody needs to also update substrait-expr to use prost 0.13 to match the version used by datafusion-substrait
There was a problem hiding this comment.
Guess that's on me. How do you feel about bumping all the way to datafusion 43? That release should include a change I made which will let us drop substrait-expr entirely.
There was a problem hiding this comment.
Version 0.2.2 of substrait-expr is now available and will include prost 0.13
There was a problem hiding this comment.
How do you feel about bumping all the way to datafusion 43
There were a couple more API surfaces that needed modifying so I was focusing on min-change possible, but I don't have a philosophiscal objection.
There was a problem hiding this comment.
Ok. Let's keep it minimal for now. I'll need to do some more work to get rid of substrait-expr anyways.
|
|
||
| [dev-dependencies] | ||
| substrait-expr = { version = "0.2.1" } | ||
| # TODO: This is too old |
There was a problem hiding this comment.
Guess that's on me. How do you feel about bumping all the way to datafusion 43? That release should include a change I made which will let us drop substrait-expr entirely.
| } | ||
|
|
||
| let buf = bytes.into(); | ||
| let buf = Buffer::from_vec(bytes.to_vec()); |
There was a problem hiding this comment.
This triggers a data copy (and could be the source of your alignment issues). What changed in arrow-rs to necessitate this change? It should still be possible to go from Bytes to Buffer with zero-copy somehow.
There was a problem hiding this comment.
Looks like this was the change:
Will try that approach instead.
There was a problem hiding this comment.
Based on the comment above I believe:
Buffer::from_vec(bytes.into()) is the intended pathway, though I also tried bytes.to_byte_slice().into(), but both still result in the same error:
---- index::vector::ivf::tests::test_create_ivf_hnsw_with_empty_partition stdout ----
thread 'index::vector::ivf::tests::test_create_ivf_hnsw_with_empty_partition' panicked at /home/jleibs/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-53.3.0/src/buffer/scalar.rs:133:42:
Memory pointer is not aligned with the specified scalar type
There was a problem hiding this comment.
That's still going to trigger a copy it seems. I'll take a look real quick.
There was a problem hiding this comment.
That's still going to trigger a copy it seems.
Is it?
My reads of both:
impl From<Bytes> for Vec<u8>Buffer::from_vec()
Look like they should both be zero-copy.
There was a problem hiding this comment.
Sorry, I saw bytes.to_byte_slice().into() which I think is not zero-copy. You are right that Buffer::from_vec(bytes.into()) should be zero-copy.
There was a problem hiding this comment.
Actually, no Buffer::from_vec(bytes.into()) will not be zero copy because bytes.into() is not zero-copy
There was a problem hiding this comment.
Ah, nevermind, it looks like bytes does have a zero-copy conversion to Vec<u8>. My bad.
There was a problem hiding this comment.
I'm guessing your fix was probably ok here. The root cause was probably just the deep_copy_buffer change.
There was a problem hiding this comment.
Aha. Those new routines are much more clear. Thanks.
e128098 to
08aa39b
Compare
|
With the changes to Buffer it looks like more tests are passing but now we get a failure in substrait: |
63fec42 to
8eb87ea
Compare
|
It looks like the substrait error can be fixed by: but I'm not sure if this hints at a bug or just a behavior change |
|
And the chain just keeps on going... looks like |
|
@jleibs this will work around the tfrecord while we wait for them to upgrade. It's a bit ugly but 🤷 |
…into jleibs/update_arrow_datafusion
|
@westonpace thanks! That's a helpful trick. Looks like the wheel is building now. There are some new deprecation warnings to be cleaned up related to bumping pyo3, but those should be straightforward. Any thoughts on the "x" vs "c0" business in substrait? |
If the tests in |
|
(we only use substrait for pushdown from duckdb / polars and that's what test_integration.py is checking) |
No luck. Integration tests show the same problem: |
|
Ok. I'll take a look tomorrow AM |
|
@westonpace we can probably remove tfrecord dependency from rust |
|
Sorry, lost track of this over Thanksgiving. I'll get it wrapped up today. |
|
rerun-io#2 (let me know if you want me to just open a new PR from my branch so I can skip the intermediate PRs) |
|
Closing as I've taken this over in #3201 |
|
@westonpace sorry I was out for an extended thanksgiving. Thanks for taking over. |
Changes:
impl<T: AsRef<[u8]>> From<T> for Buffer(see: explanation)BufferExtto centralize these conversions and make behavior more explicit.