ARROW-972: UnionArray in pyarrow #1216
Can you rebase? What more work needs to be done here?
@pcmoritz anything I can do to help on this? Would be great to get this into 0.8.0
@wesm: Agreed it should be part of 0.8.0. I'll take a stab at the remaining items now and let you know how things go.
df611cb to fc73439
This is now ready to review
wesm left a comment
Made some minor comments about names and a few other things. Thank you for doing this!
cpp/src/arrow/array.cc
Outdated
This might be better with the type on the LHS (std::vector<std::shared_ptr<Field>> types;)
cpp/src/arrow/array.cc
Outdated
Would this be useful as an alternate version of arrow::union_? Maybe with the same API (the mode as the last argument) but omitting the type codes
cpp/src/arrow/array.cc
Outdated
May also want to assert that value_offsets has a null count of 0
done
cpp/src/arrow/array.h
Outdated
Hm, this naming is similar to ListArray::FromArrays, but because there are two versions, I wonder if calling this UnionArray::MakeDense (or FromArraysDense, more verbose) would be clearer
done
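For context on the dense layout that the MakeDense naming refers to, here is a minimal pure-Python sketch. It is not the pyarrow or Arrow C++ API: `make_dense_sketch` is a hypothetical helper, plain lists stand in for Arrow arrays, and type ids are assumed to be plain child indices. It also includes the null check on value_offsets suggested earlier in this review.

```python
from typing import Any, List, Sequence


def make_dense_sketch(type_ids: Sequence[int],
                      value_offsets: Sequence[int],
                      children: Sequence[Sequence[Any]]) -> List[Any]:
    """Resolve a dense union into its logical values (illustration only).

    Element i lives in children[type_ids[i]] at position value_offsets[i].
    Assumption: each type id is used directly as an index into `children`.
    """
    if len(type_ids) != len(value_offsets):
        raise ValueError("type_ids and value_offsets must have equal length")
    # Offsets are positions into the children, so a null offset is meaningless.
    if any(offset is None for offset in value_offsets):
        raise ValueError("value_offsets must not contain nulls")

    values = []
    for type_id, offset in zip(type_ids, value_offsets):
        child = children[type_id]      # pick the child array for this slot
        values.append(child[offset])   # read the value at the stored offset
    return values


# Two children: one of ints, one of strings.
result = make_dense_sketch([0, 1, 0], [0, 0, 1], [[1, 2], ["a"]])
```

In a sparse union, by contrast, every child has the same length as the union itself and no offsets are needed, which is why the two constructors take different arguments.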
cpp/src/arrow/array.h
Outdated
It wasn't initially clear that this is the child type; maybe call this child_type instead (and add a doxygen comment)?
This is now gone; I'm using the Array's type to get this information instead. It's more consistent with how things are done for ListArray and StructArray.
python/pyarrow/scalar.pxi
Outdated
Is it true that using an underscore saves you from declaring this method in the pxd file? If so I've been wasting my time a bunch in the past :)
python/pyarrow/scalar.pxi
Outdated
This is quite expensive on a per-value basis. Could these wrapped types be accessed from the parent pyarrow UnionArray?
Oops, good point! I'm now constructing the Python objects for the child types once per array, in the UnionType.
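The cost difference discussed in this thread can be illustrated with a small pure-Python sketch. The names here (`UnionTypeSketch`, `_wrap`, `child`) are hypothetical and do not mirror the actual Cython code in scalar.pxi or types.pxi; the point is only that the wrapping happens once at construction, not once per value access.

```python
class UnionTypeSketch:
    """Hypothetical stand-in for a union type that wraps its child types."""

    def __init__(self, child_type_names):
        self.wrap_calls = 0
        # Wrap every child type once, up front, and keep the results.
        self._children = [self._wrap(name) for name in child_type_names]

    def _wrap(self, name):
        # Stand-in for an expensive wrapper-object construction.
        self.wrap_calls += 1
        return ("wrapped", name)

    def child(self, type_id):
        # Per-value lookups reuse the cached wrapper; nothing is rebuilt.
        return self._children[type_id]


union_type = UnionTypeSketch(["int64", "string"])
for _ in range(1000):
    union_type.child(0)
    union_type.child(1)
```

After the loop, `union_type.wrap_calls` is still 2: one wrap per child type, regardless of how many values were read.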
296de0a to be13b58
python/pyarrow/types.pxi
Outdated
If we don't do this and plug mode in directly below, clang complains:
[ 33%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100873:100: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) (((((enum arrow::UnionMode)digits[1]) << PyLong_SHIFT) | (enum arrow::UnionMode)digits[0]));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100882:102: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) (((((((enum arrow::UnionMode)digits[2]) << PyLong_SHIFT) | (enum arrow::UnionMode)digits[1]) << PyLong_SHIFT) ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100891:104: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) (((((((((enum arrow::UnionMode)digits[3]) << PyLong_SHIFT) | (enum arrow::UnionMode)digits[2]) << PyLong_SHIFT...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100929:130: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) (((enum arrow::UnionMode)-1)*(((((enum arrow::UnionMode)digits[1]) << PyLong_SHIFT) | (enum arrow::UnionMode)d...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100938:101: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) ((((((enum arrow::UnionMode)digits[1]) << PyLong_SHIFT) | (enum arrow::UnionMode)digits[0])));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100947:132: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) (((enum arrow::UnionMode)-1)*(((((((enum arrow::UnionMode)digits[2]) << PyLong_SHIFT) | (enum arrow::UnionMode...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100956:103: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) ((((((((enum arrow::UnionMode)digits[2]) << PyLong_SHIFT) | (enum arrow::UnionMode)digits[1]) << PyLong_SHIFT) ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100965:134: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) (((enum arrow::UnionMode)-1)*(((((((((enum arrow::UnionMode)digits[3]) << PyLong_SHIFT) | (enum arrow::UnionMo...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
/Users/pcmoritz/arrow/python/build/temp.macosx-10.7-x86_64-3.5/lib.cxx:100974:105: error: invalid operands to binary expression ('enum arrow::UnionMode' and 'int')
return (enum arrow::UnionMode) ((((((((((enum arrow::UnionMode)digits[3]) << PyLong_SHIFT) | (enum arrow::UnionMode)digits[2]) << PyLong_SHIFT...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
9 errors generated.
2007736 to 02bdfc8
@wesm This is ready except for the Windows compile error. Do you think you could take a quick look at that? I'm not sure what is going on.
Change-Id: Iae3f69070e595a7b65689dda1197c749935fe4b5
Change-Id: Id4b6f445f3e041633eefa327eb1c4716d7d9b18a
02bdfc8 to 7f3ca31
Done. Will merge this once the build is green
Great, thanks :)
Thanks @pcmoritz!
This is taking a stab at exposing UnionArray to pyarrow. Tasks to be done: