-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-2135: [Python] Fix NaN conversion when casting from Numpy array #1681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-2135: [Python] Fix NaN conversion when casting from Numpy array #1681
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason (macro expansion?) these #ifs wouldn't work correctly here, even though NPY_INT64 is defined to NPY_LONG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, actually, that must be because NPY_LONGLONG is not a macro...
6cbf133 to
d602be7
Compare
84766bc to
cd37393
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::fill(null_bitmap_data_, null_bitmap_data_ + null_bytes, 0) is a bit more idiomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, perhaps. This is really a copy/paste of NumPyConverter::InitNullBitmap()...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly time for a subclass then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there already a test for things like a = [1.0, 2.0, 3.1, np.nan] where a user passes in an integer type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for the truncation behavior? Let me look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think so. I'm not sure we specify the truncation mode anywhere either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's a hard cast:
In [7]: pa.array([1, 2, 3.190, np.nan], type=pa.int64())
Out[6]:
<pyarrow.lib.Int64Array object at 0x7f537e42dd68>
[
1,
2,
3,
NA
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. Was just wondering.
cpp/src/arrow/python/type_traits.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline is redundant here: http://en.cppreference.com/w/cpp/language/inline.
A function defined entirely inside a class/struct/union definition, whether it's a member function or a non-member friend function, is implicitly an inline function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. This is really using the same convention as the rest of the file, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so that's also called isnull. Shouldn't that mean v == Py_None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably needs a test as well since it isn't failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch :-) I'm not sure how to test it. Defining isnull is necessary for compiling, but that path isn't taken at runtime as object arrays are handled separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point we may want to have an STL-compatible view class that makes interacting with iterators constructs in the STL much easier. We have a lot of code that is manually handling iteration using a size/count and a buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which iterators are you thinking about? Do you mean the ndarray 1d iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one, though I added begin()/end() for that in #1651.
73916de to
bb56637
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I don't know what that is, but this is required to have the tests pass. Why do we always treat NaT as null but not floating-point NaN? @wesm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU There's no other way to interpret NaT other than NULL (unless there's a standard that defines it in a different way than "missing"). nan is part of the IEEE floating point specification (as I'm sure you know) and it has a different meaning than null.
|
I addressed some review comments now. |
bb56637 to
375418f
Compare
|
AppVeyor build at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.157 |
375418f to
0af573b
Compare
0af573b to
939428d
Compare
|
Rebased. |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks for cleaning up the int/uint size issues here, much cleaner now
|
see ARROW-2298 for adding an option about NaN conversions |
No description provided.