ARROW-2135: [Python] Fix NaN conversion when casting from Numpy array #1681

pitrou · 2018-02-28T17:44:31Z

No description provided.

pitrou · 2018-02-28T17:55:40Z

cpp/src/arrow/python/numpy-internal.h

For some reason (macro expansion?) these #ifs wouldn't work correctly here, even though NPY_INT64 is defined to NPY_LONG.

Hmm, actually, that must be because NPY_LONGLONG is not a macro...

cpcloud · 2018-02-28T20:22:26Z

cpp/src/arrow/python/numpy_to_arrow.cc

std::fill(null_bitmap_data_, null_bitmap_data_ + null_bytes, 0) is a bit more idiomatic.

Hmm, perhaps. This is really a copy/paste of NumPyConverter::InitNullBitmap()...

Possibly time for a subclass then?
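For context on what the zero-fill sets up: an Arrow validity bitmap stores one bit per value, least-significant bit first, and a set bit marks a valid slot. Below is a rough pure-Python sketch of that layout. `build_validity_bitmap` is a hypothetical helper written for illustration, not a function from the Arrow code base, and treating NaN as null is an assumption matching the case this PR addresses.

```python
import math


def build_validity_bitmap(values):
    """Return (bitmap, null_count) for a sequence of floats.

    Hypothetical helper: NaN is treated as null here for illustration.
    """
    null_bytes = (len(values) + 7) // 8   # one bit per value, rounded up
    bitmap = bytearray(null_bytes)        # starts zeroed: every slot null
    null_count = 0
    for i, v in enumerate(values):
        if math.isnan(v):
            null_count += 1
        else:
            # LSB-first bit order: value i lives in byte i // 8, bit i % 8
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap), null_count


bitmap, null_count = build_validity_bitmap([1.0, 2.0, 3.1, float("nan")])
```

Starting from an all-zero buffer means only the valid slots need to be touched afterwards, which is why the fill comes first.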

cpcloud · 2018-02-28T20:24:58Z

python/pyarrow/tests/test_convert_pandas.py

Is there already a test for things like a = [1.0, 2.0, 3.1, np.nan] where a user passes in an integer type?

You mean for the truncation behavior? Let me look.

No, I don't think so. I'm not sure we specify the truncation mode anywhere either?

It looks like it's a hard cast:

    In [7]: pa.array([1, 2, 3.190, np.nan], type=pa.int64())
    Out[6]:
    <pyarrow.lib.Int64Array object at 0x7f537e42dd68>
    [
      1,
      2,
      3,
      NA
    ]

That's fine. Was just wondering.
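To spell out what "hard cast" means for the output above, here is a small stdlib-only sketch. It assumes the conversion truncates toward zero, as a C-style float-to-integer cast does, and maps NaN to a missing value. `hard_cast_to_int` is a hypothetical name for illustration, not part of pyarrow.

```python
import math


def hard_cast_to_int(values):
    """Truncate each number toward zero; NaN becomes None (missing)."""
    out = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            out.append(None)    # NaN cannot be represented as an integer
        else:
            out.append(int(v))  # int() truncates toward zero, no rounding
    return out


result = hard_cast_to_int([1, 2, 3.190, float("nan")])
negative = hard_cast_to_int([-3.9])
```

Under that assumption, negative inputs move toward zero rather than toward the floor.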

cpcloud · 2018-02-28T20:26:58Z

cpp/src/arrow/python/type_traits.h

inline is redundant here: http://en.cppreference.com/w/cpp/language/inline.

A function defined entirely inside a class/struct/union definition, whether it's a member function or a non-member friend function, is implicitly an inline function.

I see. This is really using the same convention as the rest of the file, though.

Hm, so that's also called isnull. Shouldn't that mean v == Py_None?

Probably needs a test as well since it isn't failing.

Nice catch :-) I'm not sure how to test it. Defining isnull is necessary for compiling, but that path isn't taken at runtime as object arrays are handled separately.

cpcloud · 2018-02-28T20:28:51Z

cpp/src/arrow/python/numpy_to_arrow.cc

At some point we may want to have an STL-compatible view class that makes interacting with iterator constructs in the STL much easier. We have a lot of code that is manually handling iteration using a size/count and a buffer.

Which iterators are you thinking about? Do you mean the ndarray 1d iterator?

That's one, though I added begin()/end() for that in #1651.

pitrou · 2018-03-01T09:59:26Z

cpp/src/arrow/python/numpy_to_arrow.cc

By the way, I don't know what that is, but this is required to have the tests pass. Why do we always treat NaT as null but not floating-point NaN? @wesm

AFAIU there's no way to interpret NaT other than as NULL (unless there's a standard that defines it as something other than "missing"). NaN is part of the IEEE floating-point specification (as I'm sure you know) and it has a different meaning than null.
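A sketch of that distinction, in plain Python: NumPy stores NaT as the minimum int64 value, a sentinel with no other meaning, while NaN is an ordinary IEEE value that only counts as missing if the caller says so. The `is_null` helper and its `nan_as_null` flag are hypothetical names for illustration and are not taken from the Arrow API.

```python
import math

# NumPy represents NaT as the minimum int64 value.
NAT = -2**63


def is_null(value, is_datetime, nan_as_null=False):
    """Decide whether a raw value should be treated as null.

    Hypothetical helper: NaT is always null, NaN only on request.
    """
    if is_datetime:
        return value == NAT
    return nan_as_null and isinstance(value, float) and math.isnan(value)


nan = float("nan")
nan_equals_itself = (nan == nan)  # False under IEEE 754 comparison rules
```

An opt-in flag of this kind leaves the IEEE meaning of NaN intact by default while still letting pandas-style callers treat it as missing.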

pitrou · 2018-03-01T10:43:38Z

I addressed some review comments now.

pitrou · 2018-03-01T15:54:23Z

AppVeyor build at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.157

pitrou · 2018-03-08T12:33:18Z

Rebased.

pitrou · 2018-03-08T13:19:55Z

AppVeyor at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.175

wesm

+1, thanks for cleaning up the int/uint size issues here, much cleaner now

wesm · 2018-03-12T19:04:22Z

see ARROW-2298 for adding an option about NaN conversions

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch 2 times, most recently from 6cbf133 to d602be7 Compare February 28, 2018 19:13

pitrou closed this Feb 28, 2018

pitrou reopened this Feb 28, 2018

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch 3 times, most recently from 84766bc to cd37393 Compare February 28, 2018 20:01

cpcloud reviewed Feb 28, 2018

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch 3 times, most recently from 73916de to bb56637 Compare March 1, 2018 09:56

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch from bb56637 to 375418f Compare March 1, 2018 12:23

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch from 375418f to 0af573b Compare March 5, 2018 11:41

ARROW-2135: [Python] Fix NaN conversion when casting from Numpy array

939428d

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch from 0af573b to 939428d Compare March 8, 2018 12:33

wesm approved these changes Mar 12, 2018

wesm closed this in 171340f Mar 12, 2018

pitrou deleted the ARROW-2135-nan-conversion-when-casting branch March 12, 2018 19:04

asfimport mentioned this pull request Mar 20, 2018

[Python] NaN values silently casted to int64 when passing explicit schema for conversion in Table.from_pandas #15664

Closed

3 participants