ARROW-2145/ARROW-2157: [Python] Decimal conversion not working for NaN values #1610
Restarted the Travis jobs, they seemed to be failing during
Is this actually required? You are only using the decimal type below.
I might be misunderstanding the use case for OwnedRefNoGIL.
My understanding is that if we're calling Py_XDECREF on something and we do not hold the GIL, then we need to acquire it.
The GIL is released in Cython before the function that instantiates this class is called (and isn't subsequently acquired in that function). Assuming my understanding is correct, we need to acquire the GIL before calling Py_XDECREF on this.
Are you suggesting that, because the order of destruction here is guaranteed to be the reverse of the initialization order, we don't need to care that decimal_module_ holds the GIL, since we'll never decref decimal_module_ before we decref decimal_type_?
What I mean is that you don't care about keeping a reference to the decimal module if you only need to use the decimal type:
>>> import sys
>>> D = __import__('decimal').Decimal
>>> sys.modules['decimal'] = None
>>> D(100)
Decimal('100')
Oh, awesome. There are a few places that we have unnecessary refs to the decimal module. I'll open a JIRA to clean those up (and remove this one). Thanks for the review.
This won't work with Python Decimal nans, right? Try decimal.Decimal('nan').
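For context on why Decimal NaNs need separate handling, here is a small sketch that uses only the standard-library decimal module (it does not touch pyarrow). It shows that a Decimal NaN is still a Decimal instance, compares unequal to itself, and reports its exponent as the string 'n' rather than an integer, so any code deriving precision and scale from the exponent would have to detect it first. The variable names are illustrative only.

```python
from decimal import Decimal

nan = Decimal("nan")

# A Decimal NaN passes an ordinary type check...
is_decimal = isinstance(nan, Decimal)

# ...but must be detected explicitly, because it never equals itself.
is_nan = nan.is_nan()
equals_itself = nan == nan

# The exponent of a NaN is the marker string 'n', not an integer.
exponent = nan.as_tuple().exponent
```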
cpp/src/arrow/python/python-test.cc
Outdated
What is the policy for adding tests here rather than in pyarrow/tests? It seems writing tests in pure Python is generally easier :-)
It's funny you mention that. I agree, but I wanted to be able to step through C++ code in the CLion IDE so I wrote the test here.
Let me add Decimal('nan') to this list.
Done
cpp/src/arrow/python/helpers.cc
Outdated
IMO it's fine to make it a static global (or a singleton, etc.), as long as we don't want to support subinterpreters perhaps.
By the way the nested block above doesn't seem needed.
Yep, okay I'll refactor and make those static
Hm so it appears that string interning of module names doesn't play well with C++ globals. I'll leave these as they are for now.
Right, you probably can't use OwnedRef here since the destructor would trigger too late. But you could have a PyObject*. The only downside is that it would make the object eternal.
cpp/src/arrow/python/helpers.h
Outdated
Why the consts? I don't think it makes sense, and you're bound to do a lot of casts as soon as you call the Python C API.
Note PyObject is non-const pretty much by construction, as it has a reference count that can be mutated by any operation.
Ok, I was following convention in the file. I can adjust.
I've been removing all the instances of const PyObject* after having a compiler warning from an internal Py-API, so happy to see them all go
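The point about PyObject being non-const by construction can be observed from Python itself. The sketch below assumes CPython (reference counts are an implementation detail, and sys.getrefcount reports a value that includes temporary references), so only the difference between two readings is meaningful. The function name refcount_delta is hypothetical.

```python
import sys


def refcount_delta():
    """Return how much an object's refcount changes when a list holds it."""
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj]  # storing the object in a list adds one reference
    after = sys.getrefcount(obj)
    return after - before, holders


# Merely placing the object in a container mutates its header.
delta, _ = refcount_delta()
```

Because even such a passive operation writes to the object, a const PyObject* cannot be honored by most C API calls without a cast.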
cpp/src/arrow/python/helpers.cc
Outdated
The last argument can simply be the empty string AFAIR.
Cool
cpp/src/arrow/python/helpers.cc
Outdated
You should call PyDecimal_Check before calling the method. Also you should check if the method raises. And it's better to use PyObject_IsTrue rather than compare against Py_True.
Yes, not sure how I missed that :)
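As a rough Python-level analogue of the three suggestions above (type check first, let errors from the method surface, test truthiness rather than identity with True), a minimal sketch might look like this. The helper name is_decimal_nan is hypothetical and is not the C++ function under review. In Python an exception raised by is_nan() propagates on its own, whereas the C API version would need to check the call's return value for NULL.

```python
from decimal import Decimal


def is_decimal_nan(obj):
    """Return True only for Decimal instances that are NaN."""
    # Type check first, so is_nan() is never looked up on an arbitrary object.
    if not isinstance(obj, Decimal):
        return False
    # Rely on truthiness instead of identity with True (the Python-level
    # counterpart of PyObject_IsTrue versus comparing against Py_True).
    return bool(obj.is_nan())
```

For example, is_decimal_nan(float("nan")) is False because a float is not a Decimal, while is_decimal_nan(Decimal("nan")) is True.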
@pitrou any more comments here?
40d9b52 to a910633
pitrou left a comment
See comments below. Also it seems that the AppVeyor build has failed, though it may be unrelated?
This member doesn't seem used actually. Were you planning to use it with PyObject_IsInstance instead of the costlier call to internal::PyDecimal_Check?
Yes, thanks. Will fix.
Might be nice to add a comment motivating the algorithm here: why compare the absolute values but then store the original value?
Actually, reconsidering this based on your comment, this really should be just the max scale. Negative scale should contribute to precision only if it would increase precision. The goal here is to "cast the widest net", i.e. the max precision and max scale. Negative scale complicates things a tiny bit. I'll add some commentary.
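To make the "widest net" idea concrete, here is a sketch in pure Python. It is one possible reading of the comment above, not the C++ implementation in this PR. The function name infer_precision_and_scale is hypothetical, and the sketch assumes that a value with a negative scale (such as 1E+3) is treated as having scale 0 and extra integer digits, and that non-finite values are skipped.

```python
from decimal import Decimal


def infer_precision_and_scale(values):
    """Hypothetical helper: widest (precision, scale) covering all finite values."""
    max_integer_digits = 0
    max_scale = 0
    for value in values:
        if not value.is_finite():
            continue  # NaN and infinities carry no digit information
        _, digits, exponent = value.as_tuple()
        if exponent >= 0:
            # Negative scale (e.g. 1E+3): the implied zeros become integer digits.
            integer_digits = len(digits) + exponent
            scale = 0
        else:
            scale = -exponent
            integer_digits = max(len(digits) - scale, 0)
        max_integer_digits = max(max_integer_digits, integer_digits)
        max_scale = max(max_scale, scale)
    # Precision must hold the widest integer part plus the widest fraction.
    return max_integer_digits + max_scale, max_scale


sample = [Decimal("1.23"), Decimal("1E+3"), Decimal("0.0001"), Decimal("nan")]
precision, scale = infer_precision_and_scale(sample)
```

Here 1E+3 contributes four integer digits and 0.0001 contributes a scale of four, so the sample needs precision 8 and scale 4, and the NaN is ignored.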
cpp/src/arrow/python/common.h
Outdated
I'm not a C++ expert, so I'm curious why this is necessary?
Constructors are not inherited by default. I'm not actually using this, so it should cost nothing at runtime. If we ever wanted to construct one of these with a pointer as its first argument we'd have to define it anyway. I can remove it if you'd like.
No problem with me. I had forgotten about non-inheritance of constructors.
Can PandasObjectIsNull return true on a Decimal instance?
No, I'll fix that.
Also check that the Arrow type was inferred correctly?
Yep will do.
needs rebase
Closed in favor of #1651
No description provided.