-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-2145/ARROW-2153/ARROW-2157/ARROW-2160/ARROW-2177: [Python] Decimal conversion not working for NaN values #1651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
31 commits
Select commit
Hold shift + click to select a range
8e816ec
ARROW-2145: [Python] Decimal conversion not working for NaN values
cpcloud f562378
IWYU
cpcloud 8893a45
Revert header change
cpcloud 0665f6e
Revert test change
cpcloud e6ac864
Install libboost-regex-dev on travis
cpcloud 50e35d6
Use shared boost on parquet CI build
cpcloud 8be22a6
Install boost with c++11 option
cpcloud 7c7270a
Show boost install
cpcloud 77a41ee
Install boost first
cpcloud 4c74c63
NULLPTR to nullptr
cpcloud d905202
DCHECK_OK
cpcloud 281f798
DCHECK_OK
cpcloud 1df6923
DCHECK_OK
cpcloud db664f2
DCHECK_Ok
cpcloud 092a962
Fix order of operands
cpcloud 418754f
Check return value of PyList_SetItem
cpcloud b24ff25
Add DecimalMetadata::Update test for ignoring NaN values
cpcloud 3190b1a
Ignore nans in decimal metadata update
cpcloud a05b316
Refactor import decimal and acquire the gil before importing
cpcloud 4e6db3c
Formatting
cpcloud 29e1ebc
boost osx debugging
cpcloud b4bcfd9
DCHECK_OK for release builds
cpcloud 78cbf51
More script debugging
cpcloud 03ee999
Fix boost root
cpcloud ae5db5f
Perms
cpcloud 99505a9
Silence cmake complaints about boost version
cpcloud 00be578
Add tests to accommodate decimal values
cpcloud ab3e4a5
Brewfile
cpcloud 0d45688
Pass version as argument
cpcloud 1fc2a96
Args must be a ruby Hash
cpcloud 97fcb96
Make sure we only install if glibc is affected
cpcloud File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| if [ "$ARROW_CI_C_GLIB_AFFECTED" = "1" ]; then | ||
| brew update | ||
| brew bundle --file=$TRAVIS_BUILD_DIR/c_glib/Brewfile | ||
| fi |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -76,7 +76,15 @@ class ScalarVisitor { | |
| timestamp_count_(0), | ||
| float_count_(0), | ||
| binary_count_(0), | ||
| unicode_count_(0) {} | ||
| unicode_count_(0), | ||
| decimal_count_(0), | ||
| max_decimal_metadata_(std::numeric_limits<int32_t>::min(), | ||
| std::numeric_limits<int32_t>::min()), | ||
| decimal_type_() { | ||
| PyAcquireGIL lock; | ||
| Status status = internal::ImportDecimalType(&decimal_type_); | ||
| DCHECK_OK(status); | ||
| } | ||
|
|
||
| Status Visit(PyObject* obj) { | ||
| ++total_count_; | ||
|
|
@@ -111,10 +119,13 @@ class ScalarVisitor { | |
| ss << type->ToString(); | ||
| return Status::Invalid(ss.str()); | ||
| } | ||
| } else if (PyObject_IsInstance(obj, decimal_type_.obj())) { | ||
| RETURN_NOT_OK(max_decimal_metadata_.Update(obj)); | ||
| ++decimal_count_; | ||
| } else { | ||
| // TODO(wesm): accumulate error information somewhere | ||
| static std::string supported_types = | ||
| "bool, float, integer, date, datetime, bytes, unicode"; | ||
| "bool, float, integer, date, datetime, bytes, unicode, decimal"; | ||
| std::stringstream ss; | ||
| ss << "Error inferring Arrow data type for collection of Python objects. "; | ||
| RETURN_NOT_OK(InvalidConversion(obj, supported_types, &ss)); | ||
|
|
@@ -125,7 +136,9 @@ class ScalarVisitor { | |
|
|
||
| std::shared_ptr<DataType> GetType() { | ||
| // TODO(wesm): handling mixed-type cases | ||
| if (float_count_) { | ||
| if (decimal_count_) { | ||
| return decimal(max_decimal_metadata_.precision(), max_decimal_metadata_.scale()); | ||
| } else if (float_count_) { | ||
| return float64(); | ||
| } else if (int_count_) { | ||
| // TODO(wesm): tighter type later | ||
|
|
@@ -157,8 +170,13 @@ class ScalarVisitor { | |
| int64_t float_count_; | ||
| int64_t binary_count_; | ||
| int64_t unicode_count_; | ||
| int64_t decimal_count_; | ||
|
|
||
| internal::DecimalMetadata max_decimal_metadata_; | ||
|
|
||
| // Place to accumulate errors | ||
| // std::vector<Status> errors_; | ||
| OwnedRefNoGIL decimal_type_; | ||
| }; | ||
|
|
||
| static constexpr int MAX_NESTING_LEVELS = 32; | ||
|
|
@@ -379,17 +397,14 @@ class TypedConverter : public SeqConverter { | |
| BuilderType* typed_builder_; | ||
| }; | ||
|
|
||
| // We use the CRTP trick here to devirtualize the AppendItem() and AppendNull() | ||
| // We use the CRTP trick here to devirtualize the AppendItem(), AppendNull(), and IsNull() | ||
| // method calls. | ||
| template <typename BuilderType, class Derived> | ||
| class TypedConverterVisitor : public TypedConverter<BuilderType> { | ||
| public: | ||
| Status AppendSingle(PyObject* obj) override { | ||
| if (obj == Py_None) { | ||
| return static_cast<Derived*>(this)->AppendNull(); | ||
| } else { | ||
| return static_cast<Derived*>(this)->AppendItem(obj); | ||
| } | ||
| auto self = static_cast<Derived*>(this); | ||
| return self->IsNull(obj) ? self->AppendNull() : self->AppendItem(obj); | ||
| } | ||
|
|
||
| Status AppendMultiple(PyObject* obj, int64_t size) override { | ||
|
|
@@ -409,6 +424,7 @@ class TypedConverterVisitor : public TypedConverter<BuilderType> { | |
|
|
||
| // Append a missing item (default implementation) | ||
| Status AppendNull() { return this->typed_builder_->AppendNull(); } | ||
| bool IsNull(PyObject* obj) const { return obj == Py_None; } | ||
| }; | ||
|
|
||
| class NullConverter : public TypedConverterVisitor<NullBuilder, NullConverter> { | ||
|
|
@@ -830,12 +846,16 @@ class DecimalConverter | |
| public: | ||
| // Append a non-missing item | ||
| Status AppendItem(PyObject* obj) { | ||
| /// TODO(phillipc): Check for nan? | ||
| Decimal128 value; | ||
| const auto& type = static_cast<const DecimalType&>(*typed_builder_->type()); | ||
| RETURN_NOT_OK(internal::DecimalFromPythonDecimal(obj, type, &value)); | ||
| return typed_builder_->Append(value); | ||
| } | ||
|
|
||
| bool IsNull(PyObject* obj) const { | ||
| return obj == Py_None || obj == numpy_nan || internal::PyFloat_isnan(obj) || | ||
| (internal::PyDecimal_Check(obj) && internal::PyDecimal_ISNAN(obj)); | ||
|
||
| } | ||
| }; | ||
|
|
||
| // Dynamic constructor for sequence converters | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the rationale for this, the symbol linking issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import pyarrow.parquetwas segfaulting, I assumed because we're statically linking boost in the parquet build and dynamically in the arrow build. This only shows up when using the regex library.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we should be consistent about which we do across the libraries. Part of why I wish we were building all these libraries in a monorepo setting