-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
Fix for #12723: Unexpected behavior with binary operators and fill… #12791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/frame.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to do the eval first, THEN, fill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that df2.add(2, fill_value=0) should act like df2.add(2).fillna(0), instead of df2.fillna(0).add(2)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the steps are align, arith then fill (with mask).
align is moot in this case. yes you are right, fill then arith.
In [7]: Series([np.nan,1,np.nan]).add(Series([2,3],index=[2,3]),fill_value=10)
Out[7]:
0 NaN
1 11.0
2 12.0
3 13.0
dtype: float64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified the API, DataFrame._combine_const, is it OK?
if yes, for consistency, it seems that we need to modify _combine_const method of sparse and series and their caller .
pandas/core/frame.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just do
if fill_value is not None:
self = self.fillna(fill_value)
data = self._data.eval(......)
....
we don't directly access .values like this as you can have multiple dtypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jreback.
Since the _combine_const of all classes are designed without fill_value parameter, I worry whether modifying the API is a good idea. At least, Series and some member of sparse need do something to keep consistent. Could you give me some advice?
pandas/core/ops.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about use fillna here, instead of modifying the _combine_const method? @jreback
|
If it is OK, I'd like to add a whatsnew and rebase. |
pandas/tests/frame/test_missing.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify this as np.arange(10, dtype='int64') or the comparison will fail on windows
|
pls test for the sparse cases as well |
pandas/sparse/tests/test_sparse.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with sparse dataframe. And the sparse item and NaN are quite similar and confuse for me.
Traceback (most recent call last):
File "/Users/facaiyan/Workshop/pandas/pandas/sparse/tests/test_sparse.py", line 966, in test_fill_value_missing_with_binary_operators
tm.assert_sp_frame_equal(res, exp)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1257, in assert_sp_frame_equal
assert_sp_series_equal(series, right[col])
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1229, in assert_sp_series_equal
assert_sp_array_equal(left.block.values, right.block.values)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1205, in assert_sp_array_equal
assert_almost_equal(left.sp_values, right.sp_values)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 124, in assert_almost_equal
return _testing.assert_almost_equal(left, right, **kwargs)
File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3809)
File "pandas/src/testing.pyx", line 120, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2101)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 913, in raise_assert_detail
raise AssertionError(msg)
AssertionError: numpy array are different
numpy array shapes are different
[left]: (10,)
[right]: (9,)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback if I misunderstand, please tell me.
for sparse dataframe:
df.fillna(x): convert NaN entries ofdftox.- In
df.add(a, fill_value=x):fill_value, convert sparse entries ofdftox.
d2bf6c7 to
9e0835b
Compare
|
The work is done if all tests are passed. |
9a41b79 to
34b0922
Compare
…_value (GH12723) fix series test for sparse dataframe remove wrong test: fillna() is not equivalent of fill_value
|
If it is ok, |
|
issues are closed when the PRs are merged - I'll have a look soon |
|
thanks for the fix! |
git diff upstream/master | flake8 --diffcloses #12723