Skip to content

Conversation

@dsaxton
Copy link
Contributor

@dsaxton dsaxton commented Jun 20, 2020

Bug noticed when looking at a different issue #34871 (comment)

@gfyoung gfyoung added Bug DataFrame DataFrame data structure Series Series data structure labels Jun 20, 2020
if inplace:
return [self]
return [self.copy()]
return self.astype(object).replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only want to do this if these are different types, not generally. please show asv's for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if not self._can_hold_element(to_replace) check is making sure we have different types.

Not seeing any performance regressions from the change (at least in the replace benchmarks):

(pandas-dev) > asv continuous -f 1.1 upstream/master replace-diff-dtype -b ^replace
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Installing 1e014777 <replace-diff-dtype> into conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For pandas commit 9114b4bc <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  4.17%] ··· Running (replace.Convert.time_replace--)....
[ 20.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 25.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 1/2):
[ 25.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 29.17%] ··· Running (replace.Convert.time_replace--)....
[ 45.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 50.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 54.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 54.17%] ··· ============= ============ ============
              --                   replace_data      
              ------------- -------------------------
               constructor   Timestamp    Timedelta  
              ============= ============ ============
                DataFrame     98.6±3ms    95.2±0.5ms 
                  Series     74.0±0.3ms    74.6±1ms  
              ============= ============ ============

[ 58.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 58.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.04±0.01ms 
                False     3.01±0.3ms 
              ========= =============

[ 62.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 62.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.02ms 
                False    4.02±0.07ms 
              ========= =============

[ 66.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 66.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.27±0.02s 
                False    4.24±0.04s 
              ========= ============

[ 70.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 70.83%] ··· ========= ============
               inplace              
              --------- ------------
                 True    51.6±0.6μs 
                False     405±5ms   
              ========= ============

[ 75.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[ 75.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.2±3ms 
                False    467±4ms  
              ========= ==========

[ 75.00%] · For pandas commit 9114b4bc <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 79.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 79.17%] ··· ============= =========== ===========
              --                  replace_data     
              ------------- -----------------------
               constructor   Timestamp   Timedelta 
              ============= =========== ===========
                DataFrame     102±4ms     103±3ms  
                  Series      80.0±4ms    83.1±4ms 
              ============= =========== ===========

[ 83.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 83.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.04ms 
                False     3.04±0.1ms 
              ========= =============

[ 87.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 87.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.20±0.04ms 
                False     4.26±0.2ms 
              ========= =============

[ 91.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 91.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.26±0.01s 
                False    4.24±0.01s 
              ========= ============

[ 95.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 95.83%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    49.6±1μs 
                False    412±6ms  
              ========= ==========

[100.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[100.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.3±2ms 
                False    464±30ms 
              ========= ==========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@jreback
Copy link
Contributor

jreback commented Jul 17, 2020

can u merge master and will look

@simonjayhawkins
Copy link
Member

@dsaxton can you move release note

@simonjayhawkins
Copy link
Member

marking as stale 😉


**Reshaping**

- Bug where :meth:`DataFrame.replace` and :meth:`Series.replace` would fail when replacement value was of a different dtype but equal to values in the original object (:issue:`34871`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 1.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks similar to regression reported in #35376 with open PR (with different fix) #36444

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's the same issue. If another fix avoids astyping to object I would think it's preferred.

@jreback jreback added this to the 1.2 milestone Sep 19, 2020
@jreback jreback removed the Stale label Sep 19, 2020
@dsaxton
Copy link
Contributor Author

dsaxton commented Sep 19, 2020

Closing since the other PR looks like a better fix

@dsaxton dsaxton closed this Sep 19, 2020
@dsaxton dsaxton deleted the replace-diff-dtype branch September 19, 2020 14:04
@QuentinN42 QuentinN42 mentioned this pull request Sep 21, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug DataFrame DataFrame data structure Series Series data structure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants