Skip to content

Conversation

@cpcloud
Copy link
Member

@cpcloud cpcloud commented Feb 16, 2014

closes #6342

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

see u r hacking the biggest hack in internals, putmask
because of the dtype changes that are possible

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

yes this is definitely a hack. np.putmask will cycle values after a false is found if the size of the array and replacement values are not equal but it will only use the next value in the replacements if the Trues are contiguous ... highly counterintuitive behavior for that if u ask me ... is there a more optimal way that i'm overlooking?

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

internals is kind of fun .... always have to clear out my weekend when a bug involves internals.py 😄

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

fyi copyto raises in the case that the sizes are unequal, but only available in numpy >= 1.7

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

I avoided copyto for that reason

yeh...putmask has a lot of 'special' cases....which I think really are just weird things numpy does

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

replace is turning into a god function ... worth opening an issue?

@cpcloud cpcloud self-assigned this Feb 16, 2014
@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

what do you mean?

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

It's just really, really large ...

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

hahah so true!

next project...you can effectively replace update with replace!

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

@jreback What do u think about this? I'll run a vbench to make sure this doesn't cause a hit.

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

sure....looks fine otherwise.....not even sure how often that path is actually hit (can you put a case that hits it up?)

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

Nested dicts hit it if they contain keys that aren't in the frame/series

df = DataFrame({'col': range(1, 5)})
df.replace({'col': {-1: 'a', 1: 'b', 2: 'c'}})

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

ok...then prob ok....not hit a lot...but do a perf check anyhow

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

Perf looks good:

packers_write_pack                           |   0.9952 |   0.9129 |   1.0901 |
stats_rank_average                           |  23.8746 |  21.2846 |   1.1217 |
timeseries_asof                              |   7.9971 |   7.0529 |   1.1339 |
join_dataframe_index_single_key_small        |   6.4060 |   5.5654 |   1.1510 |
timeseries_asof_nan                          |   8.0025 |   6.8281 |   1.1720 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

@cpcloud
Copy link
Member Author

cpcloud commented Feb 16, 2014

Random network failure unrelated to this

https://travis-ci.org/cpcloud/pandas/jobs/18993144

merging

cpcloud added a commit that referenced this pull request Feb 16, 2014
@cpcloud cpcloud merged commit 5e64e88 into pandas-dev:master Feb 16, 2014
@cpcloud cpcloud deleted the replace-nested-dict-6342 branch February 16, 2014 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataFrame.replace() doesn't work correctly when passing a nested dict

2 participants