
@jbrockmendel
Member

Mostly CLN with a couple of PERF things tacked on at the last minute

  1. Replace can_hold_element checks with np_can_hold_element checks, and perform them only for non-datetimelike NumPy dtypes. This avoids double-validating in non-raising cases.

  2. Avoid up/downcasting in Block.where no-op cases

import numpy as np
import pandas as pd

df = pd.DataFrame(range(10**6), dtype=np.int32)
mask = np.ones(df.shape, dtype=bool)

%timeit res = df.where(mask, -1)
7.69 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
1.31 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <- PR
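
For illustration only, the no-op idea in item 2 can be sketched outside pandas. This is not the actual Block.where code: `where_sketch` is a hypothetical helper, and the dtype-promotion rule it uses (`np.result_type` with `np.min_scalar_type`) is an assumption standing in for pandas' internal casting logic. It only shows that when the mask is all True, nothing gets replaced, so the casting step can be skipped entirely.

```python
import numpy as np


def where_sketch(values, mask, other):
    """Hypothetical helper (not pandas' Block.where).

    Keep `values` where `mask` is True and use the scalar `other` elsewhere.
    """
    mask = np.asarray(mask, dtype=bool)

    if mask.all():
        # No-op case: no element is replaced, so skip any up/downcasting
        # and return a copy that keeps the original dtype.
        return values.copy()

    # General case (assumed promotion rule): upcast only as far as needed
    # for `other` to be representable alongside the existing values.
    result_dtype = np.result_type(values.dtype, np.min_scalar_type(other))
    out = values.astype(result_dtype)  # astype copies by default
    out[~mask] = other
    return out


values = np.arange(5, dtype=np.int32)
all_true = np.ones(values.shape, dtype=bool)
some_false = all_true.copy()
some_false[2] = False

noop = where_sketch(values, all_true, -1)        # fast path, stays int32
replaced = where_sketch(values, some_false, -1)  # -1 fits, stays int32
upcast = where_sketch(values, some_false, 1.5)   # needs a float dtype
```

The fast path costs one `mask.all()` pass plus a copy, which is in the same spirit as the master vs. PR timings above, though those timings measure pandas itself and not this sketch.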

@jreback jreback added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Performance (Memory or execution speed performance) labels Jan 8, 2022
@jreback jreback added this to the 1.4 milestone Jan 8, 2022
@jreback jreback merged commit e69e97f into pandas-dev:master Jan 8, 2022
@jreback
Contributor

jreback commented Jan 8, 2022

@meeseeksdev backport 1.4.x

@lumberbot-app

lumberbot-app bot commented Jan 8, 2022

Something went wrong ... Please have a look at my logs.

@jbrockmendel jbrockmendel deleted the fixmes30 branch January 8, 2022 16:22
jreback pushed a commit that referenced this pull request Jan 8, 2022
Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
