Skip to content

Conversation

@jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Touched on in #32779, this makes all* DataFrame indexing-on-columns do slicing instead of taking, so doesn't make copies.

* Actually there is a different path in _slice_take_blocks_ax0 that we go down if we self._is_single_block, can update that later if we decide this is something we want to do.

In [3]: dti = pd.date_range("2016-01-01", periods=10**5, freq="S")                                                                                                                                                  
In [4]: df = pd.DataFrame._from_arrays([dti]*10 + [dti - dti] * 10 + [dti.to_period("D")]*10, columns=range(30), index=range(len(dti)))                                                                             

In [8]: arr = np.arange(30)                                                                                                                                                                                         
In [9]: np.random.shuffle(arr)                                                                                                                                                                                      

In [10]: %timeit df[arr]
8.35 ms ± 64.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)    # <-- master                                                                                                                                                     
650 µs ± 52.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)   # <-- PR

The tradeoff is that we end up with less-consolidated results. I'm OK with that, but there may be downsides I'm not aware of, will wait for others to weigh in.

@jorisvandenbossche
Copy link
Member

@jbrockmendel can you maybe open an issue for this? As your questions seems to be more broad than the actual PR, and to not mix that discussion with code review of the actual PR

@jbrockmendel
Copy link
Member Author

i cant reproduce the test failures locally

@jbrockmendel
Copy link
Member Author

Opened #33780, mothballing this.

@jbrockmendel jbrockmendel reopened this Jul 24, 2021
@jbrockmendel jbrockmendel added Indexing Related to indexing on series/frames, not to indexes themselves Copy / view semantics and removed Mothballed Temporarily-closed PR the author plans to return to labels Jul 24, 2021
@jreback
Copy link
Contributor

jreback commented Nov 28, 2021

this is quite old, happen to reopen if actively worked on.

@jreback jreback closed this Nov 28, 2021
@jbrockmendel jbrockmendel added the Mothballed Temporarily-closed PR the author plans to return to label Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Copy / view semantics Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation Mothballed Temporarily-closed PR the author plans to return to Performance Memory or execution speed performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants