Skip to content

Conversation

@mroeschke
Copy link
Member

All the logic used to be held in a _Concatenator class. The statefullness wasn't really needed since concat is a one-shot method so broke it out into its functions

@mroeschke mroeschke added Refactor Internal refactoring of code Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 28, 2024
@mroeschke mroeschke added this to the 3.0 milestone Jun 28, 2024
@mroeschke
Copy link
Member Author

| Change   | Before [039edee0] <main>   | After [0becd77e] <ref/concat>   |   Ratio | Benchmark (Parameter)                                                                |
|----------|----------------------------|---------------------------------|---------|--------------------------------------------------------------------------------------|
| -        | 46.8±4μs                   | 41.8±0.1μs                      |    0.89 | join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 'has_na', 0, True) |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@mroeschke
Copy link
Member Author

Since test are passing, going to merge. Happy to follow up if needed

@mroeschke mroeschke merged commit f3ad4d5 into pandas-dev:main Jul 5, 2024
@mroeschke mroeschke deleted the ref/concat branch July 5, 2024 17:54
@martinfleis
Copy link
Contributor

Hey, just a note that this has broken concat override in geopandas, where we're now hitting 'object' object has no attribute 'objs'. Just wanted to clarify if that is expected or just an unwanted consequence of refactoring (i.e. whether I should fix it on geopandas side).

@mroeschke
Copy link
Member Author

It was an intended consequence of the refactor given that concat isn't stateful anymore. I think I can come up with a fix that will make this non-breaking for geopandas

To confirm it appears that geopandas overrides __finalize__? Just curious the reason for that

@martinfleis
Copy link
Contributor

To confirm it appears that geopandas overrides finalize? Just curious the reason for that

To ensure that we can correctly propagate custom metadata. That used to be an information about projection and an active geometry column, right now it is only active geometry column. Without that, the resulting GeoDataFrame would lose the track of active geometry and require manual re-assignment of active geometry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Refactor Internal refactoring of code Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants