More builtin indexes #20
Merged
605f8db
unrelated fixes
benbovy 287cff4
add PandasIndex and PandasMultiIndex
benbovy 2f8c009
add example cross-refs
benbovy 5773d10
PandasMultiIndex stack/unstack example
benbovy 3225d3e
Merge branch 'main' into more-builtin-indexes
benbovy 7256fbc
Update docs/builtin/pdindex.md
benbovy 7a1bb8b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 86bbb06
Update docs/builtin/pdmultiindex.md
benbovy 43f0b2d
Update docs/builtin/pdmultiindex.md
benbovy 9f59c22
Update docs/builtin/pdmultiindex.md
benbovy d5269c2
Update docs/builtin/pdmultiindex.md
benbovy 0d02eeb
Update docs/builtin/pdmultiindex.md
benbovy abcd001
Update docs/builtin/pdmultiindex.md
benbovy 126d3cd
re-arrange PandasMultiIndex
benbovy 2c87389
fix
benbovy 4da3f5d
Merge branch 'main' into more-builtin-indexes
benbovy 38d8f82
change PandasMultiIndex title
benbovy 7afb715
fix PandasIndex and PandasMultiIndex cross-refs
benbovy b481922
nit
benbovy b50b805
temp: install Xarray main branch
benbovy 151e6f9
add pandas.RangeIndex and RangeIndex
benbovy 8d75775
open_dataset: skip the creation of default indexes
benbovy abd9a34
Update docs/builtin/pdrange.md
benbovy 7d4148a
Update docs/builtin/range.md
benbovy 83e436e
Revert "Update docs/builtin/pdrange.md"
benbovy
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,161 @@ | ||
| --- | ||
| jupytext: | ||
| text_representation: | ||
| format_name: myst | ||
| kernelspec: | ||
| display_name: Python 3 | ||
| name: python | ||
| --- | ||
|
|
||
| # The default `PandasIndex` | ||
|
|
||
| ````{grid} | ||
| ```{grid-item} | ||
| :columns: 3 | ||
| ```{image} https://pandas.pydata.org/docs/_static/pandas.svg | ||
| --- | ||
| alt: Pandas logo | ||
| width: 200px | ||
| align: center | ||
| --- | ||
| ``` | ||
| ```` | ||
|
|
||
| ## Highlights | ||
|
|
||
| 1. {py:class}`xarray.indexes.PandasIndex` can wrap _one-dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates <xarray:Dimension coordinate>` and {term}`"non-dimension" coordinates <xarray:Non-dimension coordinate>`. | ||
| 1. When opening or constructing a new Dataset or DataArray, Xarray creates by default a {py:class}`xarray.indexes.PandasIndex` for each {term}`"dimension" coordinate <xarray:Dimension coordinate>`. | ||
| 1. It is possible to either drop those default indexes or skip their creation. | ||
|
|
||
| ## Example | ||
|
|
||
| Let's open a tutorial dataset. | ||
|
|
||
| ```{code-cell} python | ||
| import xarray as xr | ||
| ``` | ||
|
|
||
| ```{code-cell} python | ||
| --- | ||
| tags: [remove-cell] | ||
| --- | ||
| %xmode minimal | ||
|
|
||
| xr.set_options( | ||
| display_expand_indexes=True, | ||
| display_expand_attrs=False, | ||
| ); | ||
| ``` | ||
|
|
||
| ```{code-cell} python | ||
| ds_air = xr.tutorial.open_dataset("air_temperature") | ||
| ds_air | ||
| ``` | ||
|
|
||
| Xarray has created by default a {py:class}`~xarray.indexes.PandasIndex` for each of | ||
| the "lat", "lon" and "time" dimension coordinates, as we can also see below via | ||
| the {py:attr}`xarray.Dataset.xindexes` property. | ||
|
|
||
| ```{code-cell} python | ||
| ds_air.xindexes | ||
| ``` | ||
|
|
||
| Those indexes are used under the hood for, e.g., label-based selection. | ||
|
|
||
| ```{code-cell} python | ||
| ds_air.sel(time="2013") | ||
| ``` | ||
|
|
||
| ### Set indexes for non-dimension coordinates | ||
|
|
||
| Xarray does not automatically create an index for non-dimension coordinates like | ||
| the "season (time)" coordinate added below. | ||
|
|
||
| ```{code-cell} python | ||
| ds_air.coords["season"] = ds_air.time.dt.season | ||
| ds_air | ||
| ``` | ||
|
|
||
| Without an index, it is not possible to select data based on the "season" | ||
| coordinate. | ||
|
|
||
| ```{code-cell} python | ||
| --- | ||
| tags: [raises-exception] | ||
| --- | ||
| ds_air.sel(season="DJF") | ||
| ``` | ||
|
|
||
| However, it is possible to manually set a `PandasIndex` for that 1-dimensional | ||
| coordinate. | ||
|
|
||
| ```{code-cell} python | ||
| ds_extra = ds_air.set_xindex("season", xr.indexes.PandasIndex) | ||
| ds_extra | ||
| ``` | ||
|
|
||
| This now enables label-based selection. | ||
|
|
||
| ```{code-cell} python | ||
| ds_extra.sel(season="DJF") | ||
| ``` | ||
|
|
||
| Providing labels to {py:meth}`xarray.Dataset.sel` is not yet supported for | ||
| multiple index coordinates sharing common dimensions (unless those coordinates | ||
| also share the same index object, e.g., as shown in the {doc}`PandasMultiIndex example <pdmultiindex>`). | ||
|
|
||
| ```{code-cell} python | ||
| --- | ||
| tags: [raises-exception] | ||
| --- | ||
| ds_extra.sel(season="DJF", time="2013") | ||
|
||
| ``` | ||
|
|
||
| ### Drop indexes | ||
|
|
||
| Indexes are not always necessary and (re-)computing them may introduce some | ||
| unwanted overhead. | ||
|
|
||
| The line of code below drops the default indexes that have been created when | ||
| opening the example dataset. | ||
|
|
||
| ```{code-cell} python | ||
| ds_air.drop_indexes(["time", "lat", "lon"]) | ||
| ``` | ||
|
|
||
| ### Skip the creation of default indexes | ||
|
||
|
|
||
| Let's re-open the example dataset above, this time with no index. | ||
|
|
||
| ```{code-cell} python | ||
| ds_air_no_index = xr.tutorial.open_dataset( | ||
| "air_temperature", create_default_indexes=False | ||
| ) | ||
|
|
||
| ds_air_no_index | ||
| ``` | ||
|
|
||
| As with {py:func}`xarray.open_dataset`, indexes are created by default for | ||
| dimension coordinates when constructing a new Dataset. | ||
|
|
||
| ```{code-cell} python | ||
| ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]}) | ||
|
|
||
| ds | ||
| ``` | ||
|
|
||
| The same applies when assigning new coordinates. | ||
|
|
||
| ```{code-cell} python | ||
| ds.assign_coords(u=[10, 20]) | ||
| ``` | ||
|
|
||
| To skip the creation of those default indexes, we need to explicitly create a | ||
| new {py:class}`xarray.Coordinates` object and pass `indexes={}` (empty | ||
| dictionary). | ||
|
|
||
| ```{code-cell} python | ||
| coords = xr.Coordinates({"u": [10, 20]}, indexes={}) | ||
|
|
||
| ds.assign_coords(coords) | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| --- | ||
| jupytext: | ||
| text_representation: | ||
| format_name: myst | ||
| kernelspec: | ||
| display_name: Python 3 | ||
| name: python | ||
| --- | ||
|
|
||
| # Stack and unstack with `PandasMultiIndex` | ||
|
|
||
| ````{grid} | ||
| ```{grid-item} | ||
| :columns: 3 | ||
| ```{image} https://pandas.pydata.org/docs/_static/pandas.svg | ||
| --- | ||
| alt: Pandas logo | ||
| width: 200px | ||
| align: center | ||
| --- | ||
| ``` | ||
| ```` | ||
|
|
||
| ## Highlights | ||
|
|
||
| 1. An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinate variables sharing the same dimension. | ||
| 1. Create a `PandasMultiIndex` from `PandasIndex` objects using {py:meth}`xarray.Dataset.stack` and convert back with {py:meth}`xarray.Dataset.unstack`. | ||
| 1. Labels of coordinates associated with a PandasMultiIndex can be passed all at once to `.sel`. | ||
|
|
||
| ## Example | ||
|
|
||
| Let's open a tutorial dataset. | ||
|
|
||
| ```{code-cell} python | ||
| import xarray as xr | ||
| ``` | ||
|
|
||
| ```{code-cell} python | ||
| --- | ||
| tags: [remove-cell] | ||
| --- | ||
| %xmode minimal | ||
|
|
||
| xr.set_options( | ||
| display_expand_indexes=True, | ||
| display_expand_attrs=False, | ||
| ); | ||
| ``` | ||
|
|
||
| ```{code-cell} python | ||
| ds_air = xr.tutorial.open_dataset("air_temperature") | ||
| ds_air | ||
| ``` | ||
|
|
||
| ### Stack / Unstack | ||
|
|
||
| Stacking the "lat" and "lon" dimensions of the example dataset yields stacked | ||
| "lat" and "lon" coordinates that are both associated, by default, with a single | ||
| `PandasMultiIndex`. | ||
| The underlying data are _reshaped_ to collapse the `lat` and `lon` dimensions into a new `space` dimension. | ||
|
|
||
| ```{code-cell} python | ||
| stacked = ds_air.stack(space=("lat", "lon")) | ||
| stacked | ||
| ``` | ||
|
|
||
| The multi-index allows retrieving the original, unstacked dataset where the | ||
| "lat" and "lon" dimension coordinates have their own `PandasIndex`. | ||
|
|
||
| ```{code-cell} python | ||
| unstacked = stacked.unstack("space") | ||
| unstacked | ||
| ``` | ||
|
|
||
| ### Assigning | ||
|
|
||
| We can also directly associate a {py:class}`~xarray.indexes.PandasMultiIndex` | ||
| with existing coordinates sharing the same dimension. | ||
|
|
||
| ```{code-cell} python | ||
| ds_air = ( | ||
| ds_air | ||
| .assign_coords(season=ds_air.time.dt.season) | ||
| .rename_vars(time="datetime") | ||
| .drop_indexes("datetime") | ||
| ) | ||
|
|
||
| ds_air | ||
| ``` | ||
|
|
||
| ```{code-cell} python | ||
| multi_indexed = ds_air.set_xindex(["season", "datetime"], xr.indexes.PandasMultiIndex) | ||
| multi_indexed | ||
| ``` | ||
|
|
||
| ### Indexing | ||
|
|
||
| Contrary to what is shown in {doc}`the default PandasIndex <pdindex>` example, | ||
| it is possible here to provide labels to {py:meth}`xarray.Dataset.sel` for both | ||
| of the multi-index time coordinates. | ||
|
|
||
| ```{code-cell} python | ||
| multi_indexed.sel(season="DJF", datetime="2013") | ||
| ``` | ||
|
|
||
| Note, though, that chaining `.sel` calls on those coordinates, each with its | ||
| own index, would yield equivalent results. | ||
|
|
||
| ```{code-cell} python | ||
| single_indexed = ds_air.set_xindex("datetime").set_xindex("season") | ||
|
|
||
| single_indexed.sel(season="DJF").sel(datetime="2013") | ||
| ``` | ||
|
|
||
| ### Assigning a `pandas.MultiIndex` | ||
|
|
||
| It is easy to wrap an existing {py:class}`pandas.MultiIndex` object into a new Xarray | ||
| Dataset or DataArray. | ||
|
|
||
| ```{code-cell} python | ||
| import pandas as pd | ||
|
|
||
| midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar")) | ||
| midx | ||
| ``` | ||
|
|
||
| This can be done via {py:meth}`xarray.Coordinates.from_pandas_multiindex`. | ||
|
|
||
| ```{code-cell} python | ||
| midx_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="x") | ||
|
|
||
| ds = xr.Dataset(coords=midx_coords) | ||
| ds | ||
| ``` |