From 605f8db3082cc17878a23c9bea212c06ae312d30 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 15:35:21 +0200 Subject: [PATCH 01/23] unrelated fixes --- docs/earth/forecast.md | 4 ---- docs/earth/xvec.md | 2 +- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/docs/earth/forecast.md b/docs/earth/forecast.md index 4f82801..f5946b8 100644 --- a/docs/earth/forecast.md +++ b/docs/earth/forecast.md @@ -28,10 +28,6 @@ A further complication is that different forecast systems have different output though most don't have _any_ missing output. ``` -```{margin} - -``` - There are many ways one might index weather forecast output. These different ways of constructing views of a forecast data are called "Forecast Model Run Collections" (FMRC), diff --git a/docs/earth/xvec.md b/docs/earth/xvec.md index 928c5e7..2104441 100644 --- a/docs/earth/xvec.md +++ b/docs/earth/xvec.md @@ -72,7 +72,7 @@ Note how the `county` dimension is associated with a {py:class}`geopandas.Geomet ### Assigning -Now we can assign a {py:class}`xvec.GeometryIndex` to `county`. +Now we can assign an {py:class}`xvec.GeometryIndex` to `county`. 
```{code-cell} cube = cube.xvec.set_geom_indexes("county") From 287cff43fe7ede6a724a4d850d5da73dc12802ee Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 15:35:49 +0200 Subject: [PATCH 02/23] add PandasIndex and PandasMultiIndex --- docs/builtin/pdindex.md | 151 +++++++++++++++++++++++++++++++++++ docs/builtin/pdmultiindex.md | 109 +++++++++++++++++++++++++ docs/index.md | 2 + 3 files changed, 262 insertions(+) create mode 100644 docs/builtin/pdindex.md create mode 100644 docs/builtin/pdmultiindex.md diff --git a/docs/builtin/pdindex.md b/docs/builtin/pdindex.md new file mode 100644 index 0000000..f24e133 --- /dev/null +++ b/docs/builtin/pdindex.md @@ -0,0 +1,151 @@ +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# The default `PandasIndex` + +````{grid} +```{grid-item} +:columns: 3 +```{image} https://pandas.pydata.org/docs/_static/pandas.svg +--- +alt: Pandas logo +width: 200px +align: center +--- +``` +```` + +## Highlights + +1. When opening or constructing a new Dataset or DataArray, Xarray creates by default a {py:class}`xarray.indexes.PandasIndex` -- i.e., a lightweight wrapper around {py:class}`pandas.Index` -- for each {term}`"dimension" coordinate `. +1. It is possible to either drop those default indexes or skip their creation. +1. It is possible to manually set a `PandasIndex` for 1-dimensional {term}`"non-dimension" coordinates ` as well. + +## Example + +Let's open a tutorial dataset. 
+ +```{code-cell} python +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +```{code-cell} python +ds_air = xr.tutorial.open_dataset("air_temperature") +ds_air +``` + +It has created by default a {py:class}`~xarray.indexes.PandasIndex` for each the +"lat", "lon" and "time" dimension coordinates, as we can also see below via the +{py:attr}`xarray.Dataset.xindexes` property. + +```{code-cell} python +ds_air.xindexes +``` + +Those indexes are used under the hood for, e.g., label-based selection. + +```{code-cell} python +ds_air.sel(time="2013") +``` + +### Set indexes for non-dimension coordinates + +Xarray does not automatically create such index for non-dimension coordinates +like the "season (time)" coordinate added below. + +```{code-cell} python +ds_air.coords["season"] = ds_air.time.dt.season +ds_air +``` + +Without an index, it is not possible to select data based on the "season" +coordinate. + +```{code-cell} python +--- +tags: [raises-exception] +--- +ds_air.sel(season="DJF") +``` + +However, it is possible to manually set a `PandasIndex` for that 1-dimensional +coordinate. + +```{code-cell} python +ds_extra = ds_air.set_xindex("season", xr.indexes.PandasIndex) +ds_extra +``` + +Which now enables label-based selection. + +```{code-cell} python +ds_extra.sel(season="DJF") +``` + +It is not yet supported to provide labels to {py:meth}`xarray.Dataset.sel` for +multiple index coordinates sharing common dimensions (unless those coordinates +also share the same index object). + +```{code-cell} python +--- +tags: [raises-exception] +--- +ds_extra.sel(season="DJF", time="2013") +``` + +### Drop indexes + +Indexes are not always necessary and (re-)computing them may introduce some +unwanted overhead. + +The code line below drops the default indexes that have been created when +opening the example dataset. 
+ +```{code-cell} python +ds_air.drop_indexes(["time", "lat", "lon"]) +``` + +### Skip the creation of default indexes + +Like {py:func}`xarray.open_dataset`, default indexes are created for dimension +coordinates when constructing a new Dataset. + +```{code-cell} python +ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]}) + +ds +``` + +Also when assigning new coordinates. + +```{code-cell} python +ds.assign_coords(u=[10, 20]) +``` + +To skip the creation of those default indexes, you need to explicitly create a +new {py:class}`xarray.Coordinates` object and pass `indexes={}` (empty +dictionary). + +```{code-cell} python +coords = xr.Coordinates({"u": [10, 20]}, indexes={}) + +ds.assign_coords(coords) +``` diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md new file mode 100644 index 0000000..108f16b --- /dev/null +++ b/docs/builtin/pdmultiindex.md @@ -0,0 +1,109 @@ +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# Stick coordinates together with `PandasMultiIndex` + +````{grid} +```{grid-item} +:columns: 3 +```{image} https://pandas.pydata.org/docs/_static/pandas.svg +--- +alt: Pandas logo +width: 200px +align: center +--- +``` +```` + +## Highlights + +1. An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinates sharing the same dimension. +1. It permits using `.sel` with labels given for several of those coordinates. +1. It is the index used by default for `.stack` and `.unstack`. + +## Example + +Let's open a tutorial dataset. + +```{code-cell} python +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +```{code-cell} python +ds_air = xr.tutorial.open_dataset("air_temperature") +ds_air +``` + +### Assigning + +We need multiple coordinates sharing the same dimension. 
+ +```{code-cell} python +ds_air = ( + ds_air + .assign_coords(season=ds_air.time.dt.season) + .rename_vars(time="datetime") + .drop_indexes("datetime") +) + +ds_air +``` + +Now we can assign a {py:class}`~xarray.indexes.PandasMultiIndex` to the time +coordinates. + +```{code-cell} python +multi_indexed = ds_air.set_xindex(["season", "datetime"], xr.indexes.PandasMultiIndex) +multi_indexed +``` + +### Indexing + +Contrary to what is shown in {doc}`the default PandasIndex ` example, +it is here possible to provide labels to {py:meth}`xarray.Dataset.sel` for both +of the multi-index time coordinates. + +```{code-cell} python +multi_indexed.sel(season="DJF", datetime="2013") +``` + +### Stack / Unstack + +### Create coordinates from a `pandas.MultiIndex` + +It is easy to wrap an existing `pandas.MultiIndex` object into a new Xarray +Dataset or DataArray. + +```{code-cell} python +import pandas as pd + +midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar")) +midx +``` + +This can be done via {py:meth}`xarray.Coordinates.from_pandas_multiindex`. 
+ +```{code-cell} python +midx_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="x") + +ds = xr.Dataset(coords=midx_coords) +ds +``` diff --git a/docs/index.md b/docs/index.md index f373c55..b9b5dbe 100644 --- a/docs/index.md +++ b/docs/index.md @@ -102,6 +102,8 @@ Your additions to this gallery are very welcome, particularly for fields outside caption: Built-in hidden: --- +builtin/pdindex +builtin/pdmultiindex builtin/range builtin/pdinterval ``` From 2f8c00944a0fd91e42a6483c67454a0de3828bd9 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 17:01:43 +0200 Subject: [PATCH 03/23] add example cross-refs --- docs/builtin/pdindex.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/builtin/pdindex.md b/docs/builtin/pdindex.md index f24e133..9dcff4f 100644 --- a/docs/builtin/pdindex.md +++ b/docs/builtin/pdindex.md @@ -52,9 +52,9 @@ ds_air = xr.tutorial.open_dataset("air_temperature") ds_air ``` -It has created by default a {py:class}`~xarray.indexes.PandasIndex` for each the -"lat", "lon" and "time" dimension coordinates, as we can also see below via the -{py:attr}`xarray.Dataset.xindexes` property. +It has created by default a {py:class}`~xarray.indexes.PandasIndex` for each of +the "lat", "lon" and "time" dimension coordinates, as we can also see below via +the {py:attr}`xarray.Dataset.xindexes` property. ```{code-cell} python ds_air.xindexes @@ -68,8 +68,8 @@ ds_air.sel(time="2013") ### Set indexes for non-dimension coordinates -Xarray does not automatically create such index for non-dimension coordinates -like the "season (time)" coordinate added below. +Xarray does not automatically create an index for non-dimension coordinates like +the "season (time)" coordinate added below. 
```{code-cell} python ds_air.coords["season"] = ds_air.time.dt.season @@ -102,7 +102,7 @@ ds_extra.sel(season="DJF") It is not yet supported to provide labels to {py:meth}`xarray.Dataset.sel` for multiple index coordinates sharing common dimensions (unless those coordinates -also share the same index object). +also share the same index object, e.g., as shown in the {doc}`PandasMultiIndex example `). ```{code-cell} python --- From 5773d10cbb45e1d9c415944c677bed75b0ae7261 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 17:08:38 +0200 Subject: [PATCH 04/23] PandasMultiIndex stack/unstack example --- docs/builtin/pdmultiindex.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index 108f16b..fc47a83 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -87,6 +87,23 @@ multi_indexed.sel(season="DJF", datetime="2013") ### Stack / Unstack +Stacking the "lat" and "lon" dimensions of the example dataset results here in +the corresponding "lat" and "lon" stacked coordinates both associated with a +`PandasMultiIndex` by default. + +```{code-cell} python +stacked = multi_indexed.stack(space=("lat", "lon")) +stacked +``` + +The multi-index allows retrieving the original, unstacked dataset where the +"lat" and "lon" dimension coordinates have their own `PandasIndex`. 
+ +```{code-cell} python +unstacked = stacked.unstack("space") +unstacked +``` + ### Create coordinates from a `pandas.MultiIndex` It is easy to wrap an existing `pandas.MultiIndex` object into a new Xarray From 7256fbca6202bada70d147829dfe31f400a13c03 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 20:18:14 +0200 Subject: [PATCH 05/23] Update docs/builtin/pdindex.md Co-authored-by: Deepak Cherian --- docs/builtin/pdindex.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/builtin/pdindex.md b/docs/builtin/pdindex.md index 9dcff4f..9211472 100644 --- a/docs/builtin/pdindex.md +++ b/docs/builtin/pdindex.md @@ -23,9 +23,9 @@ align: center ## Highlights -1. When opening or constructing a new Dataset or DataArray, Xarray creates by default a {py:class}`xarray.indexes.PandasIndex` -- i.e., a lightweight wrapper around {py:class}`pandas.Index` -- for each {term}`"dimension" coordinate `. +1. {py:class}`xarray.indexes.PandasIndex` can wrap _one dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates "` and {term}`"non-dimension" coordinates `. +1. When opening or constructing a new Dataset or DataArray, Xarray creates by default a {py:class}`xarray.indexes.PandasIndex` for each {term}`"dimension" coordinate `. 1. It is possible to either drop those default indexes or skip their creation. -1. It is possible to manually set a `PandasIndex` for 1-dimensional {term}`"non-dimension" coordinates ` as well. 
## Example From 7a1bb8bd2aae695af63d0c54261520b470ab574f Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 8 Jul 2025 18:18:25 +0000 Subject: [PATCH 06/23] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- docs/builtin/pdindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdindex.md b/docs/builtin/pdindex.md index 9211472..9f0b30d 100644 --- a/docs/builtin/pdindex.md +++ b/docs/builtin/pdindex.md @@ -23,7 +23,7 @@ align: center ## Highlights -1. {py:class}`xarray.indexes.PandasIndex` can wrap _one dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates "` and {term}`"non-dimension" coordinates `. +1. {py:class}`xarray.indexes.PandasIndex` can wrap _one dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates "` and {term}`"non-dimension" coordinates `. 1. When opening or constructing a new Dataset or DataArray, Xarray creates by default a {py:class}`xarray.indexes.PandasIndex` for each {term}`"dimension" coordinate `. 1. It is possible to either drop those default indexes or skip their creation. From 86bbb066eb421cf3bfc140570a567fc802bba021 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 20:18:45 +0200 Subject: [PATCH 07/23] Update docs/builtin/pdmultiindex.md Co-authored-by: Deepak Cherian --- docs/builtin/pdmultiindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index fc47a83..f7c779f 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -23,7 +23,7 @@ align: center ## Highlights -1. 
An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinates sharing the same dimension. +1. An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinate variables sharing the same dimension. 1. It permits using `.sel` with labels given for several of those coordinates. 1. It is the index used by default for `.stack` and `.unstack`. From 43f0b2de8d4c8f18073d7a52d5ef61cc4d25ee1f Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 20:19:09 +0200 Subject: [PATCH 08/23] Update docs/builtin/pdmultiindex.md Co-authored-by: Deepak Cherian --- docs/builtin/pdmultiindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index f7c779f..e230413 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -25,7 +25,7 @@ align: center 1. An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinate variables sharing the same dimension. 1. It permits using `.sel` with labels given for several of those coordinates. -1. It is the index used by default for `.stack` and `.unstack`. +1. Create MultiIndexes from PandasIndex using {py:class}`Dataset.stack` and convert back with {py:class}`Dataset.unstack`. 
## Example From 9f59c22bd932a8828700859f49742ac0fe2cb253 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 20:19:29 +0200 Subject: [PATCH 09/23] Update docs/builtin/pdmultiindex.md Co-authored-by: Deepak Cherian --- docs/builtin/pdmultiindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index e230413..572b52e 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -106,7 +106,7 @@ unstacked ### Create coordinates from a `pandas.MultiIndex` -It is easy to wrap an existing `pandas.MultiIndex` object into a new Xarray +It is easy to wrap an existing {py:class}`pandas.MultiIndex` object into a new Xarray Dataset or DataArray. ```{code-cell} python From d5269c2239326bbf7d64f036d924dba3e01fcfe2 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 20:20:46 +0200 Subject: [PATCH 10/23] Update docs/builtin/pdmultiindex.md Co-authored-by: Deepak Cherian --- docs/builtin/pdmultiindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index 572b52e..ca8817b 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -104,7 +104,7 @@ unstacked = stacked.unstack("space") unstacked ``` -### Create coordinates from a `pandas.MultiIndex` +### Assigning a `pandas.MultiIndex` It is easy to wrap an existing {py:class}`pandas.MultiIndex` object into a new Xarray Dataset or DataArray. 
From 0d02eebf6518643a4b1600213814aedd36a25653 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 20:20:54 +0200 Subject: [PATCH 11/23] Update docs/builtin/pdmultiindex.md Co-authored-by: Deepak Cherian --- docs/builtin/pdmultiindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index ca8817b..cd97056 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -92,7 +92,7 @@ the corresponding "lat" and "lon" stacked coordinates both associated with a `PandasMultiIndex` by default. ```{code-cell} python -stacked = multi_indexed.stack(space=("lat", "lon")) +stacked = ds_air.stack(space=("lat", "lon")) stacked ``` From abcd00170dfac1410073f91b59a729b4ec209ec3 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 20:21:23 +0200 Subject: [PATCH 12/23] Update docs/builtin/pdmultiindex.md Co-authored-by: Deepak Cherian --- docs/builtin/pdmultiindex.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index cd97056..0020f52 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -90,6 +90,7 @@ multi_indexed.sel(season="DJF", datetime="2013") Stacking the "lat" and "lon" dimensions of the example dataset results here in the corresponding "lat" and "lon" stacked coordinates both associated with a `PandasMultiIndex` by default. +The underlying data are _reshaped_ to collapse the `lat` and `lon` dimensions to a new `space` dimension. 
```{code-cell} python stacked = ds_air.stack(space=("lat", "lon")) From 126d3cdb4f6b396c2c02883c20356d1bcb256c93 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 21:08:24 +0200 Subject: [PATCH 13/23] re-arrange PandasMultiIndex --- docs/builtin/pdmultiindex.md | 49 ++++++++++++++++++++---------------- 1 file changed, 28 insertions(+), 21 deletions(-) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index 0020f52..493e4de 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -24,8 +24,8 @@ align: center ## Highlights 1. An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinate variables sharing the same dimension. -1. It permits using `.sel` with labels given for several of those coordinates. -1. Create MultiIndexes from PandasIndex using {py:class}`Dataset.stack` and convert back with {py:class}`Dataset.unstack`. +1. Create PandasMultiIndex from PandasIndex using {py:meth}`xarray.Dataset.stack` and convert back with {py:meth}`xarray.Dataset.unstack`. +1. Labels of coordinates associated with a PandasMultiIndex can be passed all at once to `.sel`. ## Example @@ -52,9 +52,30 @@ ds_air = xr.tutorial.open_dataset("air_temperature") ds_air ``` +### Stack / Unstack + +Stacking the "lat" and "lon" dimensions of the example dataset results here in +the corresponding "lat" and "lon" stacked coordinates both associated with a +`PandasMultiIndex` by default. +The underlying data are _reshaped_ to collapse the `lat` and `lon` dimensions to a new `space` dimension. + +```{code-cell} python +stacked = ds_air.stack(space=("lat", "lon")) +stacked +``` + +The multi-index allows retrieving the original, unstacked dataset where the +"lat" and "lon" dimension coordinates have their own `PandasIndex`. + +```{code-cell} python +unstacked = stacked.unstack("space") +unstacked +``` + ### Assigning -We need multiple coordinates sharing the same dimension. 
+We can also directly associate a {py:class}`~xarray.indexes.PandasMultiIndex` +with existing coordinates sharing the same dimension. ```{code-cell} python ds_air = ( @@ -67,9 +88,6 @@ ds_air = ( ds_air ``` -Now we can assign a {py:class}`~xarray.indexes.PandasMultiIndex` to the time -coordinates. - ```{code-cell} python multi_indexed = ds_air.set_xindex(["season", "datetime"], xr.indexes.PandasMultiIndex) multi_indexed @@ -85,24 +103,13 @@ of the multi-index time coordinates. multi_indexed.sel(season="DJF", datetime="2013") ``` -### Stack / Unstack - -Stacking the "lat" and "lon" dimensions of the example dataset results here in -the corresponding "lat" and "lon" stacked coordinates both associated with a -`PandasMultiIndex` by default. -The underlying data are _reshaped_ to collapse the `lat` and `lon` dimensions to a new `space` dimension. +Chaining `.sel` calls for those coordinates each with their own index would +yield equivalent results, though. ```{code-cell} python -stacked = ds_air.stack(space=("lat", "lon")) -stacked -``` +single_indexed = ds_air.set_xindex("datetime").set_xindex("season") -The multi-index allows retrieving the original, unstacked dataset where the -"lat" and "lon" dimension coordinates have their own `PandasIndex`. - -```{code-cell} python -unstacked = stacked.unstack("space") -unstacked +single_indexed.sel(season="DJF").sel(datetime="2013") ``` ### Assigning a `pandas.MultiIndex` From 2c873895f5ce63294340adc2071ac44ca7413da7 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 21:08:42 +0200 Subject: [PATCH 14/23] fix --- docs/builtin/pdindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdindex.md b/docs/builtin/pdindex.md index 9f0b30d..6fd8798 100644 --- a/docs/builtin/pdindex.md +++ b/docs/builtin/pdindex.md @@ -23,7 +23,7 @@ align: center ## Highlights -1. 
{py:class}`xarray.indexes.PandasIndex` can wrap _one dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates "` and {term}`"non-dimension" coordinates `. +1. {py:class}`xarray.indexes.PandasIndex` can wrap _one dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates ` and {term}`"non-dimension" coordinates `. 1. When opening or constructing a new Dataset or DataArray, Xarray creates by default a {py:class}`xarray.indexes.PandasIndex` for each {term}`"dimension" coordinate `. 1. It is possible to either drop those default indexes or skip their creation. From 38d8f82b4afea4853dda5ad3a4794ea9fa05ddea Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 21:12:33 +0200 Subject: [PATCH 15/23] change PandasMultiIndex title --- docs/builtin/pdmultiindex.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md index 493e4de..2409412 100644 --- a/docs/builtin/pdmultiindex.md +++ b/docs/builtin/pdmultiindex.md @@ -7,7 +7,7 @@ kernelspec: name: python --- -# Stick coordinates together with `PandasMultiIndex` +# Stack and unstack with `PandasMultiIndex` ````{grid} ```{grid-item} From 7afb71539f1e2eacc0e1eecfa8f6e67e0b91019a Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 21:18:05 +0200 Subject: [PATCH 16/23] fix PandasIndex and PandasMultiIndex cross-refs --- docs/conf.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/conf.py b/docs/conf.py index cd581e1..b109771 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -116,7 +116,7 @@ "python": ("https://docs.python.org/3/", None), "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None), "numpy": ("https://numpy.org/doc/stable", None), - "xarray": ("https://docs.xarray.dev/en/stable/", None), + "xarray": 
("https://docs.xarray.dev/en/latest/", None), "rasterix": ("https://rasterix.readthedocs.io/en/latest/", None), "shapely": ("https://shapely.readthedocs.io/en/latest/", None), "xvec": ("https://xvec.readthedocs.io/en/stable/", None), From b481922dae03bc9a8c28cd312494bde7380017eb Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 22:35:36 +0200 Subject: [PATCH 17/23] nit --- docs/builtin/pdinterval.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdinterval.md b/docs/builtin/pdinterval.md index 2d1875b..dde578d 100644 --- a/docs/builtin/pdinterval.md +++ b/docs/builtin/pdinterval.md @@ -33,7 +33,7 @@ Learn more at the [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_gui 1. Sadly {py:class}`pandas.IntervalIndex` supports numpy datetimes but not [cftime](https://unidata.github.io/cftime/). ```{important} -A pandas IntervalIndex models intervals using a single variable. The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model the intervals using two arrays: the intervals ("bounds" variable) and "central values". +A pandas IntervalIndex models intervals using a single variable. The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model the intervals using two arrays: the intervals ("bounds" variable) and "central values". ``` ## Example From b50b805181bf4a640a34c30336d33fc2c136eba7 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 22:36:08 +0200 Subject: [PATCH 18/23] temp: install Xarray main branch This is needed for pd.RangeIndex with "lazy" coordinate variable. This will be needed for NDPointIndex too. TODO: remove when next version of Xarray is released. 
--- requirements.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/requirements.txt b/requirements.txt index cd3b716..5688e1a 100644 --- a/requirements.txt +++ b/requirements.txt @@ -21,3 +21,4 @@ xvec git+https://github.com/dcherian/rolodex pint-xarray cf_xarray +git+https://github.com/pydata/xarray From 151e6f9877e727b6a19b65f779e6ecd205c58f4e Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 22:37:54 +0200 Subject: [PATCH 19/23] add pandas.RangeIndex and RangeIndex --- docs/builtin/pdrange.md | 97 +++++++++++++++++++++++++++++++++++++++++ docs/builtin/range.md | 82 +++++++++++++++++++++++++++++++++- docs/index.md | 1 + 3 files changed, 179 insertions(+), 1 deletion(-) create mode 100644 docs/builtin/pdrange.md diff --git a/docs/builtin/pdrange.md b/docs/builtin/pdrange.md new file mode 100644 index 0000000..b8812cc --- /dev/null +++ b/docs/builtin/pdrange.md @@ -0,0 +1,97 @@ +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# Integer ranges with `pd.RangeIndex` + +````{grid} +```{grid-item} +:columns: 3 +```{image} https://pandas.pydata.org/docs/_static/pandas.svg +--- +alt: Pandas logo +width: 200px +align: center +--- +``` +```` + +## Highlights + +1. Like other pandas Index types, a {py:class}`pandas.RangeIndex` object may be wrapped in an {py:class}`xarray.indexes.PandasIndex`. +1. Unlike other pandas Index types, we always want to assign a `pandas.RangeIndex` directly instead of setting it from an existing coordinate variable. +1. Xarray preserves the memory-saving `pandas.RangeIndex` structure by wrapping it in a lazy coordinate variable instead of a fully materialized array. 
+ +## Example + +### Assigning + +```{code-cell} python +import pandas as pd +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +```{code-cell} python +idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), dim="x") + +ds = xr.Dataset(coords=xr.Coordinates.from_xindex(idx)) +ds +``` + +### Lazy coordinate + +The `x` coordinate variable associated with the range index is lazy (i.e., all +array values are not fully materialized in memory). + +```{code-cell} python +ds.x +``` + +```{important} +`ds.x.values` will materialize all values in-memory! `x` may behave like a "coordinate variable bomb" 💣. +``` + +### Indexing + +Slicing along the `x` dimension preserves the range index -- although with a new +range -- and keeps a lazy associated coordinate variable. + +```{code-cell} python +sliced = ds.isel(x=slice(1_000, 50_000, 100)) + +sliced.x +``` + +```{code-cell} python +sliced.xindexes["x"] +``` + +Indexing with arbitrary values along the same dimension converts the underlying +pandas index type (this is all handled by pandas). + +```{code-cell} python +indexed = ds.isel(x=[10, 55, 124, 265]) + +indexed.x +``` + +```{code-cell} python +indexed.xindexes["x"] +``` diff --git a/docs/builtin/range.md b/docs/builtin/range.md index 53a9e92..539fd17 100644 --- a/docs/builtin/range.md +++ b/docs/builtin/range.md @@ -1 +1,81 @@ -# Large ranges with `RangeIndex` +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# Floating point ranges with `RangeIndex` + +## Highlights + +1. Pandas has no equivalent of {py:class}`pandas.RangeIndex` for floating point + ranges... Fortunately, there is {py:class}`xarray.indexes.RangeIndex` that + works with real numbers. +1. 
Xarray's `RangeIndex` is built on top of + {py:class}`xarray.indexes.CoordinateTransformIndex` and therefore supports + very large ranges represented as lazy coordinate variables. + +## Example + +### Assigning + +```{code-cell} python +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +Using {py:meth}`xarray.indexes.RangeIndex.arange`. + +```{code-cell} python +idx1 = xr.indexes.RangeIndex.arange(0.0, 1000.0, 1e-3, dim="x") + +ds1 = xr.Dataset(coords=xr.Coordinates.from_xindex(idx1)) +ds1 +``` + +Using {py:meth}`xarray.indexes.RangeIndex.linspace`. + +```{code-cell} python +idx2 = xr.indexes.RangeIndex.linspace(0.0, 1000.0, 1_000_000, dim="x") + +ds2 = xr.Dataset(coords=xr.Coordinates.from_xindex(idx2)) +ds2 +``` + +### Lazy coordinate + +The `x` coordinate variable associated with the range index is lazy (i.e., all +array values are not fully materialized in memory). + +```{code-cell} python +ds1.x +``` + +```{important} +`ds1.x.values` will materialize all values in-memory! `x` may behave like a "coordinate variable bomb" 💣. +``` + +### Indexing + +Slicing along the `x` dimension preserves the range index -- although with a new +range -- and keeps a lazy associated coordinate variable. 
+ +```{code-cell} python +sliced = ds1.isel(x=slice(1_000, 50_000, 100)) + +sliced.x +``` diff --git a/docs/index.md b/docs/index.md index 32319a0..0b412ef 100644 --- a/docs/index.md +++ b/docs/index.md @@ -104,6 +104,7 @@ hidden: --- builtin/pdindex builtin/pdmultiindex +builtin/pdrange builtin/range builtin/pdinterval ``` From 8d75775c9bc73c9537c12a61419c384b9496161c Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Tue, 8 Jul 2025 22:56:11 +0200 Subject: [PATCH 20/23] open_dataset: skip the creation of default indexes --- docs/builtin/pdindex.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/docs/builtin/pdindex.md b/docs/builtin/pdindex.md index 6fd8798..35db97a 100644 --- a/docs/builtin/pdindex.md +++ b/docs/builtin/pdindex.md @@ -125,8 +125,18 @@ ds_air.drop_indexes(["time", "lat", "lon"]) ### Skip the creation of default indexes -Like {py:func}`xarray.open_dataset`, default indexes are created for dimension -coordinates when constructing a new Dataset. +Let's re-open the example dataset above, this time with no index. + +```{code-cell} python +ds_air_no_index = xr.tutorial.open_dataset( + "air_temperature", create_default_indexes=False +) + +ds_air_no_index +``` + +Like {py:func}`xarray.open_dataset`, indexes are created by default for +dimension coordinates when constructing a new Dataset. ```{code-cell} python ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]}) @@ -140,7 +150,7 @@ Also when assigning new coordinates. ds.assign_coords(u=[10, 20]) ``` -To skip the creation of those default indexes, you need to explicitly create a +To skip the creation of those default indexes, we need to explicitly create a new {py:class}`xarray.Coordinates` object and pass `indexes={}` (empty dictionary). 
From abd9a34b2964d3a711c06f28d9a80648f7e84516 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Wed, 9 Jul 2025 09:35:41 +0200 Subject: [PATCH 21/23] Update docs/builtin/pdrange.md Co-authored-by: Deepak Cherian --- docs/builtin/pdrange.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdrange.md b/docs/builtin/pdrange.md index b8812cc..676ecf4 100644 --- a/docs/builtin/pdrange.md +++ b/docs/builtin/pdrange.md @@ -49,7 +49,7 @@ xr.set_options( ``` ```{code-cell} python -idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), dim="x") +idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000_000), dim="x") ds = xr.Dataset(coords=xr.Coordinates.from_xindex(idx)) ds From 7d4148a93e4d3b1780fbabc00848ad052132488b Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Wed, 9 Jul 2025 09:35:50 +0200 Subject: [PATCH 22/23] Update docs/builtin/range.md Co-authored-by: Deepak Cherian --- docs/builtin/range.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/range.md b/docs/builtin/range.md index 539fd17..a0a46e8 100644 --- a/docs/builtin/range.md +++ b/docs/builtin/range.md @@ -12,7 +12,7 @@ kernelspec: ## Highlights 1. Pandas has no equivalent of {py:class}`pandas.RangeIndex` for floating point - ranges... Fortunately, there is {py:class}`xarray.indexes.RangeIndex` that + ranges. Fortunately, there is {py:class}`xarray.indexes.RangeIndex` that works with real numbers. 1. Xarray's `RangeIndex` is built on top of {py:class}`xarray.indexes.CoordinateTransformIndex` and therefore supports From 83e436e4397535111e328997fb5f463a7ddde9be Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Wed, 9 Jul 2025 09:38:39 +0200 Subject: [PATCH 23/23] Revert "Update docs/builtin/pdrange.md" This reverts commit abd9a34b2964d3a711c06f28d9a80648f7e84516. 
--- docs/builtin/pdrange.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/builtin/pdrange.md b/docs/builtin/pdrange.md index 676ecf4..b8812cc 100644 --- a/docs/builtin/pdrange.md +++ b/docs/builtin/pdrange.md @@ -49,7 +49,7 @@ xr.set_options( ``` ```{code-cell} python -idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000_000), dim="x") +idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), dim="x") ds = xr.Dataset(coords=xr.Coordinates.from_xindex(idx)) ds
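For reference, the indexing behaviour documented in `docs/builtin/pdrange.md` can be checked with pandas alone, since the index-type conversion is handled by pandas rather than Xarray. The sketch below is an illustration added alongside the series, not part of any patch; the variable names `sliced` and `picked` are made up for the example, and the exact repr or dtype of the converted index may differ between pandas versions.

```python
import pandas as pd

# A range index only stores start, stop and step, not one million integers.
idx = pd.RangeIndex(1_000_000)

# Slicing with a regular step can still be described by start/stop/step,
# so pandas keeps the result as a RangeIndex.
sliced = idx[1_000:50_000:100]
print(type(sliced).__name__, len(sliced))

# Picking irregularly spaced positions cannot be described as a range,
# so pandas falls back to a materialized integer index.
picked = idx[[10, 55, 124, 265]]
print(type(picked).__name__, list(picked))
```

The same positions are used in the `ds.isel(...)` calls of the documentation page, so the printed types should line up with what `sliced.xindexes["x"]` and `indexed.xindexes["x"]` wrap there.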