Merged
25 commits
605f8db
unrelated fixes
benbovy Jul 8, 2025
287cff4
add PandasIndex and PandasMultiIndex
benbovy Jul 8, 2025
2f8c009
add example cross-refs
benbovy Jul 8, 2025
5773d10
PandasMultiIndex stack/unstack example
benbovy Jul 8, 2025
3225d3e
Merge branch 'main' into more-builtin-indexes
benbovy Jul 8, 2025
7256fbc
Update docs/builtin/pdindex.md
benbovy Jul 8, 2025
7a1bb8b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 8, 2025
86bbb06
Update docs/builtin/pdmultiindex.md
benbovy Jul 8, 2025
43f0b2d
Update docs/builtin/pdmultiindex.md
benbovy Jul 8, 2025
9f59c22
Update docs/builtin/pdmultiindex.md
benbovy Jul 8, 2025
d5269c2
Update docs/builtin/pdmultiindex.md
benbovy Jul 8, 2025
0d02eeb
Update docs/builtin/pdmultiindex.md
benbovy Jul 8, 2025
abcd001
Update docs/builtin/pdmultiindex.md
benbovy Jul 8, 2025
126d3cd
re-arrange PandasMultiIndex
benbovy Jul 8, 2025
2c87389
fix
benbovy Jul 8, 2025
4da3f5d
Merge branch 'main' into more-builtin-indexes
benbovy Jul 8, 2025
38d8f82
change PandasMultiIndex title
benbovy Jul 8, 2025
7afb715
fix PandasIndex and PandasMultiIndex cross-refs
benbovy Jul 8, 2025
b481922
nit
benbovy Jul 8, 2025
b50b805
temp: install Xarray main branch
benbovy Jul 8, 2025
151e6f9
add pandas.RangeIndex and RangeIndex
benbovy Jul 8, 2025
8d75775
open_dataset: skip the creation of default indexes
benbovy Jul 8, 2025
abd9a34
Update docs/builtin/pdrange.md
benbovy Jul 9, 2025
7d4148a
Update docs/builtin/range.md
benbovy Jul 9, 2025
83e436e
Revert "Update docs/builtin/pdrange.md"
benbovy Jul 9, 2025
161 changes: 161 additions & 0 deletions docs/builtin/pdindex.md
@@ -0,0 +1,161 @@
---
jupytext:
text_representation:
format_name: myst
kernelspec:
display_name: Python 3
name: python
---

# The default `PandasIndex`

````{grid}
```{grid-item}
:columns: 3
```{image} https://pandas.pydata.org/docs/_static/pandas.svg
---
alt: Pandas logo
width: 200px
align: center
---
```
````

## Highlights

1. {py:class}`xarray.indexes.PandasIndex` can wrap _one-dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates <xarray:Dimension coordinate>` and {term}`"non-dimension" coordinates <xarray:Non-dimension coordinate>`.
1. When opening or constructing a new Dataset or DataArray, Xarray creates a {py:class}`xarray.indexes.PandasIndex` by default for each {term}`"dimension" coordinate <xarray:Dimension coordinate>`.
1. It is possible to either drop those default indexes or skip their creation.

## Example

Let's open a tutorial dataset.

```{code-cell} python
import xarray as xr
```

```{code-cell} python
---
tags: [remove-cell]
---
%xmode minimal

xr.set_options(
    display_expand_indexes=True,
    display_expand_attrs=False,
);
```

```{code-cell} python
ds_air = xr.tutorial.open_dataset("air_temperature")
ds_air
```

Xarray has created a {py:class}`~xarray.indexes.PandasIndex` by default for each
of the "lat", "lon" and "time" dimension coordinates, as we can also see below
via the {py:attr}`xarray.Dataset.xindexes` property.

```{code-cell} python
ds_air.xindexes
```

Those indexes are used under the hood for, e.g., label-based selection.

```{code-cell} python
ds_air.sel(time="2013")
```

### Set indexes for non-dimension coordinates
Review comment (Contributor). Suggested change: replace the heading `### Set indexes for non-dimension coordinates` with `### Set indexes for any coordinate variable`.


Xarray does not automatically create an index for non-dimension coordinates like
the "season (time)" coordinate added below.

```{code-cell} python
ds_air.coords["season"] = ds_air.time.dt.season
ds_air
```

Without an index, it is not possible to select data based on the "season"
coordinate.

```{code-cell} python
---
tags: [raises-exception]
---
ds_air.sel(season="DJF")
```

However, it is possible to manually set a `PandasIndex` for that 1-dimensional
coordinate.

```{code-cell} python
ds_extra = ds_air.set_xindex("season", xr.indexes.PandasIndex)
ds_extra
```

This now enables label-based selection on the "season" coordinate.

```{code-cell} python
ds_extra.sel(season="DJF")
```

Providing labels to {py:meth}`xarray.Dataset.sel` for multiple indexed
coordinates that share common dimensions is not yet supported (unless those
coordinates also share the same index object, as shown in the {doc}`PandasMultiIndex example <pdmultiindex>`).

```{code-cell} python
---
tags: [raises-exception]
---
ds_extra.sel(season="DJF", time="2013")
```
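
One possible workaround is to chain `.sel` calls so that each call involves a single index. This is a sketch on hypothetical monthly data (the `air` variable and the two-year time axis are assumptions), not a recommendation from the Xarray documentation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Twenty-four monthly time steps covering 2013 and 2014 (hypothetical data).
time = pd.date_range("2013-01-01", periods=24, freq="MS")
ds = xr.Dataset({"air": ("time", np.arange(24.0))}, coords={"time": time})
ds.coords["season"] = ds.time.dt.season

# "time" keeps its default index; "season" gets its own, separate index.
ds_extra = ds.set_xindex("season")

# Select on one index at a time instead of passing both labels at once.
result = ds_extra.sel(season="DJF").sel(time="2013")
months = result.time.dt.month.values.tolist()
```

Under these assumptions `result` holds the three "DJF" months that fall in 2013.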

### Drop indexes

Indexes are not always necessary and (re-)computing them may introduce some
unwanted overhead.

The line below drops the default indexes that were created when opening the
example dataset.

```{code-cell} python
ds_air.drop_indexes(["time", "lat", "lon"])
```
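
As a small sketch of what dropping indexes does, the block below uses a hypothetical two-coordinate dataset: the coordinates themselves are kept, only their indexes are removed, and the original dataset is left unchanged.

```python
import xarray as xr

# Hypothetical dataset with two default-indexed dimension coordinates.
ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]})

# drop_indexes returns a new dataset; it does not modify `ds` in place.
no_index = ds.drop_indexes(["x", "y"])

n_indexes_original = len(ds.xindexes)
n_indexes_dropped = len(no_index.xindexes)

# The coordinate variables are still present, and positional indexing still works.
first_x = no_index.isel(x=0)["x"].item()
```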

### Skip the creation of default indexes

Let's re-open the example dataset above, this time with no index.

```{code-cell} python
ds_air_no_index = xr.tutorial.open_dataset(
    "air_temperature", create_default_indexes=False
)

ds_air_no_index
```

As with {py:func}`xarray.open_dataset`, indexes are created by default for
dimension coordinates when constructing a new Dataset.

```{code-cell} python
ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]})

ds
```

The same applies when assigning new coordinates.

```{code-cell} python
ds.assign_coords(u=[10, 20])
```

To skip the creation of those default indexes, we need to explicitly create a
new {py:class}`xarray.Coordinates` object and pass `indexes={}` (empty
dictionary).

```{code-cell} python
coords = xr.Coordinates({"u": [10, 20]}, indexes={})

ds.assign_coords(coords)
```
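
To check the result, and to show that an index can still be added later, here is a minimal sketch that reuses the `x`, `y` and `u` names from the example above; attaching the index afterwards with `set_xindex` is an illustrative assumption about a typical follow-up step.

```python
import xarray as xr

ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]})

# An explicit Coordinates object with an empty `indexes` mapping:
# "u" becomes a coordinate, but no default index is built for it.
coords = xr.Coordinates({"u": [10, 20]}, indexes={})
no_index = ds.assign_coords(coords)
u_indexed_before = "u" in no_index.xindexes

# The index can be created on demand once it is actually needed.
with_index = no_index.set_xindex("u")
u_indexed_after = "u" in with_index.xindexes
```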
2 changes: 1 addition & 1 deletion docs/builtin/pdinterval.md
@@ -33,7 +33,7 @@ Learn more at the [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_gui
1. Sadly {py:class}`pandas.IntervalIndex` supports numpy datetimes but not [cftime](https://unidata.github.io/cftime/).

```{important}
A pandas IntervalIndex models intervals using a single variable. The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model the intervals using two arrays: the intervals ("bounds" variable) and "central values".
```

## Example
134 changes: 134 additions & 0 deletions docs/builtin/pdmultiindex.md
@@ -0,0 +1,134 @@
---
jupytext:
text_representation:
format_name: myst
kernelspec:
display_name: Python 3
name: python
---

# Stack and unstack with `PandasMultiIndex`

````{grid}
```{grid-item}
:columns: 3
```{image} https://pandas.pydata.org/docs/_static/pandas.svg
---
alt: Pandas logo
width: 200px
align: center
---
```
````

## Highlights

1. An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinate variables sharing the same dimension.
1. Create a PandasMultiIndex from PandasIndex objects using {py:meth}`xarray.Dataset.stack` and convert back with {py:meth}`xarray.Dataset.unstack`.
1. Labels of coordinates associated with a PandasMultiIndex can be passed all at once to `.sel`.

## Example

Let's open a tutorial dataset.

```{code-cell} python
import xarray as xr
```

```{code-cell} python
---
tags: [remove-cell]
---
%xmode minimal

xr.set_options(
    display_expand_indexes=True,
    display_expand_attrs=False,
);
```

```{code-cell} python
ds_air = xr.tutorial.open_dataset("air_temperature")
ds_air
```

### Stack / Unstack

Stacking the "lat" and "lon" dimensions of the example dataset produces stacked
"lat" and "lon" coordinates that are, by default, both associated with a single
`PandasMultiIndex`.
The underlying data are _reshaped_ to collapse the `lat` and `lon` dimensions into a new `space` dimension.

```{code-cell} python
stacked = ds_air.stack(space=("lat", "lon"))
stacked
```

The multi-index allows retrieving the original, unstacked dataset where the
"lat" and "lon" dimension coordinates have their own `PandasIndex`.

```{code-cell} python
unstacked = stacked.unstack("space")
unstacked
```

### Assigning

We can also directly associate a {py:class}`~xarray.indexes.PandasMultiIndex`
with existing coordinates sharing the same dimension.

```{code-cell} python
ds_air = (
    ds_air
    .assign_coords(season=ds_air.time.dt.season)
    .rename_vars(time="datetime")
    .drop_indexes("datetime")
)

ds_air
```

```{code-cell} python
multi_indexed = ds_air.set_xindex(["season", "datetime"], xr.indexes.PandasMultiIndex)
multi_indexed
```

### Indexing

Contrary to what is shown in {doc}`the default PandasIndex <pdindex>` example,
it is possible here to provide labels to {py:meth}`xarray.Dataset.sel` for both
coordinates of the multi-index, which share the time dimension.

```{code-cell} python
multi_indexed.sel(season="DJF", datetime="2013")
```

Chaining `.sel` calls on those coordinates, each with its own index, would
yield equivalent results, though.

```{code-cell} python
single_indexed = ds_air.set_xindex("datetime").set_xindex("season")

single_indexed.sel(season="DJF").sel(datetime="2013")
```

### Assigning a `pandas.MultiIndex`

It is easy to wrap an existing {py:class}`pandas.MultiIndex` object into a new Xarray
Dataset or DataArray.

```{code-cell} python
import pandas as pd

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
midx
```

This can be done via {py:meth}`xarray.Coordinates.from_pandas_multiindex`.

```{code-cell} python
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="x")

ds = xr.Dataset(coords=midx_coords)
ds
```