diff --git a/docs/builtin/pdindex.md b/docs/builtin/pdindex.md new file mode 100644 index 0000000..35db97a --- /dev/null +++ b/docs/builtin/pdindex.md @@ -0,0 +1,161 @@ +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# The default `PandasIndex` + +````{grid} +```{grid-item} +:columns: 3 +```{image} https://pandas.pydata.org/docs/_static/pandas.svg +--- +alt: Pandas logo +width: 200px +align: center +--- +``` +```` + +## Highlights + +1. {py:class}`xarray.indexes.PandasIndex` can wrap _one dimensional_ {py:class}`pandas.Index` objects to allow indexing along 1D coordinate variables. These indexes can apply to both {term}`"dimension" coordinates ` and {term}`"non-dimension" coordinates `. +1. When opening or constructing a new Dataset or DataArray, Xarray creates by default a {py:class}`xarray.indexes.PandasIndex` for each {term}`"dimension" coordinate `. +1. It is possible to either drop those default indexes or skip their creation. + +## Example + +Let's open a tutorial dataset. + +```{code-cell} python +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +```{code-cell} python +ds_air = xr.tutorial.open_dataset("air_temperature") +ds_air +``` + +It has created by default a {py:class}`~xarray.indexes.PandasIndex` for each of +the "lat", "lon" and "time" dimension coordinates, as we can also see below via +the {py:attr}`xarray.Dataset.xindexes` property. + +```{code-cell} python +ds_air.xindexes +``` + +Those indexes are used under the hood for, e.g., label-based selection. + +```{code-cell} python +ds_air.sel(time="2013") +``` + +### Set indexes for non-dimension coordinates + +Xarray does not automatically create an index for non-dimension coordinates like +the "season (time)" coordinate added below. 
+ +```{code-cell} python +ds_air.coords["season"] = ds_air.time.dt.season +ds_air +``` + +Without an index, it is not possible to select data based on the "season" +coordinate. + +```{code-cell} python +--- +tags: [raises-exception] +--- +ds_air.sel(season="DJF") +``` + +However, it is possible to manually set a `PandasIndex` for that 1-dimensional +coordinate. + +```{code-cell} python +ds_extra = ds_air.set_xindex("season", xr.indexes.PandasIndex) +ds_extra +``` + +This now enables label-based selection. + +```{code-cell} python +ds_extra.sel(season="DJF") +``` + +It is not yet supported to provide labels to {py:meth}`xarray.Dataset.sel` for +multiple index coordinates sharing common dimensions (unless those coordinates +also share the same index object, e.g., as shown in the {doc}`PandasMultiIndex example `). + +```{code-cell} python +--- +tags: [raises-exception] +--- +ds_extra.sel(season="DJF", time="2013") +``` + +### Drop indexes + +Indexes are not always necessary and (re-)computing them may introduce some +unwanted overhead. + +The line of code below drops the default indexes that were created when +opening the example dataset. + +```{code-cell} python +ds_air.drop_indexes(["time", "lat", "lon"]) +``` + +### Skip the creation of default indexes + +Let's re-open the example dataset above, this time with no index. + +```{code-cell} python +ds_air_no_index = xr.tutorial.open_dataset( + "air_temperature", create_default_indexes=False +) + +ds_air_no_index +``` + +As with {py:func}`xarray.open_dataset`, indexes are created by default for +dimension coordinates when constructing a new Dataset. + +```{code-cell} python +ds = xr.Dataset(coords={"x": [1, 2], "y": [3, 4, 5]}) + +ds +``` + +The same applies when assigning new coordinates. + +```{code-cell} python +ds.assign_coords(u=[10, 20]) +``` + +To skip the creation of those default indexes, we need to explicitly create a +new {py:class}`xarray.Coordinates` object and pass `indexes={}` (empty +dictionary). 
+ +```{code-cell} python +coords = xr.Coordinates({"u": [10, 20]}, indexes={}) + +ds.assign_coords(coords) +``` diff --git a/docs/builtin/pdinterval.md b/docs/builtin/pdinterval.md index 2d1875b..dde578d 100644 --- a/docs/builtin/pdinterval.md +++ b/docs/builtin/pdinterval.md @@ -33,7 +33,7 @@ Learn more at the [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_gui 1. Sadly {py:class}`pandas.IntervalIndex` supports numpy datetimes but not [cftime](https://unidata.github.io/cftime/). ```{important} -A pandas IntervalIndex models intervals using a single variable. The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model the intervals using two arrays: the intervals ("bounds" variable) and "central values". +A pandas IntervalIndex models intervals using a single variable. The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model the intervals using two arrays: the intervals ("bounds" variable) and "central values". ``` ## Example diff --git a/docs/builtin/pdmultiindex.md b/docs/builtin/pdmultiindex.md new file mode 100644 index 0000000..2409412 --- /dev/null +++ b/docs/builtin/pdmultiindex.md @@ -0,0 +1,134 @@ +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# Stack and unstack with `PandasMultiIndex` + +````{grid} +```{grid-item} +:columns: 3 +```{image} https://pandas.pydata.org/docs/_static/pandas.svg +--- +alt: Pandas logo +width: 200px +align: center +--- +``` +```` + +## Highlights + +1. An {py:class}`xarray.indexes.PandasMultiIndex` is associated with multiple coordinate variables sharing the same dimension. +1. Create PandasMultiIndex from PandasIndex using {py:meth}`xarray.Dataset.stack` and convert back with {py:meth}`xarray.Dataset.unstack`. +1. 
Labels of coordinates associated with a PandasMultiIndex can be passed all at once to `.sel`. + +## Example + +Let's open a tutorial dataset. + +```{code-cell} python +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +```{code-cell} python +ds_air = xr.tutorial.open_dataset("air_temperature") +ds_air +``` + +### Stack / Unstack + +Stacking the "lat" and "lon" dimensions of the example dataset yields stacked +"lat" and "lon" coordinates that are both associated with a +`PandasMultiIndex` by default. +The underlying data are _reshaped_ to collapse the `lat` and `lon` dimensions into a new `space` dimension. + +```{code-cell} python +stacked = ds_air.stack(space=("lat", "lon")) +stacked +``` + +The multi-index allows retrieving the original, unstacked dataset where the +"lat" and "lon" dimension coordinates have their own `PandasIndex`. + +```{code-cell} python +unstacked = stacked.unstack("space") +unstacked +``` + +### Assigning + +We can also directly associate a {py:class}`~xarray.indexes.PandasMultiIndex` +with existing coordinates sharing the same dimension. + +```{code-cell} python +ds_air = ( + ds_air + .assign_coords(season=ds_air.time.dt.season) + .rename_vars(time="datetime") + .drop_indexes("datetime") +) + +ds_air +``` + +```{code-cell} python +multi_indexed = ds_air.set_xindex(["season", "datetime"], xr.indexes.PandasMultiIndex) +multi_indexed +``` + +### Indexing + +Contrary to what is shown in {doc}`the default PandasIndex ` example, +here it is possible to provide labels to {py:meth}`xarray.Dataset.sel` for both +of the multi-index time coordinates. + +```{code-cell} python +multi_indexed.sel(season="DJF", datetime="2013") +``` + +Chaining `.sel` calls for those coordinates, each with its own index, would +yield equivalent results, though. 
+ +```{code-cell} python +single_indexed = ds_air.set_xindex("datetime").set_xindex("season") + +single_indexed.sel(season="DJF").sel(datetime="2013") +``` + +### Assigning a `pandas.MultiIndex` + +It is easy to wrap an existing {py:class}`pandas.MultiIndex` object in a new Xarray +Dataset or DataArray. + +```{code-cell} python +import pandas as pd + +midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar")) +midx +``` + +This can be done via {py:meth}`xarray.Coordinates.from_pandas_multiindex`. + +```{code-cell} python +midx_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="x") + +ds = xr.Dataset(coords=midx_coords) +ds +``` diff --git a/docs/builtin/pdrange.md b/docs/builtin/pdrange.md new file mode 100644 index 0000000..b8812cc --- /dev/null +++ b/docs/builtin/pdrange.md @@ -0,0 +1,97 @@ +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# Integer ranges with `pd.RangeIndex` + +````{grid} +```{grid-item} +:columns: 3 +```{image} https://pandas.pydata.org/docs/_static/pandas.svg +--- +alt: Pandas logo +width: 200px +align: center +--- +``` +```` + +## Highlights + +1. Like other pandas Index types, a {py:class}`pandas.RangeIndex` object may be wrapped in an {py:class}`xarray.indexes.PandasIndex`. +1. Unlike other pandas Index types, we always want to assign a `pandas.RangeIndex` directly instead of setting it from an existing coordinate variable. +1. Xarray preserves the memory-saving `pandas.RangeIndex` structure by wrapping it in a lazy coordinate variable instead of a fully materialized array. 
+ +## Example + +### Assigning + +```{code-cell} python +import pandas as pd +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +```{code-cell} python +idx = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), dim="x") + +ds = xr.Dataset(coords=xr.Coordinates.from_xindex(idx)) +ds +``` + +### Lazy coordinate + +The `x` coordinate variable associated with the range index is lazy (i.e., its +array values are not all materialized in memory). + +```{code-cell} python +ds.x +``` + +```{important} +`ds.x.values` will materialize all values in memory! `x` may behave like a "coordinate variable bomb" 💣. +``` + +### Indexing + +Slicing along the `x` dimension preserves the range index -- although with a new +range -- and keeps a lazy associated coordinate variable. + +```{code-cell} python +sliced = ds.isel(x=slice(1_000, 50_000, 100)) + +sliced.x +``` + +```{code-cell} python +sliced.xindexes["x"] +``` + +Indexing with arbitrary values along the same dimension converts the underlying +pandas index type (this is all handled by pandas). + +```{code-cell} python +indexed = ds.isel(x=[10, 55, 124, 265]) + +indexed.x +``` + +```{code-cell} python +indexed.xindexes["x"] +``` diff --git a/docs/builtin/range.md b/docs/builtin/range.md index 53a9e92..a0a46e8 100644 --- a/docs/builtin/range.md +++ b/docs/builtin/range.md @@ -1 +1,81 @@ -# Large ranges with `RangeIndex` +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# Floating point ranges with `RangeIndex` + +## Highlights + +1. Pandas has no equivalent of {py:class}`pandas.RangeIndex` for floating point + ranges. Fortunately, there is {py:class}`xarray.indexes.RangeIndex` that + works with real numbers. +1. 
Xarray's `RangeIndex` is built on top of + {py:class}`xarray.indexes.CoordinateTransformIndex` and therefore supports + very large ranges represented as lazy coordinate variables. + +## Example + +### Assigning + +```{code-cell} python +import xarray as xr +``` + +```{code-cell} python +--- +tags: [remove-cell] +--- +%xmode minimal + +xr.set_options( + display_expand_indexes=True, + display_expand_attrs=False, +); +``` + +Using {py:meth}`xarray.indexes.RangeIndex.arange`. + +```{code-cell} python +idx1 = xr.indexes.RangeIndex.arange(0.0, 1000.0, 1e-3, dim="x") + +ds1 = xr.Dataset(coords=xr.Coordinates.from_xindex(idx1)) +ds1 +``` + +Using {py:meth}`xarray.indexes.RangeIndex.linspace`. + +```{code-cell} python +idx2 = xr.indexes.RangeIndex.linspace(0.0, 1000.0, 1_000_000, dim="x") + +ds2 = xr.Dataset(coords=xr.Coordinates.from_xindex(idx2)) +ds2 +``` + +### Lazy coordinate + +The `x` coordinate variable associated with the range index is lazy (i.e., its +array values are not all materialized in memory). + +```{code-cell} python +ds1.x +``` + +```{important} +`ds1.x.values` will materialize all values in memory! `x` may behave like a "coordinate variable bomb" 💣. +``` + +### Indexing + +Slicing along the `x` dimension preserves the range index -- although with a new +range -- and keeps a lazy associated coordinate variable. 
+ +```{code-cell} python +sliced = ds1.isel(x=slice(1_000, 50_000, 100)) + +sliced.x +``` diff --git a/docs/conf.py b/docs/conf.py index cd581e1..b109771 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -116,7 +116,7 @@ "python": ("https://docs.python.org/3/", None), "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None), "numpy": ("https://numpy.org/doc/stable", None), - "xarray": ("https://docs.xarray.dev/en/stable/", None), + "xarray": ("https://docs.xarray.dev/en/latest/", None), "rasterix": ("https://rasterix.readthedocs.io/en/latest/", None), "shapely": ("https://shapely.readthedocs.io/en/latest/", None), "xvec": ("https://xvec.readthedocs.io/en/stable/", None), diff --git a/docs/earth/forecast.md b/docs/earth/forecast.md index 51504d7..871ca9f 100644 --- a/docs/earth/forecast.md +++ b/docs/earth/forecast.md @@ -28,10 +28,6 @@ A further complication is that different forecast systems have different output though most don't have _any_ missing output. ``` -```{margin} - -``` - There are many ways one might index weather forecast output. These different ways of constructing views of a forecast data are called "Forecast Model Run Collections" (FMRC), diff --git a/docs/earth/xvec.md b/docs/earth/xvec.md index 65ae84b..cd59f6d 100644 --- a/docs/earth/xvec.md +++ b/docs/earth/xvec.md @@ -72,7 +72,7 @@ Note how the `county` dimension is associated with a {py:class}`geopandas.Geomet ### Assigning -Now we can assign a {py:class}`xvec.GeometryIndex` to `county`. +Now we can assign an {py:class}`xvec.GeometryIndex` to `county`. 
```{code-cell} cube = cube.xvec.set_geom_indexes("county") diff --git a/docs/index.md b/docs/index.md index 8ea9720..0b412ef 100644 --- a/docs/index.md +++ b/docs/index.md @@ -102,6 +102,9 @@ Your additions to this gallery are very welcome, particularly for fields outside caption: Built-in hidden: --- +builtin/pdindex +builtin/pdmultiindex +builtin/pdrange builtin/range builtin/pdinterval ``` diff --git a/requirements.txt b/requirements.txt index cd3b716..5688e1a 100644 --- a/requirements.txt +++ b/requirements.txt @@ -21,3 +21,4 @@ xvec git+https://github.com/dcherian/rolodex pint-xarray cf_xarray +git+https://github.com/pydata/xarray