Conversation

@Illviljan Illviljan (Contributor) commented Dec 18, 2024

Skip the rolling tests using dask so the CI becomes usable again.
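For context, ASV skips a benchmark when its setup() raises NotImplementedError, which marks it as "n/a" instead of timing it. A minimal sketch of that mechanism (class and method names are illustrative, not the actual xarray benchmark):

```python
# Minimal sketch of ASV's skip mechanism: if setup() raises
# NotImplementedError, ASV marks the benchmark as skipped ("n/a")
# instead of timing it. Names below are illustrative only.
class DaskRollingReduce:
    def setup(self, *args, **kwargs):
        # Skip: rolling .reduce() on dask-backed data is currently too slow.
        raise NotImplementedError("skipped until performance is fixed")

    def time_rolling_reduce(self):
        pass  # never reached while setup() raises
```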

A test with terrible performance; feel free to fix it and reactivate it:

import numpy as np
import pandas as pd

import xarray as xr


def randn(shape, frac_nan=None, chunks=None, seed=0):
    rng = np.random.default_rng(seed)
    if chunks is None:
        x = rng.standard_normal(shape)
    else:
        import dask.array as da

        rng = da.random.default_rng(seed)
        x = rng.standard_normal(shape, chunks=chunks)

    if frac_nan is not None:
        # note: this NaN-injection path assumes a NumPy array;
        # dask arrays do not support .flat item assignment
        inds = rng.choice(range(x.size), int(x.size * frac_nan))
        x.flat[inds] = np.nan

    return x


nx = 3000
long_nx = 30000
ny = 200
nt = 1000
window = 20

randn_xy = randn((nx, ny), frac_nan=0.1)
randn_xt = randn((nx, nt))
randn_t = randn((nt,))
randn_long = randn((long_nx,), frac_nan=0.1)


ds = xr.Dataset(
    {
        "var1": (("x", "y"), randn_xy),
        "var2": (("x", "t"), randn_xt),
        "var3": (("t",), randn_t),
    },
    coords={
        "x": np.arange(nx),
        "y": np.linspace(0, 1, ny),
        "t": pd.date_range("1970-01-01", periods=nt, freq="D"),
        "x_coords": ("x", np.linspace(1.1, 2.1, nx)),
    },
)
window_ = window  # same 20-sample window defined above
min_periods = 5
use_bottleneck = False  # note: defined but not used in this snippet
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 601 ms ± 43.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


ds = ds.chunk({"x": 100, "y": 50, "t": 50})
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 1min 9s ± 1.31 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
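The slowdown is plausible given what a generic rolling reduce has to do: conceptually it materializes every length-`window` sliding view and applies the callable along the new axis. A pure-NumPy sketch of that windowed-reduce equivalence (illustrative, not xarray's exact internals; `min_periods` handling is omitted):

```python
import numpy as np

# Rolling .reduce(np.nansum) is conceptually: build every length-`window`
# sliding view of the data, then reduce over the trailing axis with the
# supplied callable.
x = np.array([1.0, np.nan, 2.0, 3.0, 4.0])
window = 3
views = np.lib.stride_tricks.sliding_window_view(x, window)  # shape (3, 3)
rolled = np.nansum(views, axis=-1)
print(rolled)  # -> [3. 5. 9.]
```

With dask-backed data, the overlapping windows additionally cross chunk boundaries, which forces an exchange of boundary ("ghost") elements between neighbouring chunks before the reduction and further amplifies the cost.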

@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Dec 18, 2024
@Illviljan Illviljan merged commit a90fff9 into pydata:main Dec 18, 2024
29 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 19, 2025
* main: (63 commits)
  Fix zarr upstream tests (pydata#9927)
  Update pre-commit hooks (pydata#9925)
  split out CFDatetimeCoder, deprecate use_cftime as kwarg (pydata#9901)
  dev whats-new (pydata#9923)
  Whats-new 2025.01.0 (pydata#9919)
  Silence upstream Zarr warnings (pydata#9920)
  time coding refactor (pydata#9906)
  fix warning from scipy backend guess_can_open on directory (pydata#9911)
  Enhance and move ISO-8601 parser to coding.times (pydata#9899)
  Edit serialization error message (pydata#9916)
  friendlier error messages for missing chunk managers (pydata#9676)
  Bump codecov/codecov-action from 5.1.1 to 5.1.2 in the actions group (pydata#9915)
  Rewrite interp to use `apply_ufunc` (pydata#9881)
  Skip dask rolling (pydata#9909)
  Explicitly configure ReadTheDocs build to use conf.py (pydata#9908)
  Cache pre-existing Zarr arrays in Zarr backend (pydata#9861)
  Optimize idxmin, idxmax with dask (pydata#9800)
  remove unused "type: ignore" comments in test_plot.py (fixed in matplotlib 3.10.0) (pydata#9904)
  move scalar-handling logic into `possibly_convert_objects` (pydata#9900)
  Add missing DataTree attributes to docs (pydata#9876)
  ...
