
Golden tests #323

Merged
aaTman merged 12 commits into develop from feat/golden-tests
Jan 26, 2026

Conversation

@aaTman
Collaborator

@aaTman aaTman commented Jan 26, 2026

EWB Pull Request

Description

This PR adds a golden test routine that includes:

  • All 5 current event types
  • A shapefile-based region along with bounding box regions
  • An update to regions that swaps the latitude orientation if needed; this was caught when adding the shapefile region
  • A golden test yaml for cases
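To illustrate the latitude orientation issue mentioned in the list above, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `ordered_lat_bounds` and its arguments are hypothetical. The idea is that label-based slicing on a descending latitude coordinate (90 to -90) typically needs its bounds given high-to-low, so the bounds are reordered to match the coordinate.

```python
from typing import Sequence, Tuple


def ordered_lat_bounds(
    latitudes: Sequence[float], lat_min: float, lat_max: float
) -> Tuple[float, float]:
    """Return (start, stop) bounds ordered to match the coordinate orientation.

    Hypothetical helper for illustration only; not part of the EWB API.
    """
    # Normalize the requested bounds so the caller may pass them in any order.
    low, high = sorted((lat_min, lat_max))
    # A coordinate is treated as descending if its first value exceeds its last.
    descending = len(latitudes) > 1 and latitudes[0] > latitudes[-1]
    # Descending coordinates need (high, low); ascending ones need (low, high).
    return (high, low) if descending else (low, high)


# North-to-south grid, as found in many reanalysis products.
north_to_south = [90.0, 45.0, 0.0, -45.0, -90.0]
start, stop = ordered_lat_bounds(north_to_south, 30.0, 50.0)
print(start, stop)
```

The returned pair could then be passed to a label-based slice, so the same region definition works regardless of how the dataset stores latitude.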

Currently, the tests pass if the cases run successfully. More rigorous verification is still needed: making sure the landfall metrics work properly (they currently produce no output), getting the AR test running (the area is too small to detect an AR), and establishing reference outputs to compare new updates against.
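As a rough idea of what comparing against established outputs could look like, the sketch below checks a flat mapping of metric names to values against a stored reference, within a tolerance. Everything here is an assumption for illustration: the function name `compare_to_golden`, the flat dictionary layout, and the metric keys are hypothetical and do not come from this PR.

```python
import math
from typing import Dict, List


def compare_to_golden(
    results: Dict[str, float], golden: Dict[str, float], rel_tol: float = 1e-6
) -> List[str]:
    """Return human-readable mismatches between new results and golden values.

    Hypothetical helper; an empty list means the results match the reference.
    """
    problems: List[str] = []
    for name in sorted(golden):
        expected = golden[name]
        if name not in results:
            # A metric that silently produces no output is reported here.
            problems.append(f"{name}: missing from new results")
        elif not math.isclose(results[name], expected, rel_tol=rel_tol, abs_tol=1e-12):
            problems.append(f"{name}: expected {expected}, got {results[name]}")
    # Metrics that exist only in the new results need a reference value added.
    for name in sorted(set(results) - set(golden)):
        problems.append(f"{name}: not present in golden reference")
    return problems


# Made-up metric names and values, purely for demonstration.
golden_values = {"heat_wave/mae": 1.25, "freeze/rmse": 2.0}
new_values = {"heat_wave/mae": 1.5}
for problem in compare_to_golden(new_values, golden_values):
    print(problem)
```

Reporting missing metrics as failures would also surface cases like a metric that runs without error but produces no output.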

The tests do not run with the rest of the test suite, to prevent GitHub Actions from being overwhelmed. They are intended to be run separately on a more powerful VM or machine.
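One common way to keep expensive tests out of a default run is an opt-in marker handled in `conftest.py`, sketched below. This is an alternative shown for illustration, not what this PR does: per its commit messages, the PR adds an ignore for the golden test and keeps pytest `addopts` and markers in `pyproject.toml`. The marker name `golden` and the `--run-golden` flag are hypothetical.

```python
import pytest

GOLDEN_MARKER = "golden"  # hypothetical marker name, not taken from the PR


def pytest_addoption(parser):
    # Register an opt-in command-line flag for the expensive tests.
    parser.addoption(
        "--run-golden",
        action="store_true",
        default=False,
        help="run slow end-to-end golden tests",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", f"{GOLDEN_MARKER}: slow end-to-end golden test")


def pytest_collection_modifyitems(config, items):
    # With the flag present, leave the collected tests untouched.
    if config.getoption("--run-golden"):
        return
    skip_golden = pytest.mark.skip(reason="needs --run-golden to run")
    for item in items:
        # Skip only the tests that carry the golden marker.
        if GOLDEN_MARKER in item.keywords:
            item.add_marker(skip_golden)
```

With this in place, a plain `pytest` run would skip tests decorated with `@pytest.mark.golden`, and `pytest --run-golden` on a larger machine would include them.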

@aaTman aaTman changed the title from "add ignore for golden test when running pytest by default" to "Golden tests" Jan 26, 2026
* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff
* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes
* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos
* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs
@aaTman aaTman merged commit 34a3965 into develop Jan 26, 2026
3 checks passed
@aaTman aaTman deleted the feat/golden-tests branch January 26, 2026 20:34
aaTman added a commit that referenced this pull request Jan 26, 2026
* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv
aaTman added a commit that referenced this pull request Jan 26, 2026
* update build-system and project

* update workflows, publish, and pyproject

* add justfile and twine

* update publish yaml

* change to python 3.10 as minimum requirement

* kerchunk needs 3.11, swapping pyproject and tests to remove 3.10

* change workflows to use version matrix

* align workflows

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* Golden tests (#323)

* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv

* swap pyproject tools to hatch; add if and packages-dir to publish
aaTman added a commit that referenced this pull request Jan 27, 2026
* Add pressure_dimension_str arg to geopotential_thickness (#297)

* `DurationMeanError` memory fix and add time resolution option (#296)

* update duration with handling spatial dims, remove compute, fix sparse lead time dim generation

* update name on metric in tests

* add docstring for time res arg

* Move parallel config check outside of function (#301)

* move function out of run, move cache mkdir to init

* add tests for new func

* ruff

* update parallel_config passthrough and tests

* feat: Forecast wrapper for custom xarray datasets (#302)

* implements a new Forecast object that can wrap existing xarray datasets

* Revise per copilot review

* Simplify IBTrACS polars subset (#303)

* Update `geopotential_thickness` var names and docstring (#306)

* update docstrings and var namings

* rename vars, add test

* ruff

* Clarify default preprocess function names; geopotential division fix (#305)

* update naming

* default preprocess for applied_tc

* ruff

* ruff

* Remove "cases" key requirement in yamls and dicts (#308)

* remove cases top level of yaml and fix code to handle this

* remove old load events yaml function

* update validation precommit and formatting

* remove out-of-date notebook from docs

* CIRA Icechunk store (#310)

* dependencies and generate store file started

* in-flight, added and cleaned filter funcs

* add icechunk + obstore and cira icechunk generation script

* remove cira gen script no longer used

* code cleanup

* add icechunk datatree forecast class object

* uv lock

* add documentation, group helper func, and add repository kwargs passthrough

* remove icechunk forecast object

* typo

* ruff

* update pyproject and uv lock

* add TODO

* update PR template

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* Golden tests (#323)

* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv

* PyPI Preparation (#315)

* update build-system and project

* update workflows, publish, and pyproject

* add justfile and twine

* update publish yaml

* change to python 3.10 as minimum requirement

* kerchunk needs 3.11, swapping pyproject and tests to remove 3.10

* change workflows to use version matrix

* align workflows

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* Golden tests (#323)

* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv

* swap pyproject tools to hatch; add if and packages-dir to publish

* update pyproject version for release

* Remove duplicate function and fixtures (#326)

* chore: remove duplicate function and fixtures

- Remove duplicate _parallel_serial_config_check function from evaluate.py
  (was defined twice at lines 189 and 982 with identical implementation)
- Remove duplicate runner fixture from test_evaluate_cli.py
  (already defined in conftest.py)
- Remove duplicate temp_config_dir fixture from test_evaluate_cli.py
  (already defined in conftest.py)
- Remove unused tempfile import from test_evaluate_cli.py

* ruff

---------

Co-authored-by: Daniel Rothenberg <daniel@danielrothenberg.com>