Skip to content

Add from_polytope_reforecast() for hdate support across all encoders#108

Merged
awarde96 merged 23 commits intomainfrom
feature/hdate_reanalysis_data
Apr 16, 2026
Merged

Add from_polytope_reforecast() for hdate support across all encoders#108
awarde96 merged 23 commits intomainfrom
feature/hdate_reanalysis_data

Conversation

@awarde96
Copy link
Copy Markdown
Collaborator

@awarde96 awarde96 commented Mar 17, 2026

Description

Adds from_polytope_reforecast() to the base Encoder class. Under the hood, from_polytope() gets a date_key parameter (defaults to "date" so nothing changes for existing callers). from_polytope_reforecast() just calls from_polytope(result, date_key="hdate").

This means all 10 encoders now support reforecast data where hdate is the time axis instead of date. Each hdate value produces its own coverage — no merging across hdates.

Scope decisions

  • Reforecast only, not reanalysis: We originally considered merging multiple hdate values into a single coverage with an expanded t-axis (for reanalysis use cases like CEMS-floods). That's descoped and each hdate is a separate coverage, which is the correct behavior for reforecasts. The more complex merging can be revisited later if needed.
  • "Forecast date" metadata: When using hdate, the hdate value ends up as mars_metadata["Forecast date"]. For reforecasts this probably makes sense (hdate is the forecast init time). For pure reanalysis this naming is arguably wrong but we're not targeting that use case here.
  • Decoders not updated: The existing decoders should still work since the output is standard CoverageJSON, just with different metadata values. But there are no decoder-specific hdate tests. Decoder validation should be a follow-up.

Tests

The test_encoder_*_from_polytope.py files are regression tests, not realistic e2e examples. The tree fixtures exercise the encoder code paths and verify the CoverageJSON structure but use small/simplified inputs (e.g. 2 spatial points for a bounding box, multiple steps for Position). They should be sufficient for catching regressions introduced in this PR but more realistic integration tests with actual polytope output would be a good follow-up.

Also fixed

Circle.py had return self.covjsonå (non-ASCII typo) instead of return self.covjson in from_xarray. Pre-existing on main, fixed here since we were touching the file.

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@andreas-grafberger andreas-grafberger force-pushed the feature/hdate_reanalysis_data branch from 43a8c27 to d25fb01 Compare April 8, 2026 13:20
Comment thread tests/test_encoder_time_series_from_polytope.py
Comment thread tests/test_encoder_time_series_from_polytope.py Outdated
Since support for our hdate-based reanalysis data does seem to be a rare
(and soon outdated) use case, this changes the tests and implementation
to only focus on reforecast data in general. Users for old efas
reanalysis data now just will not profit from automatic merging of hdate
+ time + step axis into a single dimension, as this could easily break
reforecast retrieval patterns.

Each hdate is an independent forecast initialisation, so one coverage
per (point × hdate) is the correct reforecast semantic. Fix test
expectations to match (coverage counts, per-coverage Forecast date),
rename helpers for clarity, and add test_multiple_hdates_and_steps.
- Add from_polytope_reforecast to Encoder ABC
- Remove inline hdate hacks from from_polytope in all 8 encoders that had them
- Implement from_polytope_reforecast on all 9 non-TimeSeries encoders by
delegating to from_polytope(result, date_key="hdate")
- Add from_polytope regression tests for all 10 encoder types
- Add from_polytope_reforecast tests for all 10 encoder types
- Extract shared test helpers into tests/conftest.py
@andreas-grafberger andreas-grafberger force-pushed the feature/hdate_reanalysis_data branch from 18c2f56 to eaaa078 Compare April 13, 2026 08:45
@andreas-grafberger andreas-grafberger changed the title If hdate in tree use hdate instead of date Add from_polytope_reforecast() for hdate support across all encoders Apr 14, 2026
@andreas-grafberger andreas-grafberger marked this pull request as ready for review April 14, 2026 16:37
@awarde96
Copy link
Copy Markdown
Collaborator Author

@copilot

Copy link
Copy Markdown

Copilot AI commented Apr 16, 2026

@copilot

It looks like your comment may have been sent without the full message. Could you clarify what you'd like me to do?

@awarde96
Copy link
Copy Markdown
Collaborator Author

@copilot please review this PR

Copy link
Copy Markdown

Copilot AI commented Apr 16, 2026

@copilot please review this PR

Here's my review of the PR:

Overall: the approach is clean and all 159 tests pass.


What the PR does well

  • Backwards-compatible by designdate_key defaults to "date" everywhere so no existing caller is affected.
  • Single change pointwalk_tree gets the date_key parameter and propagates it recursively; all 10 subclasses just forward it through.
  • from_polytope_reforecast in the base class — inheriting it for free in every encoder is the right pattern. No code duplication.
  • Typo fix (covjsonå) is good to have bundled in.
  • Test coverage — regression tests across all 10 encoders for both normal forecasts and reforecast paths, including multi-hdate/multi-step cases.

Issues / things to be aware of

  1. walk_tree_step and walk_tree_month still hardcode "date".
    from_polytope_reforecast only calls from_polytope which routes through walk_tree, so this doesn't cause a regression right now. But if someone later adds a from_polytope_step_reforecast or from_polytope_month_reforecast, they'll hit the same problem. Worth a follow-up note/issue.

  2. mars_metadata["Forecast date"] is always written, even when date_key="hdate".
    The PR description acknowledges this. For reforecasts it's reasonable (hdate is the forecast init date). For reanalysis it's misleading. Since pure reanalysis is out of scope for this PR, it's acceptable as-is but worth tracking.

  3. Abstract method signature mismatch.
    from_polytope is declared @abstractmethod with date_key: str = "date" in the base class. All 10 subclasses override it with the same signature. That's correct. However, from_polytope_reforecast is not abstract — it's concrete in the base class. That's intentional and the right design, just worth noting for anyone adding a new encoder subclass: they don't need to override from_polytope_reforecast.

  4. No docstring update in the Encoder class docstring — the methods listed in the Attributes/Methods section in __init__ still only mention from_polytope(result) (the old signature), not from_polytope_reforecast. Minor nit.


Summary

The implementation is sound, the scope decisions are well-reasoned, and the tests cover the important cases. The open items above are either pre-existing or acknowledged in the PR description. The PR is ready to merge with no blocking issues.

@awarde96 awarde96 merged commit 05a4298 into main Apr 16, 2026
125 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support "hdate" metadata field instead of "date" for reforecast datasets

3 participants