Adds support for dual-year tessera fusion; changes to grid-based spatial splitting by robknapen · Pull Request #105 · WUR-AI/aether

robknapen · 2026-05-07T15:43:04Z

What does this PR do?

Yield Africa Dataset and pre-processing pipeline: Adds a "tessera-prev" modality (the previous-year tessera embedding) so that two EO embeddings can be fused for crop yield prediction and model interpretation. This tests the hypothesis that the previous-year embedding may help when the growing season spans the calendar-year boundary.
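One simple way to fuse the two embeddings is sketched below, assuming early fusion by concatenation (the PR does not state which fusion strategy the model uses). The current-year "tessera" vector and the previous-year "tessera-prev" vector are stacked, and the position of each is recorded so that per-feature attributions can later be summed per year for interpretation. The function names fuse_dual_year and attribution_per_modality are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def fuse_dual_year(
    sample: Mapping[str, Sequence[float]],
    keys: Sequence[str] = ("tessera", "tessera-prev"),
) -> Tuple[List[float], Dict[str, slice]]:
    """Concatenate per-modality embeddings (hypothetical early-fusion sketch).

    Returns the fused vector plus, for each modality, the slice of the fused
    vector that it occupies, so attributions can be grouped back by year.
    """
    fused: List[float] = []
    slices: Dict[str, slice] = {}
    for key in keys:
        if key not in sample:
            raise KeyError(f"missing modality {key!r}")
        vec = [float(v) for v in sample[key]]
        # Record where this modality starts and ends in the fused vector.
        slices[key] = slice(len(fused), len(fused) + len(vec))
        fused.extend(vec)
    return fused, slices


def attribution_per_modality(
    attributions: Sequence[float], slices: Mapping[str, slice]
) -> Dict[str, float]:
    """Sum absolute feature attributions within each modality's slice."""
    return {key: sum(abs(a) for a in attributions[s]) for key, s in slices.items()}
```

For example, fusing {"tessera": [0.1, 0.2], "tessera-prev": [0.3, 0.4, 0.5]} yields a 5-element vector, and attribution_per_modality([1.0, -1.0, 0.5, 0.5, -1.0], slices) returns {"tessera": 2.0, "tessera-prev": 2.0}, i.e. equal total importance for both years in this toy case.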
AverageEncoder, TabularEncoder: Improves handling of NaNs in the input data (mostly edge cases).
BaseDataModule: Changes spatial splitting from DBSCAN clustering to grid-based spatial grouping. Each sample is assigned to a geographic cell of size spatial_split_distance_m × spatial_split_distance_m, and GroupShuffleSplit then distributes whole cells across the splits. Geographically close samples therefore stay together while split proportions remain roughly balanced. DBSCAN, by contrast, chain-links dense data into a few giant clusters and produces wildly uneven splits, as happened for the crop yield (CY) use case.
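A minimal sketch of the grid-based grouping, assuming coordinates are already in a projected, metric CRS (the PR does not say how coordinates are converted to metres). The names grid_cell_ids and split_by_group are hypothetical, and split_by_group is a dependency-free stand-in for scikit-learn's GroupShuffleSplit, not the code in BaseDataModule.

```python
import math
import random
from typing import List, Sequence, Tuple


def grid_cell_ids(
    xs: Sequence[float], ys: Sequence[float], cell_size_m: float
) -> List[str]:
    """Map each (x, y) point in metres to a string ID of its square grid cell.

    Assumption: x/y are projected metric coordinates; cell_size_m plays the
    role of spatial_split_distance_m.
    """
    if cell_size_m <= 0:
        raise ValueError("cell_size_m must be positive")
    # floor() keeps cells aligned at the origin, including negative coordinates.
    return [
        f"{math.floor(x / cell_size_m)}_{math.floor(y / cell_size_m)}"
        for x, y in zip(xs, ys)
    ]


def split_by_group(
    groups: Sequence[str], test_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Assign whole groups (cells) to train or test.

    Mirrors GroupShuffleSplit's float test_size semantics: shuffle the unique
    groups and put ceil(test_fraction * n_groups) of them in the test set.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = math.ceil(test_fraction * len(unique))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx
```

With scikit-learn installed, the same cell IDs can be passed directly as groups, e.g. `GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0).split(X, y, groups=cell_ids)`. Because the test fraction is applied to the number of cells rather than the number of samples, sample-level proportions are only approximately balanced when cells hold very different numbers of samples.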

Please check whether DBSCAN-based spatial clustering needs to be reintroduced.

Did you make sure the title is self-explanatory and the description concisely explains the PR?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you list all the breaking changes introduced by this pull request?
Did you test your PR locally with pytest command?

Adds support for dual-year tessera fusion for the crop yield use case

d92d6d7

robknapen requested review from gabrieletijunaityte and vdplasthijs May 7, 2026 19:19