Adds support for dual-year tessera fusion; changes to grid-based spatial splitting #105

Open
robknapen wants to merge 1 commit into develop from feature/cy-dual-year-tessera-support

@robknapen
Collaborator

What does this PR do?

  • Yield Africa dataset and pre-processing pipeline: Adds a "tessera-prev" modality (the previous-year tessera embedding) so that two EO embeddings can be fused for crop yield prediction and model interpretation. This tests the hypothesis that the previous-year embedding helps when the growing season stretches across the calendar-year boundary.

  • AverageEncoder, TabularEncoder: Improves handling of NaNs in the data (mostly edge cases).

  • BaseDataModule: Changes spatial splitting to grid-based spatial grouping. Each sample is assigned to a geographic cell of size spatial_split_distance_m × spatial_split_distance_m. GroupShuffleSplit then distributes whole cells across splits, so geographically close samples stay together while split proportions remain balanced. DBSCAN, by contrast, chain-links dense data into a few giant clusters and produces very uneven splits, as was the case for the CY UC.

Please check whether DBSCAN spatial clustering needs to be reintroduced.
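
For reviewers, a minimal sketch of the grid-based grouping idea. This is not the code in BaseDataModule: the helper names `grid_cell` and `grid_split` are hypothetical, coordinates are assumed to be in a projected CRS with metre units, and a plain shuffle of whole cells stands in for scikit-learn's GroupShuffleSplit so the example runs with the standard library only.

```python
import math
import random
from collections import defaultdict


def grid_cell(x_m, y_m, cell_size_m):
    """Return the (col, row) index of the grid cell containing a projected point."""
    return (math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m))


def grid_split(coords_m, cell_size_m, test_fraction=0.2, seed=0):
    """Split sample indices so that every grid cell lands wholly in one split.

    coords_m: list of (x, y) tuples in metres (assumption: projected CRS).
    test_fraction applies to the number of cells, not the number of samples.
    """
    # Group sample indices by the cell they fall into.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(coords_m):
        cells[grid_cell(x, y, cell_size_m)].append(i)

    # Sort first so the shuffle is reproducible for a given seed.
    cell_keys = sorted(cells)
    random.Random(seed).shuffle(cell_keys)

    # Hold out at least one cell; with a single cell the train split is empty.
    n_test = max(1, round(test_fraction * len(cell_keys)))
    test_cells = set(cell_keys[:n_test])

    train_idx = [i for k in cell_keys[n_test:] for i in cells[k]]
    test_idx = [i for k in cell_keys[:n_test] for i in cells[k]]
    return train_idx, test_idx, test_cells


# Usage: 500 random points in a 100 km x 100 km area, 10 km cells.
rng = random.Random(42)
coords = [(rng.uniform(0, 100_000), rng.uniform(0, 100_000)) for _ in range(500)]
train_idx, test_idx, test_cells = grid_split(coords, cell_size_m=10_000)
```

Because cells have a fixed size, no single group can grow across a dense region the way a DBSCAN cluster can, which is what keeps the split proportions close to the requested fraction. Samples on either side of a cell border can still be near neighbours in different splits.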

Before submitting

  • Did you make sure the title is self-explanatory and the description concisely explains the PR?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you test your PR locally with pytest command?
