Adds support for dual-year tessera fusion; changes to grid based spatial splitting #105
Open
robknapen wants to merge 1 commit into
What does this PR do?
Yield Africa Dataset and pre-processing pipeline: Adds a "tessera-prev" modality (the previous-year tessera embedding) so that two EO embeddings can be fused for crop yield prediction and model interpretation. This tests the hypothesis that the previous-year embedding helps when the growing season stretches across the calendar-year boundary.
AverageEncoder, TabularEncoder: Improves handling of NaNs in the data (mostly edge cases).
BaseDataModule: Changes spatial splitting to grid-based spatial grouping. Each sample is assigned to a geographic cell of size spatial_split_distance_m × spatial_split_distance_m, and GroupShuffleSplit then distributes whole cells across splits, so geographically close samples stay together while split proportions remain balanced. DBSCAN, by contrast, chain-links dense data into a few giant clusters and produces wildly uneven splits, as was the case for the CY UC.
Please check whether the DBSCAN spatial clustering needs to be reintroduced.
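For reviewers, the grid-based grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `grid_cell_ids`, the synthetic coordinates, and the 10 km cell size are assumptions, and the coordinates are assumed to already be in a projected CRS with units of metres. Only `GroupShuffleSplit` (from scikit-learn) is taken from the description.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def grid_cell_ids(x_m, y_m, cell_size_m):
    """Hypothetical helper: map projected coordinates (metres) to one group id per grid cell."""
    col = np.floor(np.asarray(x_m, dtype=float) / cell_size_m).astype(np.int64)
    row = np.floor(np.asarray(y_m, dtype=float) / cell_size_m).astype(np.int64)
    # Collapse each distinct (row, col) pair into a single integer label.
    cells = np.stack([row, col], axis=1)
    _, ids = np.unique(cells, axis=0, return_inverse=True)
    return ids.reshape(-1)


# Synthetic sample locations in a 100 km x 100 km area (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 100_000, size=500)
y = rng.uniform(0, 100_000, size=500)

# Stand-in for spatial_split_distance_m; 10 km is an assumed value.
groups = grid_cell_ids(x, y, cell_size_m=10_000)

# Whole cells go to either train or test, never both.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
coords = np.column_stack([x, y])
train_idx, test_idx = next(splitter.split(coords, groups=groups))

# No grid cell is shared between the two splits.
assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

Note that `test_size` in `GroupShuffleSplit` is a proportion of groups (cells), not of samples, so the per-split sample counts are only approximately proportional when cells hold different numbers of samples.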
Before submitting
pytest command?