Replace stateful samples.Rdata logging with structured manifest file (runs_manifest.csv) #3692

@divine7022

Description

Currently, run.write.configs acts as a "stateful" logger, appending run information to samples.Rdata. As we scale to hundreds or thousands of sites, this approach becomes inefficient and tightly couples the configuration step to the analysis step.

This issue proposes refactoring the downstream analysis functions (e.g. read.ensemble.output, read.sa.output, etc.) to adopt a stateless design that reads from a manifest, removing the dependency on runtime mutation of samples.Rdata.

run.write.configs generates run ids (e.g. ENS-0001-siteID) and physically saves them into the runs.samples list within samples.Rdata.
The samples.Rdata file grows linearly with the number of sites. The analysis modules depend on this file being constantly updated/mutated by the write step.

Proposed workaround:

Refactor the workflow to decouple "parameter definition" from "execution logging" by introducing a lightweight manifest file.

  1. samples.Rdata becomes static: treat samples.Rdata strictly as a "master parameter definition" file (generated upstream by get.parameter.samples). It should be immutable during the run.write.configs step.

  2. Introduce runs_manifest.csv: instead of modifying the RData file, run.write.configs will generate a structured CSV file in the output directory. This file explicitly maps run ids to their design parameters.

Proposed structure (runs_manifest.csv):

| run_id | site_id | pft_name | trait | quantile | type |
| --- | --- | --- | --- | --- | --- |
| ENS-0001-siteA | siteA | NA | NA | NA | Ensemble |
| SA-median-siteA | siteA | grass | NA | 0.5 | Sensitivity |
| SA-pft_name-T1-Q1-siteA | siteA | grass | SLA | 0.158 | Sensitivity |

  3. Update downstream analysis functions (read.ensemble.output, read.sa.output, etc.) to read this CSV manifest.
    Logic: look up the run_id where site_id == X and trait == Y.
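
To make the write-once, look-up-later flow concrete, here is a minimal, language-neutral sketch in Python using only the standard library. It is not the PEcAn implementation, which would live in R inside run.write.configs and the read.*.output functions. The helper names `write_manifest` and `lookup_run_ids` are hypothetical, and writing missing fields as the literal string `NA` is an assumption based on the example rows above.

```python
import csv
import tempfile
from pathlib import Path

# Column order taken from the proposed runs_manifest.csv structure.
MANIFEST_COLUMNS = ["run_id", "site_id", "pft_name", "trait", "quantile", "type"]

# Example records mirroring the rows in the proposal; omitted fields become "NA".
EXAMPLE_ROWS = [
    {"run_id": "ENS-0001-siteA", "site_id": "siteA", "type": "Ensemble"},
    {"run_id": "SA-median-siteA", "site_id": "siteA", "pft_name": "grass",
     "quantile": "0.5", "type": "Sensitivity"},
    {"run_id": "SA-pft_name-T1-Q1-siteA", "site_id": "siteA", "pft_name": "grass",
     "trait": "SLA", "quantile": "0.158", "type": "Sensitivity"},
]


def write_manifest(rows, path):
    """Hypothetical helper: write run records once, filling gaps with 'NA'."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=MANIFEST_COLUMNS, restval="NA")
        writer.writeheader()
        writer.writerows(rows)


def lookup_run_ids(path, **criteria):
    """Hypothetical helper: return run_ids whose columns match every criterion."""
    with open(path, newline="") as handle:
        return [
            row["run_id"]
            for row in csv.DictReader(handle)
            if all(row[col] == str(val) for col, val in criteria.items())
        ]


# Demo: the write step emits the manifest, the analysis step only reads it.
with tempfile.TemporaryDirectory() as outdir:
    manifest = Path(outdir) / "runs_manifest.csv"
    write_manifest(EXAMPLE_ROWS, manifest)
    sla_runs = lookup_run_ids(manifest, site_id="siteA", trait="SLA")
    ensemble_runs = lookup_run_ids(manifest, site_id="siteA", type="Ensemble")
```

The point of the sketch is that the analysis side needs nothing except the CSV: no shared in-memory state and no mutation of samples.Rdata. In an R port, `read.csv` with its default `na.strings = "NA"` would turn the `NA` cells into real missing values, so the filter would then need to handle `NA` explicitly.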
