Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -34,6 +34,10 @@ For more information about this file see also [Keep a Changelog](http://keepacha
- Support for inspecting and plotting NetCDF output variables within the notebook workflow.
- added support for soil temperature, relative humidity, soil moisture, and PPFD downscaling to `met_temporal_downscale.Gaussian_ensemble`
- The PEcAn uncertainty analysis tutorial ("Demo 2") has been updated and reimplemented as a Quarto notebook at `documentation/tutorials/Demo_02_Uncertainty_Analysis/uncertainty.qmd`. (#3570)
- Added the shared `input_design` matrix, generated via
`runModule.run.write.configs()`/`generate_joint_ensemble_design()`, that keeps
parameter draws and sampled inputs aligned across `run.write.configs()` and
`write.ensemble.configs()` (#3535, #3634, #3677).

### Fixed

12 changes: 11 additions & 1 deletion base/workflow/R/run.write.configs.R
@@ -8,7 +8,8 @@
#'
#' @param settings a PEcAn settings list
#' @param ensemble.size number of ensemble runs
#' @param input_design input indices for samples
#' @param input_design data frame containing the design matrix describing parameter and input indices, as
#' documented in \code{runModule.run.write.configs()}.
#' @param write should the runs be written to the database?
#' @param posterior.files Filenames for posteriors for drawing samples for ensemble and sensitivity
#' analysis (e.g. post.distns.Rdata, or prior.distns.Rdata)
@@ -28,6 +29,15 @@
run.write.configs <- function(settings, ensemble.size, input_design, write = TRUE,
posterior.files = rep(NA, length(settings$pfts)),
overwrite = TRUE) {

# Validate that input_design matches ensemble.size
if (nrow(input_design) != ensemble.size) {
stop(
"input_design has ", nrow(input_design), " rows, but ensemble.size is ",
ensemble.size, ". The design matrix must have exactly one row for each run."
)
}

## Skip database connection if settings$database is NULL or write is False
if (!isTRUE(write) && is.null(settings$database)) {
PEcAn.logger::logger.info("Not writing this run to database, so database connection skipped")
23 changes: 22 additions & 1 deletion base/workflow/R/runModule.run.write.configs.R
@@ -2,7 +2,14 @@
#'
#' @param settings a PEcAn Settings or MultiSettings object
#' @param overwrite logical: Replace config files if they already exist?
#' @param input_design the input indices for samples
#' @param input_design data.frame design matrix linking parameter draws and any
#' sampled inputs across runs. Include a `param` column whose values select
#' rows from `trait.samples`/`ensemble.samples` plus optional columns named for
#' `settings$run$inputs` tags (e.g. `met`, `soil`) holding 1-based indices
#' into each input's `path` list. Provide at least one row per planned run
#' (median + all SA members and/or `ensemble.size`). Usually generated by
#' `generate_joint_ensemble_design()` but custom designs may be supplied.
#' If NULL, `generate_joint_ensemble_design()` will be called internally.
#' @return A modified settings object, invisibly
#' @importFrom dplyr %>%
#' @export
@@ -24,6 +31,13 @@ runModule.run.write.configs <- function(settings,
)
input_design <- design_result$X
}

# Validate design matrix size for MultiSettings
if (!is.null(settings$ensemble$size) && nrow(input_design) != settings$ensemble$size) {
PEcAn.logger::logger.severe("Input_design has", nrow(input_design), "rows but settings$ensemble$size is",
settings$ensemble$size, ". Design matrix must have exactly one row per run.")
}

return(PEcAn.settings::papply(settings,
runModule.run.write.configs,
overwrite = FALSE,
@@ -41,6 +55,13 @@ runModule.run.write.configs <- function(settings,
)
input_design <- design_result$X
}

# Validate design matrix size for Settings
if (!is.null(settings$ensemble$size) && nrow(input_design) != settings$ensemble$size) {
PEcAn.logger::logger.severe("Input_design has", nrow(input_design), "rows but settings$ensemble$size is",
settings$ensemble$size, ". Design matrix must have exactly one row per run.")
}

ensemble_size <- nrow(input_design)


3 changes: 2 additions & 1 deletion base/workflow/man/run.write.configs.Rd


9 changes: 8 additions & 1 deletion base/workflow/man/runModule.run.write.configs.Rd


29 changes: 29 additions & 0 deletions book_source/03_topical_pages/03_pecan_xml.Rmd
@@ -636,6 +636,35 @@ This information is currently used by the following PEcAn workflow functions:
- `PEcAn.<MODEL>::write.configs.<MODEL>` -- See [above](#pecan-write-configs)
- `PEcAn.uncertainty::run.sensitivity.analysis` -- Executes the uncertainty analysis

#### Coordinating inputs with the `input_design` design matrix {#xml-input-design}

Multi-site ensembles that sample over input files use an `input_design`
data.frame to keep parameter draws and input files aligned across runs. The
design is created up front (typically via `generate_joint_ensemble_design()`)
and passed to `runModule.run.write.configs()`. It is not saved automatically to
`samples.Rdata`, so keep your copy if you need to reuse it.
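
Because the design is not written to `samples.Rdata`, one way to keep it is to store it alongside your own run outputs. The following is a minimal sketch, assuming a hand-built design and a hypothetical file name `input_design.csv`; neither the file name nor the location is a PEcAn convention.

```r
# Illustrative design; in practice this would come from
# generate_joint_ensemble_design() or be built by hand.
input_design <- data.frame(
  param = 1:4,
  met   = c(1L, 2L, 1L, 2L),
  soil  = c(1L, 1L, 2L, 2L)
)

# Hypothetical location: any directory you control works.
design_file <- file.path(tempdir(), "input_design.csv")
utils::write.csv(input_design, design_file, row.names = FALSE)

# Later: reload the design and confirm it is unchanged before reusing it.
reloaded <- utils::read.csv(design_file)
stopifnot(isTRUE(all.equal(input_design, reloaded)))
```

A plain CSV keeps the design readable outside R; `saveRDS()`/`readRDS()` would work equally well if exact column types matter.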
Comment on lines +644 to +645

Member

Not for this PR, but flagging for improvement later: Needing this sentence is a good indicator the current behavior is unintuitive and the design should be saved somewhere by default (though not in samples.Rdata, as established in other threads).

Collaborator Author

I completely agree. Relying on the user to manually persist the design matrix is fragile.
This reminds me of a potential gap that could be filled here (#3708).
Since we are moving toward a runs_manifest.csv to track run metadata (like pfts and traits), the natural home for the input design seems to be that same manifest. We could extend the manifest schema to include columns for the inputs (e.g. param_index, met_index, ...); that way runs_manifest.csv becomes the complete 'recipe'.
This would eliminate the need for a separate saved object entirely.


- **Parameter column:** `param` gives the index (i.e. row number) of the
posterior draw to use for this run. For example, `param = 5` means use the 5th
parameter sample from `samples.Rdata`.
- **Input columns:** any name that matches a tag under `run/inputs` (for
example `met`, `soil`, `veg`, `poolinitcond`). Values are indices into that
input’s `path` list. Leaving a column out keeps that input fixed across runs.
- **Row count and order:** the design must include exactly one row per run,
in run order. For ensembles this means `ensemble.size` rows.
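
The rules above can be checked before a custom design is handed to the workflow. The sketch below uses a hypothetical helper, `check_input_design()`, which is not part of PEcAn; its `inputs` argument is a stand-in for `settings$run$inputs`. It also assumes every column other than `param` is an input tag, which is stricter than anything the workflow is known to enforce.

```r
# Hypothetical helper (not part of PEcAn): check a custom design against
# the column and row-count rules before passing it to the workflow.
check_input_design <- function(input_design, ensemble_size, inputs) {
  # `inputs` mimics settings$run$inputs: a named list whose elements
  # each carry a `path` list of candidate files.
  stopifnot(is.data.frame(input_design))
  if (!"param" %in% names(input_design)) {
    stop("input_design needs a `param` column")
  }
  if (nrow(input_design) != ensemble_size) {
    stop("input_design has ", nrow(input_design),
         " rows, but ensemble_size is ", ensemble_size)
  }
  # Assumption: every non-`param` column names an input tag.
  input_cols <- setdiff(names(input_design), "param")
  unknown <- setdiff(input_cols, names(inputs))
  if (length(unknown) > 0) {
    stop("columns without a matching input tag: ",
         paste(unknown, collapse = ", "))
  }
  # Each index must point at an existing entry of that input's path list.
  for (tag in input_cols) {
    n_paths <- length(inputs[[tag]]$path)
    idx <- input_design[[tag]]
    if (any(idx < 1 | idx > n_paths)) {
      stop("`", tag, "` indices must lie between 1 and ", n_paths)
    }
  }
  invisible(TRUE)
}

# Made-up inputs and a four-run design for illustration.
inputs <- list(
  met  = list(path = list("met_a.nc", "met_b.nc")),
  soil = list(path = list("soil_a.nc", "soil_b.nc"))
)
design <- data.frame(param = 1:4, met = c(1, 2, 1, 2), soil = c(1, 1, 2, 2))
check_input_design(design, ensemble_size = 4, inputs = inputs)
```

Failing early with a specific message is usually easier to debug than an out-of-range index surfacing later while configs are being written.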

Example layout (CSV or `data.frame`):

| param | met | soil |
|------:|----:|-----:|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |

In this example, run 2 would reuse the second parameter draw and also switch to
the second met driver while keeping the first soil file.
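
As a rough illustration of how a row maps to files, the example table can be built as a `data.frame` and one run resolved by hand. The `inputs` list and file names below are made up, and `resolve_run()` is a hypothetical helper rather than a PEcAn function.

```r
# The example layout from the table above.
input_design <- data.frame(
  param = 1:4,
  met   = c(1, 2, 1, 2),
  soil  = c(1, 1, 2, 2)
)

# Stand-in for settings$run$inputs; file names are invented.
inputs <- list(
  met  = list(path = list("met_1.clim", "met_2.clim")),
  soil = list(path = list("soil_1.nc", "soil_2.nc"))
)

# Hypothetical helper: look up the parameter draw and input files
# that a given run (row of the design) would use.
resolve_run <- function(run, design, inputs) {
  row <- design[run, , drop = FALSE]
  tags <- setdiff(names(design), "param")
  files <- vapply(
    tags,
    function(tag) inputs[[tag]]$path[[row[[tag]]]],
    character(1)
  )
  list(param = row$param, files = files)
}

run2 <- resolve_run(2, input_design, inputs)
print(run2)
```

For run 2 this picks parameter draw 2, the second met file, and the first soil file, matching the description above.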

### Parameter Data Assimilation {#xml-parameter-data-assimilation}

The following tags can be used for parameter data assimilation. More detailed information can be found here: [Parameter Data Assimilation Documentation](#pda)
6 changes: 5 additions & 1 deletion modules/uncertainty/R/ensemble.R
@@ -193,7 +193,11 @@ get.ensemble.samples <- function( ensemble.size, pft.samples, env.samples,
##' Given a pft.xml object, a list of lists as supplied by get.sa.samples,
##' a name to distinguish the output files, and the directory to place the files.
##'
##' @param input_design the input indices for samples
##' @param input_design design matrix describing sampled inputs (see
##' `run.write.configs()`). Columns named after `settings$run$inputs` tags give
##' 1-based indices into each input's `path` list and rows follow run order.
##' Requires `nrow(input_design) >= ensemble.size`;
##' extra rows are ignored.
##' @param ensemble.size size of ensemble
##' @param defaults pft
##' @param ensemble.samples list of lists supplied by \link{get.ensemble.samples}
6 changes: 5 additions & 1 deletion modules/uncertainty/man/write.ensemble.configs.Rd
