Add generate_OAT_SA_design() for sensitivity analysis input design #3729
Open: divine7022 wants to merge 21 commits into PecanProject:develop from divine7022:generate-sa-design
Commits
6810331 function for sa specific desing (divine7022)
41c5d99 update .Rd (divine7022)
162e6ad tests for input design (divine7022)
875a7f5 update changelog (divine7022)
932f0ab update NEWS.md (divine7022)
2a8ac94 update NAMESPACE (divine7022)
e2f3dd9 update comment (divine7022)
f2e8193 update roxy (divine7022)
ab3dcc7 update .Rd (divine7022)
6b4c4dd updade CHANGELOG.md (divine7022)
0fbf2cd update NEWS.md (divine7022)
2bc446b removed comment (divine7022)
24a608f fix SA design to generate samples by default (divine7022)
7c65f43 verifies OAT design with SA post processing (divine7022)
7f5b3d7 update roxy (divine7022)
de90a59 update roxy (divine7022)
f75e64e update generate_OAT_SA_design.Rd (divine7022)
f22c103 update generate_joint_ensemble_design.Rd (divine7022)
8574b58 add withr to DESCRIPTION (divine7022)
9c6fc58 add withr to docker depends (divine7022)
ac5519c Merge remote-tracking branch 'origin/develop' into generate-sa-design (divine7022)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,151 @@ | ||
| #' Generate One-At-a-Time (OAT) input design for sensitivity analysis | ||
| #' | ||
| #' Creates an input design matrix for sensitivity analysis where non-parameter | ||
| #' inputs (met, IC, soil, etc.) are held constant while parameters vary | ||
| #' one-at-a-time across quantiles. This differs from ensemble design where | ||
| #' all inputs vary together. | ||
| #' | ||
| #' @param settings PEcAn settings object. This function directly uses: | ||
| #' \itemize{ | ||
| #' \item \code{settings$outdir} - Output directory path for samples.Rdata | ||
| #' \item \code{settings$pfts} - List of PFTs (extracts \code{posterior.files}) | ||
| #' \item \code{settings$ensemble$samplingspace} - Input types to include in design | ||
| #' } | ||
| #' When \code{sa_samples = NULL}, settings is passed to | ||
| #' \code{\link{get.parameter.samples}} which additionally requires: | ||
| #' \itemize{ | ||
| #' \item \code{settings$sensitivity.analysis} - SA quantile configuration | ||
| #' \item \code{settings$database$bety} - Database connection (optional) | ||
| #' \item \code{settings$host$name} - Host name for dbfile.check (optional) | ||
| #' } | ||
| #' @param sa_samples Optional. Pre-loaded SA samples (named list with one | ||
| #' element per PFT, each a matrix with quantiles as rows and traits as columns). | ||
| #' If NULL (default), samples are generated via \code{get.parameter.samples}. | ||
| #' | ||
| #' @return list with component X: a data.frame with columns for each input type | ||
| #' and one row per SA run. Non-parameter columns are all 1 (constant). | ||
| #' | ||
| #' @details | ||
| #' For sensitivity analysis, we must isolate the effect of each | ||
| #' parameter by holding all other inputs constant. The param column contains | ||
| #' sequential indices (1, 2, 3, ...) matching the SA run order in | ||
| #' \code{write.sa.configs}. All other columns (met, ic, soil, etc.) are set to 1, | ||
| #' meaning the first input file is always used. | ||
| #' | ||
| #' Note on internal dependencies | ||
| #' | ||
| #' If sa_samples is NULL we hand off to get.parameter.samples(), which does | ||
| #' the work of finding and loading parameter distributions. | ||
| #' | ||
| #' In practice it: | ||
| #' - uses pft$posterior.files directly when it is defined (an Rdata file with | ||
| #' post.distns or prior.distns), | ||
| #' - otherwise figures out an output directory from pft$outdir or, if needed, | ||
| #' via pft$posteriorid in the database, | ||
| #' - then looks in that directory for post.distns.Rdata, falling back to | ||
| #' prior.distns.Rdata, | ||
| #' - and, for MCMC posteriors, looks up trait.mcmc*.Rdata linked to the same | ||
| #' posteriorid or a trait.mcmc.Rdata file in that directory. | ||
| #' | ||
| #' @examples | ||
| #' \dontrun{ | ||
| #' # Generate SA design for a multi-site run | ||
| #' sa_design <- generate_OAT_SA_design(settings) | ||
| #' | ||
| #' # View the design matrix | ||
| #' print(sa_design$X) | ||
| #' # param met ic soil | ||
| #' # 1 1 1 1 1 # Median run | ||
| #' # 2 2 1 1 1 # trait1 @ q=2.3% | ||
| #' # 3 3 1 1 1 # trait1 @ q=15.9% | ||
| #' # 4 4 1 1 1 # trait1 @ q=84.1% | ||
| #' # ... | ||
| #' | ||
| #' # With pre-loaded sa_samples (skips get.parameter.samples call) | ||
| #' load("samples.Rdata") | ||
| #' sa_design <- generate_OAT_SA_design(settings, sa_samples = sa.samples) | ||
| #' } | ||
| #' @export | ||
| #' @author Akash B V | ||
| #' @importFrom rlang %||% | ||
| generate_OAT_SA_design <- function(settings, sa_samples = NULL) { | ||
| | ||
| samples_file <- file.path(settings$outdir, "samples.Rdata") | ||
| | ||
| if (is.null(sa_samples)) { | ||
| | ||
| posterior.files <- settings$pfts %>% | ||
| purrr::map_chr("posterior.files", .default = NA_character_) | ||
| | ||
| # generate parameter samples - sa.samples created from quantiles | ||
| PEcAn.uncertainty::get.parameter.samples( | ||
| settings, | ||
| posterior.files = posterior.files | ||
| ) | ||
| | ||
| samples_env <- new.env() | ||
| load(samples_file, envir = samples_env) | ||
| sa_samples <- samples_env$sa.samples | ||
| | ||
| if (is.null(sa_samples)) { | ||
| PEcAn.logger::logger.severe( | ||
| "sa.samples not found in samples.Rdata.", | ||
| "Ensure sensitivity.analysis is configured in settings." | ||
| ) | ||
| } | ||
| } | ||
| | ||
| # calculate total number of SA runs | ||
| # 1 median + (traits * non-median quantiles) per PFT | ||
| MEDIAN <- "50" | ||
| num_sa_runs <- 1 # start with median run | ||
| | ||
| for (pft_name in names(sa_samples)) { | ||
| if (pft_name == "env") next | ||
| | ||
| pft_samples <- sa_samples[[pft_name]] | ||
| n_traits <- ncol(pft_samples) | ||
| quantile_names <- rownames(pft_samples) | ||
| n_non_median <- sum(quantile_names != MEDIAN) | ||
| | ||
| # add runs for this pft: (traits) * (non-median quantiles) | ||
| num_sa_runs <- num_sa_runs + (n_traits * n_non_median) | ||
| } | ||
| | ||
| # get input types from samplingspace | ||
| samp <- settings$ensemble$samplingspace | ||
| input_types <- names(samp) | ||
| input_types[input_types == "parameters"] <- "param" | ||
| | ||
| if (!"param" %in% input_types) { | ||
| input_types <- c("param", input_types) | ||
| } | ||
| | ||
| # build design matrix | ||
| # key difference from ensemble design: | ||
| # - ensemble: all columns get random/quasi-random indices | ||
| # - SA (OAT): param column = sequential index, ALL other columns = 1 | ||
| # | ||
| # the "1" means: use the FIRST (and only) input file for that type. | ||
| # this ensures all SA runs use the SAME met, same ic, etc. | ||
| | ||
| design_list <- list() | ||
| | ||
| for (input_type in input_types) { | ||
| if (input_type == "param") { | ||
| # sequential indices map to SA run order | ||
| # 1 = median run | ||
| # 2 = first (pft, trait, quantile) combination | ||
| # 3 = second (pft, trait, quantile) combination | ||
| # ... | ||
| design_list[[input_type]] <- seq_len(num_sa_runs) | ||
| } else { | ||
| # all other inputs constant(always use first input file) | ||
| design_list[[input_type]] <- rep(1L, num_sa_runs) | ||
| } | ||
| } | ||
| | ||
| design_matrix <- data.frame(design_list) | ||
| | ||
| return(list(X = design_matrix)) | ||
| } | ||
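
To make the run-count arithmetic in the diff concrete, here is a minimal standalone sketch that reproduces it on toy data. The PFT name, trait names, and values are made up for illustration, and `count_oat_runs` is a hypothetical helper, not a function in this PR or in PEcAn.

```r
# Toy SA samples: one PFT, two traits, quantile labels as row names.
# All names and values here are illustrative assumptions.
sa_samples <- list(
  toy.pft = matrix(
    c(10, 20, 30,
      0.1, 0.2, 0.3),
    nrow = 3,
    dimnames = list(c("15.9", "50", "84.1"), c("SLA", "Vcmax"))
  )
)

# Hypothetical helper mirroring the counting loop in the diff:
# one shared median run plus (traits x non-median quantiles) per PFT.
count_oat_runs <- function(sa_samples, median_name = "50") {
  n_runs <- 1L
  for (pft_name in setdiff(names(sa_samples), "env")) {
    pft_samples <- sa_samples[[pft_name]]
    n_non_median <- sum(rownames(pft_samples) != median_name)
    n_runs <- n_runs + ncol(pft_samples) * n_non_median
  }
  n_runs
}

n_runs <- count_oat_runs(sa_samples)  # 1 + 2 traits * 2 quantiles = 5

# param is a sequential run index; every other input stays at 1.
design <- data.frame(
  param = seq_len(n_runs),
  met = rep(1L, n_runs),
  ic = rep(1L, n_runs)
)
print(design)
```

With three quantiles, one of which is the median, each trait contributes two extra runs, so two traits give five rows in total.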
It might be too much for this PR, but it would be good to move away from `settings` as a dependency unless the number of underlying pieces of information is too large to pass to the function meaningfully. In that case, it would be good to document exactly which parts of the settings are required. Here I think it might just be settings$outdir, settings$pfts, settings$sensitivity.analysis, and settings$ensemble (i.e., I wonder if you could get away with a function that has outdir, pfts, sensitivity.analysis, ensemble, and sa_samples as arguments?). It would also be good to better document semi-hidden dependencies (e.g., what does pft$posterior.files need to point to for the function to actually sample parameters correctly?).
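
As a rough illustration of the explicit-argument idea, the sketch below builds the same design from only `sa_samples` and a `samplingspace` list, with no `settings` object. The function name `oat_design_from_parts` and its signature are hypothetical, and it covers only the case where samples are already loaded; it does not replace the call to get.parameter.samples.

```r
# Hypothetical explicit-argument variant; not part of the PR.
oat_design_from_parts <- function(sa_samples,
                                  samplingspace = list(),
                                  median_name = "50") {
  stopifnot(is.list(sa_samples))

  # Drop the "env" entry, as the PR's loop does.
  pfts <- sa_samples[setdiff(names(sa_samples), "env")]

  # One median run plus traits x non-median quantiles for each PFT.
  n_runs <- 1L + sum(vapply(
    pfts,
    function(m) ncol(m) * sum(rownames(m) != median_name),
    integer(1)
  ))

  input_types <- names(samplingspace)
  input_types[input_types == "parameters"] <- "param"
  if (!"param" %in% input_types) {
    input_types <- c("param", input_types)
  }

  cols <- lapply(input_types, function(type) {
    if (type == "param") seq_len(n_runs) else rep(1L, n_runs)
  })
  names(cols) <- input_types

  list(X = data.frame(cols))
}

# Usage with made-up samples and sampling space.
toy <- list(
  pft1 = matrix(1:6, nrow = 3,
                dimnames = list(c("2.3", "50", "97.7"), c("t1", "t2")))
)
res <- oat_design_from_parts(
  toy,
  samplingspace = list(parameters = list(method = "uniform"),
                       met = list(method = "sampling"))
)
print(res$X)
```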
I agree with reducing the dependency on settings in the input design functions by passing specific arguments. But after analyzing both functions to keep the architecture consistent, I found that their parameter requirements are not the same: `generate_OAT_SA_design` needs outdir, pfts, samplingspace, and sensitivity.analysis, while `generate_joint_ensemble_design` additionally needs run$inputs (via input.ens.gen(), which samples from `settings$run$inputs[[input]]$path`).

However, there is a deeper blocker: both functions call `get.parameter.samples(settings, ...)`, which itself uses many settings fields (database$bety, host$name, sensitivity.analysis, etc.). So even with explicit parameters in the design functions, we would still pass settings through to get.parameter.samples, and changing that would involve refactoring the SDA and Sobol callers.

Anyway, I have documented which settings each design function uses, along with its semi-hidden dependencies. I'm happy to hear your thoughts.
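
One lightweight way to make those documented dependencies checkable, short of a full refactor, is a small validation helper. The sketch below is only an assumption about how that could look: `has_field` and `missing_settings` are hypothetical names, and the field list is copied from the roxygen block in this PR (sensitivity.analysis matters only when sa_samples is NULL).

```r
# Field paths taken from the roxygen text in this PR.
# The helpers themselves are hypothetical and not part of PEcAn.
required_fields <- list(
  c("outdir"),
  c("pfts"),
  c("ensemble", "samplingspace"),
  c("sensitivity.analysis")
)

# TRUE if the nested path exists and is non-NULL in a list-like object.
has_field <- function(x, path) {
  for (key in path) {
    if (!is.list(x) || is.null(x[[key]])) return(FALSE)
    x <- x[[key]]
  }
  TRUE
}

# Returns the missing fields as "a$b" strings (empty if none are missing).
missing_settings <- function(settings, fields = required_fields) {
  present <- vapply(fields, function(p) has_field(settings, p), logical(1))
  vapply(fields[!present], paste, character(1), collapse = "$")
}

# Usage with a made-up, partially filled settings list.
s <- list(
  outdir = "/tmp/out",
  pfts = list(list(name = "toy.pft")),
  ensemble = list(size = 1)
)
print(missing_settings(s))
```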
I think it's fine if the parameter requirements are not the same. I also think it's fine to push the refactor of the generate design functions and get.parameter.samples to a future PR.