Add generate_OAT_SA_design() for sensitivity analysis input design #3729
base: develop
Conversation
mdietze left a comment:
A good overall test that this function is working correctly would be to verify that the run.write.configs module runs correctly with the OAT design as input and continues to generate output that's readable by the SA postprocessing and graphing functions. If it doesn't, then this function is generating inputs that serve no useful purpose. Along the way, we may want to simplify the run.write.configs logic.
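Such an integration check might look roughly like the sketch below. run.write.configs and run.sensitivity.analysis are real PEcAn pipeline entry points, but the exact arguments shown here are assumptions, not taken from this PR:

```r
# Hedged sketch of the suggested end-to-end check; argument usage is assumed,
# not verified against this PR.
design <- generate_OAT_SA_design(settings)

# config writing should succeed with the OAT design as input ...
settings <- PEcAn.workflow::run.write.configs(settings, write = FALSE)

# ... and the resulting output should stay readable by SA postprocessing
PEcAn.uncertainty::run.sensitivity.analysis(settings, plot = FALSE)
```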
#' all inputs vary together.
#'
#' @param settings PEcAn settings object
#' @param sa_samples Optional. Pre-loaded SA samples from samples.Rdata.
This argument suggests a fundamental design misunderstanding: sa_samples should be generated by this function, not read by it. I'm OK with taking this as an optional input if one wants to reuse previous sa_samples, but if it's not provided it needs to be generated.
The parameter now clearly indicates it's optional for reusing existing samples, not the expected input.
if (is.null(sa_samples)) {
  samples_file <- file.path(settings$outdir, "samples.Rdata")

  # generate samples if they don't exist (safety fallback)
This should be the default, not the safety fallback
Removed the "safety fallback" comment.
PEcAn.uncertainty::get.parameter.samples(
  settings,
  ensemble.size = 1, # SA doesn't need ensemble samples
not needed as this is the default
removed, good catch!
  settings,
  ensemble.size = 1, # SA doesn't need ensemble samples
  posterior.files,
  ens.sample.method
also not needed as this isn't an ensemble sample
#' @export
#' @author Akash B V
#' @importFrom rlang %||%
generate_OAT_SA_design <- function(settings, sa_samples = NULL) {
It might be too much for this PR, but it would be good to move away from settings as a dependency unless the underlying number of pieces of information is too large to meaningfully pass to the function. But in that case it would be good to document exactly what part of the settings is required. Here I think it might just be settings$outdir, settings$pfts, settings$sensitivity.analysis, and settings$ensemble (i.e. I wonder if you could get away with a function that has outdir, pfts, sensitivity.analysis, ensemble, and sa_samples as arguments?). It would also be good to better document semi-hidden dependencies (e.g. what pft$posterior.files needs to point to for the function to actually sample parameters correctly).
I agree with reducing the dependency on settings for the input design by passing specific arguments, but after analyzing both functions to keep the architecture consistent, I found that the parameter requirements are not the same: generate_OAT_SA_design needs outdir, pfts, samplingspace, and sensitivity.analysis, while generate_joint_ensemble_design additionally needs run$inputs (via input.ens.gen(), which samples from settings$run$inputs[[input]]$path).
However, there is a deeper blocker: both functions call get.parameter.samples(settings, ...), which itself uses many settings fields (database$bety, host$name, sensitivity.analysis, etc.). So even with explicit parameters in the design functions, we'd still pass settings through to get.parameter.samples, and changing that would mean refactoring the SDA and Sobol callers.
In any case, I have documented which settings fields and semi-hidden dependencies each design function uses. Happy to hear your thoughts.
I think it's fine if the parameters requirements are not the same. I also think it's fine to push the refactor of the generate design functions and get.parameter.samples to a future PR
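For reference, the explicit-argument signature discussed above might look like this sketch (argument names are hypothetical, and not part of this PR; the settings-based signature stays for now since get.parameter.samples() still needs the full object):

```r
# Hypothetical future signature; illustrative only, not part of this PR.
generate_OAT_SA_design <- function(outdir, pfts, samplingspace,
                                   sensitivity.analysis, sa_samples = NULL) {
  # body unchanged; get.parameter.samples() would still need the full
  # settings object, which is why this refactor is deferred
}
```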
── Failed tests ────────────────────────────────────────────────────────────────

we are passing

https://github.com/PecanProject/pecan/actions/runs/20528083080/job/58974923917?pr=3729
#' one-at-a-time across quantiles. This differs from ensemble design where
#' all inputs vary together.
#'
#' @param settings PEcAn settings object. This function directly uses:
Consider "see details" and moving the itemized list down? I find that a @param entry this long makes it hard to skim all the options.
make_mock_sa_samples <- function() {
  list(
    pft1 = structure(
      matrix(1:9, nrow = 3, ncol = 3),
      dimnames = list(c("25", "50", "75"), c("trait1", "trait2", "trait3"))
    )
  )
}
Returns a constant -- why is this a function and not just a list?
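One way to act on this: since the helper takes no arguments and always returns the same value, it could be a plain constant defined once at the top of the test file, for example:

```r
# A constant instead of a zero-argument function that returns the same value
mock_sa_samples <- list(
  pft1 = structure(
    matrix(1:9, nrow = 3, ncol = 3),
    dimnames = list(c("25", "50", "75"), c("trait1", "trait2", "trait3"))
  )
)
```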
@@ -0,0 +1,194 @@
make_sa_test_settings <- function() {
  list(
    outdir = withr::local_tempdir(),
This path looks to me as if it will stop existing when make_sa_test_settings returns! Is the outdir used for anything during testing or could this just be "/fake/output/path/" and skip the withr usage?
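The scoping behavior can be seen in a minimal example: withr::local_tempdir() registers its cleanup on the calling function's exit, so the directory no longer exists once that function returns.

```r
# local_tempdir() cleans up when the enclosing function exits,
# so the returned path is stale by the time the caller sees it
f <- function() withr::local_tempdir()
path <- f()
dir.exists(path)  # FALSE: the directory was deleted when f() returned
```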
expect_true(all(result$X[[col]] == 1),
            info = paste("Column", col, "should be constant 1 for SA"))
?expect_true discourages using info in new code, and the expectation seems pretty clear without it to me. Could unquote for slightly better diagnostic messages:
- expect_true(all(result$X[[col]] == 1),
-             info = paste("Column", col, "should be constant 1 for SA"))
+ # all columns but `param` should be a constant 1
+ # (`!!` to get the column name into the failure message)
+ expect_true(all(result$X[[!!col]] == 1))
})

test_that("generate_OAT_SA_design param column is sequential", {
  settings <- make_sa_test_settings()
  sa_samples <- make_mock_sa_samples()

  result <- generate_OAT_SA_design(settings, sa_samples = sa_samples)
Style nit: Since generating the design takes several lines of setup and is identical for both these tests, multiple expectations in the same test block feels cleaner to me.
- })
- test_that("generate_OAT_SA_design param column is sequential", {
  settings <- make_sa_test_settings()
  sa_samples <- make_mock_sa_samples()
  result <- generate_OAT_SA_design(settings, sa_samples = sa_samples)
(If you accept this, probably want to edit line 38 to something like "...keeps param column sequential and others constant at 1")
@@ -0,0 +1,194 @@
make_sa_test_settings <- function() {
Please match test file names to R file names by naming this file test-generate_OAT_SA_design.R, to avoid needing to wonder which test file to look in to go along with the code you're writing (or vice versa).
sa_non_param <- setdiff(names(sa_result$X), "param")
for (col in sa_non_param) {
  expect_equal(length(unique(sa_result$X[[col]])), 1,
               info = "SA design: non-param columns must be constant")
}
Isn't this the same condition you already tested around line 45?
               info = "SA design: non-param columns must be constant")
}

## ensemble design - non-param can vary (mocked to show variation)
I don't follow what "to show variation" means here and why it needs all the mocks
for (run_id in run_ids) {
  expect_true(dir.exists(file.path(rundir, run_id)))
}
})
})
#------------------ tests: OAT design with write.sa.configs -------------------
# verifies design produces output compatible with SA postprocessing

test_that("OAT design integrates with write.sa.configs for SA postprocessing", {
The degree of complication needed to set up these tests feels like an indicator that we could do better at designing these functions for testability, but I think that can be a future project.
Originally discussed in an earlier review comment.
Description

Add a generate_OAT_SA_design() function to create input design matrices for OAT sensitivity analysis. SA requires isolating the effect of each parameter by holding all other inputs (met, IC, soil) constant. The existing generate_joint_ensemble_design() randomizes these inputs, which is correct for ensemble runs but invalidates SA variance decomposition.

Added tests in test.input_design.R to validate the OAT sensitivity design, including checks: