-
Notifications
You must be signed in to change notification settings - Fork 282
Small bug fixes and updates for the PR #3634. #3690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
|
||
| generate_joint_ensemble_design <- function(settings, | ||
| ensemble_size, | ||
| gen.samples = FALSE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- style suggestion: call the arg generate_samples instead of gen.samples
- My instinct is that the default should be TRUE -- at least in the forward run, this is intended to be the single place samples are generated, so skipping it should need explicit user input
- What should happen when generate_samples is TRUE but samples.Rdata already exists? I'd favor skipping it rather than overwriting.
| gen.samples = FALSE, | |
| generate_samples = TRUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| # get the joint input design. | ||
| # here we are looping over sites | ||
| # to make sure we are grabbing the complete input lists. | ||
| for (i in seq_along(settings)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following the logic here. First, why is this in a loop over sites? Second, you've added a lot of checks to make sure that there's a complete mapping between the input space and the samplingspace, but you're not returning any information to the user if that's not the case. Related, I'm not following the expectation that if the samplingspace and inputs don't match for the first site that we should just keep looping over sites -- why would the input space change by site, and if it did you'd still have the problem that the input_design would work for some sites but not others.
I'll also note that it's not obvious where the input_design is being saved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The
for-loopover sites is because under some edge cases where some sites may not have the entire input lists, which are required in thegenerate_joint_ensemble_designfunction. For example, we should haveparameters,ICs,met, andsoil_physicsas inputs, but some sites may not havesoil_physics. Therefore, we will not have the complete inputs' index table if we use any of those sites to run thegenerate_joint_ensemble_designfunction. - I don't think we need to return any information for sites that don't have any of the above inputs, except for the
metinputs, for which the user will see the error in the SIPNET logfile. It's natural for some sites that are missing any of the inputs, cause the datasets that are used for creating those files are not available everywhere on Earth. - The SDA will fail only if any site doesn't have the met files. Otherwise, the workflow should just work as expected. Even for some sites that don't have some inputs (
soil_physicsorICs), the workflow should just pick the default value of SIPNET and not fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree. If a site doesn't have multiple soil_physics files, but then the input_design tells ensemble member i to run with soil_physics file k, then the write.configs is going to crash when it tries to access a non-existent file path. I'm also not following why we'd be doing an ensemble run where some sites were missing inputs, that seems like something that should throw an error not something we should create a workaround for. The behavior of silently reverting to the default SIPNET values is a dangerous one -- those defaults are not reliable general-purpose values but the parameter values specific to Niwot Ridge, CO. And just because you've created that workaround in SIPNET doesn't mean the same workaround exists in the write.configs for other models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I just reverted all the changes to the SDA workflow. The error will be reported in the write.config function if there is any mismatch between the samplingspace and input lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean that the checks should be dropped -- I actually thought that the checks were a useful feature. It seems better to me to catch a mismatch here during the setup than after spawning 100 x 8000 runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I included code that checks for any mismatch between the sampling space and the site's input list. If there is any mismatch, the user will know which site is missing which input from the mis.match.table.
| } | ||
| } | ||
| # if we generated new samples file within the `generate_joint_ensemble_design` function. | ||
| if (generate_samples) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following the logic here either. It seems like you're counting on generate_joint_ensemble_design to generate a samples.Rdata. If that's the case, why would you this be conditional on generate_samples being TRUE? I'm not following why there would be a case where you would want to use the old ensemble.samples? And even if you did, why does the loading of that need to be determined conditionally before caling generate_joint_ensemble_design. There's a lot of unspoken/undocumented assumptions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we discussed it before that we both think the get.parameter.samples should be executed within the generate_joint_ensemble_design function. And that's why I think we should be careful about whether we should regenerate the samples.Rdata, followed by some checks in this PR.
The logic of this change of code is as follows:
- Detect if we pass the
ensemble.samplesfrom outside (check if the ensemble.samples object isNULLor not). - If we don't pass the
ensemble.samples(ensemble.samples == NULL), we will need to detect if asamples.Rdataexists under the desired directory (settings$outdir). - If we don't have the
samples.Rdatain the desired directory, we will then set thegenerate_samplesasTRUE, because we need to generate thesamples.Rdatain thegenerate_joint_ensemble_designfunction. - If we generate new
samples.Rdatain thegenerate_joint_ensemble_designfunction, we will need to load thesamples.Rdatafile after running thegenerate_joint_ensemble_designfunction.
For the question I'm not following why there would be a case where you would want to use the old ensemble.samples?, I think the answer is I have to keep using the same ensemble.samples passed outside of the SDA workflow otherwise we will generate 40 different samples.Rdata files if I submit 40 jobs for the SDA.
Finally, for the question why does the loading of that need to be determined conditionally before calling generate_joint_ensemble_design, I have changed the code to make sure we only load the samples.Rdata once after running the generate_joint_ensemble_design function. And still, we need to check if we have already passed the ensemble.samples to determine if we want to load the new parameters instead of the existing object.
| #' | ||
| #' @param settings A PEcAn settings object containing ensemble configuration | ||
| #' @param ensemble_size Integer specifying the number of ensemble members | ||
| #' @param generate_samples Logical: logical variable determine if we want to generate the samples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unclear what this argument is intended to do as the whole point of the function is to generate input design samples -- why would this ever be false? What would that mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generate_samples will be false if we don't need to generate the samples.Rdata file (e.g., we already have the samples.Rdata file or we pass the ensemble.samples outside of the SDA function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we always set the generate_samples as TRUE, we will generate 40 different samples.Rdata files if we submit 40 separate SDA jobs to the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that's also true of the input_design! The issue is that generate_joint_ensemble_design should only be called once, not that generate_joint_ensemble_design should be called 40 times with the same fixed parameter samples. I'm still not seeing the scenario when you'd need the proposed generate_samples argument to be false.
| # find a site that has all registered inputs except for the parameter field. | ||
| if (all(names.sampler %in% names.site.input)) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty if, plus checks catch that names.site.input is out of scope here. Can this line be deleted?
| # find a site that has all registered inputs except for the parameter field. | |
| if (all(names.sampler %in% names.site.input)) {} |
In this PR, I added a new logical argument
gen.samplesin thegenerate_joint_ensemble_designfunction. The flaggen.sampleswill be determined within the SDA main workflow so that the samples file will be generated only if we don't pass theensemble.samplesand we don't have thesamples.Rdatafile in the desired directory.Beyond this, I also updated the
change logand document to include the new continental SDA features and settings.Description
Motivation and Context
Review Time Estimate
Types of changes
Checklist: