Add function to Optionally get site.info if not present
#3324
In do_conversion.R Signed-off-by: Abhinav Pandey <abhinav.pandey.met22@itbhu.ac.in>
@meetagrawal09 can you cross check if corresponding changes in
I think you're taking the wrong approach to task 2 here: Yes, "if siteID is not provided, construct a unique identifier from lat/lon", but it needs to be constructed without using the DB. As we discussed in Slack, this could be as simple as something like
Signed-off-by: Abhinav Pandey <abhinavpandey1230@gmail.com>
Fixing the CI checks and test errors
mdietze left a comment
What are the examples you are using to test your code as part of development? I ask because (1) there are no testthat tests defined for your new functions and (2) there's a good bit of code here that doesn't look like it would run at all. I want to make sure you're not just writing code that "makes sense" and submitting it without verifying that the individual functions work as expected under all the different cases and that the overall workflow still works.
```r
##' @return a dataframe with new site information on lat, lon and time_zone
##' @export
##' @author Abhinav Pandey
##' @importFrom digest digest
```
What does this do? First, as far as I can tell this function is never used here. Second, where it is used I don't see why it can't be called by namespace
```r
export(var_names_all)
export(workflow)
export(workflows)
importFrom(digest,digest)
```
Should not be needed
```r
#' @export
generate_site_id <- function(lat, lon) {
latlon_str <- paste0(lat, lon)
uid <- digest::digest(latlon_str, algo = "xxhash64")
```
Do we really need to add an entirely new package dependency to PEcAn just for this one call in one function? Is there an easier solution, or one that relies on existing packages?
Doesn't necessarily warrant using it, but that was my suggestion since I was using it in ccmmf workflows, and it was already a dependency b/c of the api.
Line 14 in 7faa280
```r
dig <- digest::digest(
```
There's also rlang::hash
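As a sketch of the reviewers' suggestion, a coordinate-based ID can be produced with `rlang::hash()` instead of adding `digest` as a new dependency. This assumes `rlang` is already installed (much of the tidyverse depends on it); `generate_site_id_rlang` is a hypothetical name, not part of PEcAn. It hashes `list(lat, lon)` rather than `paste0(lat, lon)`, because concatenating without a separator can map different coordinate pairs to the same string: `paste0(12, 3)` and `paste0(1, 23)` are both `"123"`.

```r
# Hypothetical helper: derive a site ID from coordinates with rlang::hash(),
# assuming rlang is installed, instead of adding digest as a dependency.
generate_site_id_rlang <- function(lat, lon) {
  stopifnot(is.numeric(lat), is.numeric(lon), length(lat) == 1, length(lon) == 1)
  # Hash the pair as a list so (12, 3) and (1, 23) give different IDs;
  # rlang::hash() returns a 32-character hexadecimal string.
  rlang::hash(list(lat = lat, lon = lon))
}

id1 <- generate_site_id_rlang(44.17, -71.74)
id2 <- generate_site_id_rlang(44.17, -71.74)
identical(id1, id2)  # same inputs give the same ID
```

Note that the result is a character string, not a number, so it should not be passed through `as.numeric()`.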
```r
) {
site.id <- paste0(site$lat, "_", site$lon)
new.site <- data.frame(
id = as.numeric(site.id),
```
the as.numeric here seems like it's going to always return NA given how site.id is constructed
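A quick check confirms this: the constructed ID contains an underscore (and often a minus sign), so `as.numeric()` cannot parse it and returns `NA` with a coercion warning. The coordinates below are made up for illustration.

```r
# Illustrative coordinates (not from the PR)
site <- list(lat = 44.17, lon = -71.74)

site.id <- paste0(site$lat, "_", site$lon)
print(site.id)  # "44.17_-71.74"

# as.numeric() cannot parse "44.17_-71.74", so it warns and returns NA
numeric_id <- suppressWarnings(as.numeric(site.id))
print(is.na(numeric_id))  # TRUE
```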
```r
lat = site$lat,
lon = site$lon
)
str_ns <- paste0(new.site$lat, "_", new.site$lon)
```
this seems redundant with site.id
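One way to address both the `NA` comment and the redundancy comment is to build the string once, keep `id` as character, and reuse it for `str_ns`. This is a hedged sketch that reuses the variable names from the excerpt with made-up coordinates; it assumes downstream code accepts a character `id`, which the PR itself does not establish.

```r
site <- list(lat = 44.17, lon = -71.74)  # illustrative input

# Build the identifier once and keep it as character (no as.numeric())
site.id <- paste0(site$lat, "_", site$lon)
new.site <- data.frame(
  id = site.id,
  lat = site$lat,
  lon = site$lon,
  stringsAsFactors = FALSE
)

# Reuse the same string instead of rebuilding it from new.site$lat/lon
str_ns <- site.id
```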
```r
PEcAn.logger::logger.info(paste0("Generated siteID using lat and lon:", site.id))
site.id <- generate_site_id(site$lat, site$lon)
new.site <- data.frame(
id = as.numeric(site.id),
```
does this id need to be numeric? If so, why? Is the algorithm generating it guaranteed to produce numbers?
It won't be numeric if using digest as is.
```r
lat = latlon$lat,
lon = latlon$lon)
str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)
```
latlon undefined
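The `str_ns` line in this excerpt, `paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)`, only works for numeric IDs: it splits an ID into its quotient and remainder by 10^9 (so `1000000023` becomes `"1-23"`, apparently following BETYdb's per-server ID ranges), and it errors on the character IDs produced from lat/lon strings or hashes. The sketch below uses a hypothetical helper, `make_str_ns()`, and made-up coordinates to handle both cases, and defines `latlon` from `site` before anything uses it. It is an illustration, not the PR's code.

```r
# Hypothetical helper: format a site ID for use in file/folder names.
# Numeric IDs are split at 1e9 as in the excerpt; any other ID (e.g. a
# character ID derived from lat/lon) is used unchanged.
make_str_ns <- function(id) {
  if (is.numeric(id)) {
    paste0(id %/% 1e+09, "-", id %% 1e+09)
  } else {
    as.character(id)
  }
}

site <- list(lat = 44.17, lon = -71.74)  # illustrative input

# Define latlon from the site before it is used
latlon <- list(lat = site$lat, lon = site$lon)

make_str_ns(1000000023)                           # "1-23"
make_str_ns(paste0(latlon$lat, "_", latlon$lon))  # "44.17_-71.74"
```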
```r
lon = latlon$lon)
# Setup site database number, lat, lon and name and copy for format.vars if new input
# Then extract new.site and str_ns from site.info
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)
```
latlon undefined
```r
##' @param input Taken from settings$run$inputs. This should include id, path, and source
##' @param dir settings$database$dbfiles
##' @param overwrite Default = FALSE. whether to force ic_process to proceed
##' @param site Current site information
```
This line is defining a new argument to the function, but the argument hasn't been added to the function itself.
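The fix is to add the formal argument alongside its `@param` entry so the roxygen documentation and the signature agree. A minimal sketch with a hypothetical function name; the real function's body is not shown in the excerpt, so the body here is only a placeholder.

```r
##' Hypothetical example: every @param entry should match a formal argument
##' @param input Taken from settings$run$inputs. This should include id, path, and source
##' @param dir settings$database$dbfiles
##' @param overwrite Default = FALSE. whether to force ic_process to proceed
##' @param site Current site information
example_ic_process <- function(input, dir, overwrite = FALSE, site = NULL) {
  # Placeholder body: report which site was passed, if any
  if (is.null(site)) {
    return("no site supplied")
  }
  paste("processing site", site$id)
}
```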
```
tidyverse,
withr
withr,
openssl
```
Package added as a dependency, but I can't find any places in your PR where it's actually used. Can this be removed?
@Sweetdevil144 Thank you for the extensive work you put into this PR. It has become quite large compared to the simple idea it started from, and at the same time we've been gaining more experience running PEcAn with no database and learning how much we can simplify things by making more inputs truly required rather than trying to fill them in for you. Specifically, it's now clear that we almost always want to force the user to provide a site id rather than try to create it for them, and that in the cases we create an ID on the fly we should generally be thinking of it as a site name (i.e. something that lets a human figure out which site's in this folder) rather than guarantee its permanence or uniqueness as a global identifier. I've taken the liberty of cherry-picking commits from this PR into a much smaller set of changes in #3656, which will close this one. I think the replacement PR captures the core ideas of this one (avoiding DB access when we know location already, allowing met download from locations with no preassigned siteid) with a much smaller footprint and better alignment with our current run setup approaches.
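The approach described in this comment (require a site ID, and treat anything generated on the fly as a human-readable name rather than a permanent global identifier) could look roughly like the sketch below. `resolve_site_name`, its rounding to four decimal places, and the example ID `772` are illustrative assumptions, not the code merged in #3656.

```r
# Hypothetical helper: prefer a user-supplied site ID; otherwise build a
# human-readable site name from coordinates (not guaranteed to be unique
# or permanent).
resolve_site_name <- function(site, digits = 4) {
  if (!is.null(site$id) && !is.na(site$id) && nzchar(as.character(site$id))) {
    return(as.character(site$id))
  }
  if (is.null(site$lat) || is.null(site$lon)) {
    stop("Provide either a site id or both lat and lon")
  }
  warning("No site id supplied; using coordinates as a site name")
  sprintf("lat%s_lon%s",
          formatC(site$lat, format = "f", digits = digits),
          formatC(site$lon, format = "f", digits = digits))
}

resolve_site_name(list(id = 772, lat = 44.17, lon = -71.74))  # "772"
suppressWarnings(resolve_site_name(list(lat = 44.17, lon = -71.74)))
# "lat44.1700_lon-71.7400"
```

The warning makes the fallback visible to the user, which fits the goal of nudging people to supply a real site ID.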
Let's close this PR once #3656 is merged.
Description
This PR aims to refactor site.id and improve str_ns. The end goal is to make the system independent of the DB. Currently, I'm refactoring to create a siteID if one is not already present. I'll add test cases to check this util function too.
Work still pending.
Some comments from @mdietze:
Motivation and Context
May fix a subtask of #3307
Review Time Estimate
Types of changes
Checklist: