@Sweetdevil144
Contributor

@Sweetdevil144 Sweetdevil144 commented Jul 5, 2024

Description

This PR aims to refactor site.id and improve str_ns. The end goal is to make the system independent of the DB. Currently, I'm refactoring to create a siteID if one is not already present. I'll also add test cases for this utility function.
Work still in progress.

Some comments from @mdietze:

  1. Here's my slightly different interpretation:
  2. Require the user to input a dataframe with lat/lon and optionally siteID; if siteID is not provided, construct a unique identifier from lat/lon
  3. replace str_ns with provided unique identifier
  4. Provide a helper function that takes in a BETY connection and site IDs and returns lat, lon, and str_ns
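
A minimal sketch of items 2 and 3 above, using base R only. The function name `fill_site_ids`, the `site_id` column name, and the "lat_lon" identifier format are assumptions for illustration, not code from this PR.

```r
# Hypothetical helper: fill in missing site IDs from coordinates, with no DB access.
# The `site_id` column name and the "lat_lon" format are assumptions.
fill_site_ids <- function(sites) {
  stopifnot(is.data.frame(sites), all(c("lat", "lon") %in% names(sites)))
  # Add the optional column if the user did not supply it
  if (!"site_id" %in% names(sites)) {
    sites$site_id <- NA_character_
  }
  sites$site_id <- as.character(sites$site_id)
  # Rows with no usable ID get one built from lat/lon
  missing_id <- is.na(sites$site_id) | sites$site_id == ""
  sites$site_id[missing_id] <- paste0(
    sites$lat[missing_id], "_", sites$lon[missing_id]
  )
  sites
}

# Made-up coordinates: one site with a user-provided ID, one without
sites <- data.frame(
  lat = c(45.5, 40.1),
  lon = c(-90.1, -88.2),
  site_id = c("my_site", NA),
  stringsAsFactors = FALSE
)
filled <- fill_site_ids(sites)

# Item 3: the provided or constructed identifier stands in for str_ns
str_ns <- filled$site_id
```

User-provided IDs pass through unchanged, and only the missing ones are derived from coordinates.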

Motivation and Context

May fix a subtask of #3307

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

In do_conversion.R

Signed-off-by: Abhinav Pandey <abhinav.pandey.met22@itbhu.ac.in>
@github-actions github-actions bot added the Tests label Jul 5, 2024
@Sweetdevil144
Contributor Author

@meetagrawal09 can you cross-check whether the corresponding changes in test.met.process are valid? :)

@infotroph
Member

I think you're taking the wrong approach to task 2 here: Yes, "if siteID is not provided, construct a unique identifier from lat/lon", but it needs to be constructed without using the DB. As we discussed in Slack, this could be as simple as something like id_str <- paste0(lat, "_", lon).
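
A small sketch of why the separator in the suggested `paste0(lat, "_", lon)` matters. The coordinates are made up for illustration; without a separator, two different locations can collapse into the same string.

```r
# Two distinct (made-up) locations
lat_a <- 1.2
lon_a <- 34
lat_b <- 1.23
lon_b <- 4

# Without a separator the digits run together and the two sites collide
no_sep_a <- paste0(lat_a, lon_a)
no_sep_b <- paste0(lat_b, lon_b)
collides_without_sep <- identical(no_sep_a, no_sep_b)

# With the "_" separator the identifiers stay distinct
with_sep_a <- paste0(lat_a, "_", lon_a)
with_sep_b <- paste0(lat_b, "_", lon_b)
collides_with_sep <- identical(with_sep_a, with_sep_b)
```

This only shows that the separator removes one ambiguity. It does not address how coordinates should be rounded or formatted before pasting, which the thread leaves open.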

@Sweetdevil144 Sweetdevil144 requested a review from infotroph May 13, 2025 01:28
Sweetdevil144 and others added 4 commits May 13, 2025 09:39
Signed-off-by: Abhinav Pandey <abhinavpandey1230@gmail.com>
@Sweetdevil144
Contributor Author

Fixing the CI checks and test errors

Member

@mdietze mdietze left a comment

What are the examples you are using to test your code as part of development? I ask because (1) there are no testthat tests defined for your new functions and (2) there's a good bit of code here that doesn't look like it would run at all. I want to make sure you're not just writing code that "makes sense" and submitting it without verifying that the individual functions work as expected under all the different cases and that the overall workflow still works.

##' @return a dataframe with new site information on lat, lon and time_zone
##' @export
##' @author Abhinav Pandey
##' @importFrom digest digest
Member

What does this do? First, as far as I can tell this function is never used here. Second, where it is used I don't see why it can't be called by namespace

export(var_names_all)
export(workflow)
export(workflows)
importFrom(digest,digest)
Member

Should not be needed

#' @export
generate_site_id <- function(lat, lon) {
latlon_str <- paste0(lat, lon)
uid <- digest::digest(latlon_str, algo = "xxhash64")
Member

Do we really need to add an entirely new package dependency to PEcAn just for this one call in one function? Is there an easier solution, or one that relies on existing packages?

Member

Doesn't necessarily warrant using it, but that was my suggestion since I was using it in ccmmf workflows, and it was already a dependency b/c of the api.

dig <- digest::digest(

Member

There's also rlang::hash

) {
site.id <- paste0(site$lat, "_", site$lon)
new.site <- data.frame(
id = as.numeric(site.id),
Member

the as.numeric here seems like it's going to always return NA given how site.id is constructed
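
The point above can be checked in base R. The coordinates below are made up; the `paste0` call mirrors the `site.id` construction in the hunk under review.

```r
# Made-up site coordinates
site <- list(lat = 45.5, lon = -90.1)

# Same construction as in the reviewed hunk
site.id <- paste0(site$lat, "_", site$lon)

# A string containing "_" cannot be parsed as a number, so this is NA.
# suppressWarnings() hides the coercion warning that R would otherwise emit.
id_num <- suppressWarnings(as.numeric(site.id))
id_is_na <- is.na(id_num)
```

Keeping the identifier as a character string avoids the coercion entirely.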

lat = site$lat,
lon = site$lon
)
str_ns <- paste0(new.site$lat, "_", new.site$lon)
Member

this seems redundant with site.id

PEcAn.logger::logger.info(paste0("Generated siteID using lat and lon:", site.id))
site.id <- generate_site_id(site$lat, site$lon)
new.site <- data.frame(
id = as.numeric(site.id),
Member

does this id need to be numeric? If so, why? Is the algorithm generating it guaranteed to produce numbers?

Member

It won't be numeric if using digest as is.
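
A sketch of where the numeric assumption comes from: the `str_ns` expression quoted in this review splits the ID with integer division and modulo by 1e+09, which only works on numbers. The wrapper name `make_str_ns` and both example IDs are hypothetical; the hex string merely stands in for a hash-style identifier.

```r
# Hypothetical wrapper around the str_ns expression quoted in this review
make_str_ns <- function(id) {
  paste0(id %/% 1e+09, "-", id %% 1e+09)
}

# A made-up numeric ID splits into a prefix and a remainder
numeric_id <- 1000004945
str_ns_numeric <- make_str_ns(numeric_id)

# A made-up hex string (standing in for a hash) is not numeric,
# so the arithmetic fails with an error
hash_id <- "9f3c2a7be41d05c8"
hash_result <- tryCatch(
  make_str_ns(hash_id),
  error = function(e) "error"
)
```

If IDs become strings, whether user-provided or derived from lat/lon, this arithmetic has to be replaced rather than fed a coerced value.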

lat = latlon$lat,
lon = latlon$lon)
str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)
Member

latlon undefined

lon = latlon$lon)
# Setup site database number, lat, lon and name and copy for format.vars if new input
# Then extract new.site and str_ns from site.info
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)
Member

latlon undefined

##' @param input Taken from settings$run$inputs. This should include id, path, and source
##' @param dir settings$database$dbfiles
##' @param overwrite Default = FALSE. whether to force ic_process to proceed
##' @param site Current site information
Member

This line is defining a new argument to the function, but the argument hasn't been added to the function itself.

tidyverse,
withr
withr,
openssl
Member

Package added as a dependency, but I can't find any places in your PR where it's actually used. Can this be removed?

@infotroph infotroph mentioned this pull request Oct 21, 2025
14 tasks
@infotroph
Member

@Sweetdevil144 Thank you for the extensive work you put into this PR. It has become quite large compared to the simple idea it started from, and at the same time we've been gaining more experience running PEcAn with no database and learning how much we can simplify things by making more inputs truly required rather than trying to fill them in for you. Specifically, it's now clear that we almost always want to force the user to provide a site id rather than try to create it for them, and that in the cases where we create an ID on the fly we should generally be thinking of it as a site name (i.e. something that lets a human figure out which site's in this folder) rather than guaranteeing its permanence or uniqueness as a global identifier.

I've taken the liberty of cherry-picking commits from this PR into a much smaller set of changes in #3656, which will close this one. I think the replacement PR captures the core ideas of this one (avoiding DB access when we know location already, allowing met download from locations with no preassigned siteid) with a much smaller footprint and better alignment with our current run setup approaches.

@Sweetdevil144
Contributor Author

Let's close this PR once #3656 is merged.

@Sweetdevil144 Sweetdevil144 deleted the siteID-refactor branch October 22, 2025 13:26