Skip to content

Conversation

@infotroph
Copy link
Member

Description

Working on support for arbitrary string identifiers by changing places that abbreviate siteIDs.

Previously these places assumed the ID was coercible to numeric with billions place = server number, then dropped zeroes from the middle: 33 -> "0-33", 1000000005 -> "1-5", "3000001875" -> "3-1875", "foo" -> "NA-NA" from some fns and error from others.

Now they check whether the coercion succeeds and gives a value greater than 1e+09, and treat the siteID as a string otherwise.

Note that there are a few places with patterns like "siteid %/% 1e9" that I didn't change here:

  • 1 in shiny/ and 3 in inst/ folders, which look like they're not used often enough to bother updating right now
  • ./modules/data.remote/R/remote_process.R, which is heavily DB-dependent in other ways and it's probably reasonable for now to keep assuming all the IDs it handles come from BETY
  • modules/data.atmosphere/R/met.process.R is being addressed in Add function to Optionally get site.info if not present #3324

Motivation and Context

As we move away from requiring BETY connections, siteIDs will keep being useful as unique identifiers but need not be constrained to be numeric, and probably will be smaller than 1e9 / the billions place won't have any special significance if they're larger than that. For the initial CCMMF workflows, I've been using site names as IDs and finding they mostly Just Work. Of the changes here, I only needed the ones in pool_ic_list2netcdf today, but decided to tackle the others I saw that used the same assumption.

This does add a bit of complexity because the ID might be passed as actual numeric or as character containing digits (as read from XML).
One obvious alternate design would be to stop abbreviating at all (or move it to a step further upstream) and have all these functions use the ID exactly as passed, coercing to character if it isn't already. I considered this but thought that for backward compatibility it was worth keeping the existing behavior when running with BETY ids.

Note also that in #3324 we discussed what to do if passed a lat-lon with no siteID, and one design we considered was "generate a siteID by pasting lat and lon together". If we proceed with that design, we may want to consider potential confusion between "1-35" meaning siteID 1000000035 vs meaning a site at 1 degree north and 35 degrees west.

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@infotroph infotroph requested a review from dlebauer February 3, 2025 20:48
@dlebauer
Copy link
Member

dlebauer commented Feb 4, 2025

Regarding

what to do if passed a lat-lon with no siteID,

While working through #3423 I found that digest::digest(geom) is a reproducible and effective way of generating a unique (character) id for each site.

A solution for a non-sf dataframe would be to pass paste("lat,lon") to digest, or coerce to sf pts object and digest.

@dlebauer
Copy link
Member

dlebauer commented Feb 4, 2025

1 in shiny/ and 3 in inst/ folders, which look like they're not used often enough to bother updating right now
./modules/data.remote/R/remote_process.R, which is heavily DB-dependent in other ways and it's probably reasonable for now to keep assuming all the IDs it handles come from BETY

maybe create an issue, tag it 'good first issue'?

Might also be worth replacing all instances with a helper function like

format_site_id <- function(id) {
  numeric_id <- tryCatch(as.numeric(id), warning = function(w) as.character(site_id))
  
  if (!is.na(numeric_id) && numeric_id > 1e9) {
    paste0(numeric_id %/% 1e9, "-", numeric_id %% 1e9)
  } else {
    as.character(id)
  }
}

Copy link
Member

@dlebauer dlebauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for doing this. 👍🏻

@infotroph infotroph merged commit a88fa30 into PecanProject:develop Feb 4, 2025
17 of 23 checks passed
@infotroph infotroph deleted the siteid-abbrev branch February 4, 2025 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants