-
Notifications
You must be signed in to change notification settings - Fork 282
Add TRY database formatter for meta-analysis (Issue #3717) #3720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Add TRY database formatter for meta-analysis (Issue #3717) #3720
Conversation
- Implements format_try_for_ma() function for GitHub issue PecanProject#3717 - Converts TRY database exports to PEcAn meta-analysis format - Provides empty structure with all required columns (id, citation_id, mean, n, vname, etc.) - Includes example_trait_mapping() for common trait conversions - Ready for actual TRY data format specification
- Implements format_try_for_ma() function as requested - Converts TRY database exports to PEcAn meta-analysis format - Filters trait data (TraitID not NA) from covariates - Maps TRY trait names to PEcAn variables via trait_map parameter - Extracts StdValue, ErrorRisk, Replicates columns - Returns data frame with all required PEcAn MA columns - Includes try_trait_mapping() helper for common trait conversions
mdietze
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- need to make doc to build function man page (and then commit)
- need to update log and overall pecan documentation
- need to verify that this function runs correctly on real data and that the MA function reads the resulting dataframe successfully
- should document clearly whether this data is in the format returned by the PEcAn database (which needs to be run through
jagifybefore it can go intopecan.ma) or the format that the MA itself uses. - definitely take a look at the jagify and pecan.ma functions to make sure you understand which columns are used and which are currently dropped.
| #' @param trait_map Named vector: TRY TraitName -> PEcAn vname | ||
| #' @param citation_id Default citation ID | ||
| #' @export | ||
| format_try_for_ma <- function(try_data, trait_map, citation_id = 999) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default citation_id is a valid citation in our database. Would it work to make the default -9999 or something like that?
| result <- data.frame( | ||
| id = if ("ObsDataID" %in% names(data)) data$ObsDataID else 1:nrow(data), | ||
| citation_id = citation_id, | ||
| site_id = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you definitely don't want to set all the data to have the same site_id, that would get analyzed incorrectly by the MA code. Is there any geospatial info in the try data (e.g. lat/lon) that you can part into a site table?
| date = NA, | ||
| time = NA, | ||
| cultivar_id = NA, | ||
| specie_id = NA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really important, don't leave as NA
| treatment_id = NA, | ||
| name = NA, | ||
| date = NA, | ||
| time = NA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does TRY really provide no info on data and time?
| time = NA, | ||
| cultivar_id = NA, | ||
| specie_id = NA, | ||
| mean = if ("StdValue" %in% names(data)) as.numeric(data$StdValue) else NA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no actual data in the TRY file then we're in trouble! This will need to at least throw a warning if not an error
| statname = if ("ErrorRisk" %in% names(data)) "SE" else NA, | ||
| stat = if ("ErrorRisk" %in% names(data)) as.numeric(data$ErrorRisk) else NA, | ||
| n = if ("Replicates" %in% names(data)) as.numeric(data$Replicates) else NA, | ||
| vname = if ("TraitName" %in% names(data)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also pretty critical!
| month = NA, | ||
| greenhouse = FALSE, | ||
| control = FALSE, | ||
| stringsAsFactors = FALSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually part of our internal format?
| }) | ||
| } else NA, | ||
| month = NA, | ||
| greenhouse = FALSE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you confirm that no TRY data ever comes from a greenhouse or similar treatment (potted plants, growth chambers, etc)
|
|
||
| #' Example trait mapping | ||
| #' @export | ||
| try_trait_mapping <- c( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not have this as the default for the function?
Description
fixes #3717
Could be used to fix 3707
Adds the
format_try_for_ma()function to convert trait data from the external TRY database into the tabular format required by the PEcAn meta-analysis module. The function is located atmodules/data.remote/R/format_try.R.The implementation includes:
TraitIDis notNA) and excludes covariates.TraitNameto PEcAn variable names (vname) using a user-providedtrait_mapvector.StdValue,ErrorRisk,Replicates) to the PEcAn equivalents (mean,stat,n).try_trait_mapping()function with predefined conversions for common ecological traits.Motivation and Context
This change addresses GitHub issue #3717. It creates a helper function that allows users to easily reformat data from common external trait databases (like TRY) for use in PEcAn's meta-analysis workflows. This is important because, as noted in the original issue, the internal BETYdb is no longer being actively updated, making external data sources the de facto norm for many users[citation:1].
Review Time Estimate
Types of changes
Checklist: