Skip to content

Conversation

@Mayanknishad9
Copy link

@Mayanknishad9 Mayanknishad9 commented Dec 15, 2025

Description

fixes #3717
Could be used to fix 3707
Adds the format_try_for_ma() function to convert trait data from the external TRY database into the tabular format required by the PEcAn meta-analysis module. The function is located at modules/data.remote/R/format_try.R.

The implementation includes:

  • Data filtering: Automatically selects only trait data rows (where TraitID is not NA) and excludes covariates.
  • Name mapping: Converts TRY TraitName to PEcAn variable names (vname) using a user-provided trait_map vector.
  • Column extraction: Maps key TRY columns (StdValue, ErrorRisk, Replicates) to the PEcAn equivalents (mean, stat, n).
  • Helper function: Includes a try_trait_mapping() function with predefined conversions for common ecological traits.

Motivation and Context

This change addresses GitHub issue #3717. It creates a helper function that allows users to easily reformat data from common external trait databases (like TRY) for use in PEcAn's meta-analysis workflows. This is important because, as noted in the original issue, the internal BETYdb is no longer being actively updated, making external data sources the de facto norm for many users[citation:1].

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

- Implements format_try_for_ma() function for GitHub issue PecanProject#3717
- Converts TRY database exports to PEcAn meta-analysis format
- Provides empty structure with all required columns (id, citation_id, mean, n, vname, etc.)
- Includes example_trait_mapping() for common trait conversions
- Ready for actual TRY data format specification
- Implements format_try_for_ma() function as requested
- Converts TRY database exports to PEcAn meta-analysis format
- Filters trait data (TraitID not NA) from covariates
- Maps TRY trait names to PEcAn variables via trait_map parameter
- Extracts StdValue, ErrorRisk, Replicates columns
- Returns data frame with all required PEcAn MA columns
- Includes try_trait_mapping() helper for common trait conversions
Copy link
Member

@mdietze mdietze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • need to make doc to build function man page (and then commit)
  • need to update log and overall pecan documentation
  • need to verify that this function runs correctly on real data and that the MA function reads the resulting dataframe successfully
  • should document clearly whether this data is in the format returned by the PEcAn database (which needs to be run through jagify before it can go into pecan.ma) or the format that the MA itself uses.
  • definitely take a look at the jagify and pecan.ma functions to make sure you understand which columns are used and which are currently dropped.

#' @param trait_map Named vector: TRY TraitName -> PEcAn vname
#' @param citation_id Default citation ID
#' @export
format_try_for_ma <- function(try_data, trait_map, citation_id = 999) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default citation_id is a valid citation in our database. Would it work to make the default -9999 or something like that?

result <- data.frame(
id = if ("ObsDataID" %in% names(data)) data$ObsDataID else 1:nrow(data),
citation_id = citation_id,
site_id = 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you definitely don't want to set all the data to have the same site_id, that would get analyzed incorrectly by the MA code. Is there any geospatial info in the try data (e.g. lat/lon) that you can part into a site table?

date = NA,
time = NA,
cultivar_id = NA,
specie_id = NA,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really important, don't leave as NA

treatment_id = NA,
name = NA,
date = NA,
time = NA,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does TRY really provide no info on data and time?

time = NA,
cultivar_id = NA,
specie_id = NA,
mean = if ("StdValue" %in% names(data)) as.numeric(data$StdValue) else NA,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no actual data in the TRY file then we're in trouble! This will need to at least throw a warning if not an error

statname = if ("ErrorRisk" %in% names(data)) "SE" else NA,
stat = if ("ErrorRisk" %in% names(data)) as.numeric(data$ErrorRisk) else NA,
n = if ("Replicates" %in% names(data)) as.numeric(data$Replicates) else NA,
vname = if ("TraitName" %in% names(data)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also pretty critical!

month = NA,
greenhouse = FALSE,
control = FALSE,
stringsAsFactors = FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually part of our internal format?

})
} else NA,
month = NA,
greenhouse = FALSE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you confirm that no TRY data ever comes from a greenhouse or similar treatment (potted plants, growth chambers, etc)


#' Example trait mapping
#' @export
try_trait_mapping <- c(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not have this as the default for the function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Helper functions to format trait data from external databases (e.g. TRY) for PEcAn.MA

2 participants