-
Notifications
You must be signed in to change notification settings - Fork 282
feat: refactor SoilGrids functions to handle multiple properties #3455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from all commits
f7fd1f5
7649780
126c3b4
63a0b0b
2ecfc72
2bff74a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| #' Download SoilGrids raster files | ||
| #' | ||
| #' @param soil_property The soil property to download (e.g., "ocd", "clay", "sand", etc.) | ||
| #' @param depth The depth layer to download (e.g., "0-5cm", "5-15cm", etc.) | ||
| #' @param quantile The quantile to download (e.g., "mean", "Q0.05", "Q0.5", "Q0.95") | ||
| #' @param extent The extent to crop the raster to (terra::ext object) | ||
| #' @param outdir The directory to save the downloaded raster | ||
| #' @param overwrite Whether to overwrite existing files (default: FALSE) | ||
| #' @param target_crs Optional. The target coordinate reference system (default: WGS1984) | ||
| #' @return The file path of the downloaded raster | ||
| #' @export | ||
| download_soilgrids_raster <- function(soil_property, depth, quantile, extent, outdir, overwrite = FALSE, target_crs = "EPSG:4326") { | ||
| soil_properties <- if (is.vector(soil_property)) soil_property else c(soil_property) | ||
|
|
||
| for (property in soil_properties) { | ||
| vrt_url <- sprintf( | ||
| "/vsicurl/https://files.isric.org/soilgrids/latest/data/%s/%s_%s_%s.vrt", | ||
| property, property, depth, quantile | ||
| ) | ||
|
|
||
| raster <- terra::rast(vrt_url) |> | ||
| terra::crop(extent) | ||
|
|
||
| out_file <- file.path(outdir, sprintf("%s_%s_%s.tif", property, depth, quantile)) | ||
|
|
||
| if (!file.exists(out_file) || overwrite) { | ||
| terra::writeRaster(raster, out_file, overwrite = TRUE) | ||
| } | ||
| } | ||
|
|
||
| return(out_file) | ||
| } | ||
|
|
||
| #' Query SoilGrids data | ||
| #' | ||
| #' @param points A spatial object (e.g., sf or terra::vect) containing the points to query | ||
| #' @param soil_property The soil property to query (e.g., "ocd", "clay", "sand", etc.) | ||
| #' @param depth The depth layer to query (e.g., "0-5cm", "5-15cm", etc.) | ||
| #' @param quantile The quantile to query (e.g., "mean", "Q0.05", "Q0.5", "Q0.95") | ||
| #' @param local_raster Optional. The file path of a local raster to query. If NULL, queries the remote VRT. | ||
| #' @param target_crs Optional. The target coordinate reference system (default: WGS1984) | ||
| #' @return A data frame containing the queried data | ||
| #' @export | ||
| query_soilgrids <- function(points, soil_property, depth, quantile, local_raster = NULL, target_crs = "EPSG:4326") { | ||
| soil_properties <- if (is.vector(soil_property)) soil_property else c(soil_property) | ||
|
|
||
| data_list <- lapply(soil_properties, function(property) { | ||
| if (is.null(local_raster)) { | ||
| vrt_url <- sprintf( | ||
| "/vsicurl/https://files.isric.org/soilgrids/latest/data/%s/%s_%s_%s.vrt", | ||
| property, property, depth, quantile | ||
| ) | ||
| raster <- terra::rast(vrt_url) | ||
| } else { | ||
| raster <- terra::rast(local_raster) | ||
| } | ||
| terra::extract(raster, points) | ||
| }) | ||
|
|
||
| return(do.call(rbind, data_list)) | ||
| } | ||
|
|
||
| #' Estimate SoilGrids uncertainties | ||
| #' | ||
| #' @param data A data frame containing the SoilGrids data | ||
| #' @param quantiles A vector of quantiles to estimate uncertainties (default: c(0.05, 0.5, 0.95)) | ||
| #' @return A data frame containing the uncertainties | ||
| #' @export | ||
| soilgrids_uncertainty <- function(data, quantiles = c(0.05, 0.5, 0.95)) { | ||
| uncertainties <- data %>% | ||
| dplyr::group_by(site_id) %>% | ||
| dplyr::summarize( | ||
| mean = mean(value, na.rm = TRUE), | ||
| sd = sd(value, na.rm = TRUE), | ||
| # Simulate uncertainty based on mean and quantiles | ||
| simulated_values = replicate(1000, rnorm(n(), mean, sd)), # Example simulation | ||
| quantiles = sapply(quantiles, function(q) quantile(simulated_values, probs = q, na.rm = TRUE)) | ||
| ) | ||
|
|
||
| return(uncertainties) | ||
| } | ||
|
|
||
| #' Calculate SoilGrids carbon stocks | ||
| #' | ||
| #' @param soc_data A data frame containing the SoilGrids SOC data | ||
| #' @param depths A vector of depths corresponding to the SOC data | ||
| #' @param thickness A vector of thicknesses for each depth layer | ||
| #' @param segments A list of depth segments (e.g., list(c(0, 30), c(30, 70), c(70, 150))) | ||
| #' @param coarse_fraction Optional. A vector of coarse fractions to include in the calculation | ||
| #' @return A data frame containing the calculated carbon stocks for specified segments | ||
| #' @export | ||
| soilgrids_carbonstocks <- function(soc_data, depths, thickness, segments, coarse_fraction = NULL) { | ||
| carbonstocks <- data.frame(site_id = unique(soc_data$site_id)) | ||
|
|
||
| for (segment in segments) { | ||
| lower_bound <- segment[1] | ||
| upper_bound <- segment[2] | ||
|
|
||
| total_soc <- soc_data %>% | ||
| dplyr::filter(depths >= lower_bound & depths < upper_bound) %>% | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is “soc_data” soil organic carbon content (SOC, in g/kg) or organic carbon density (OCD, in kg/m³)? I guess it is OCD if you multiply it by thickness (see https://www.isric.org/explore/soilgrids/faq-soilgrids). For total_SOC, it should be something like "total_SOC=sum(OCDthickness(1-coarse_fraction))" but can you explain your calculation here?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The [ This accounts for the volume of soil represented by the thickness and adjusts for the coarse fraction to provide an accurate estimate of total SOC.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| dplyr::summarize( | ||
| total_soc = sum(value * thickness[depths >= lower_bound & depths < upper_bound], na.rm = TRUE) + | ||
| if (!is.null(coarse_fraction)) sum(coarse_fraction * thickness[depths >= lower_bound & depths < upper_bound], na.rm = TRUE) else 0 | ||
| ) | ||
|
|
||
| carbonstocks <- dplyr::bind_cols(carbonstocks, total_soc) | ||
| colnames(carbonstocks)[ncol(carbonstocks)] <- paste0("total_soc_", lower_bound, "_", upper_bound) | ||
| } | ||
|
|
||
| return(carbonstocks) | ||
| } | ||

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the calculation of uncertainty here is correct or not. Our current calculations will first simulate the uncertainty of soil properties across the vertical profile using the
meanand different quantiles from SoilGrids directly and then summarize the uncertainty once we have a good fit on that (Seepecan/modules/data.land/R/soilgrids_soc_extraction.R
Line 123 in 72e09bc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @Qianyuxuan might have better opinions on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about how you calculate sd here: "sd = sd(value, na.rm = TRUE)" with known quantiles and means. Usually, we estimate means and sds based on the best fits. For organic carbon density (ocd), it was found to follow gamma distribution best based on reported means and quantiles. But for soil texture data (sand, clay, and silt fractions), they follow Dirichlet distribution and estimate of corresponding parameters can be found in my pending PR here: https://github.com/PecanProject/pecan/pull/3406/files#diff-8b26bae8174687f7e2bf631c5188b979a5b8e4bdacc5c604da84e09a2abc0058
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading the current code correctly, it's grouping by site, and thus calculating a mean and sd over depth, which is wrong on many levels (pun intended). Sampling from rnorm is also wrong. This code is in no way a "refactor" of the existing code