Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 111 additions & 0 deletions modules/data.land/R/soilgrids_utils.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
#' Download SoilGrids raster files
#'
#' @param soil_property The soil property to download (e.g., "ocd", "clay", "sand", etc.)
#' @param depth The depth layer to download (e.g., "0-5cm", "5-15cm", etc.)
#' @param quantile The quantile to download (e.g., "mean", "Q0.05", "Q0.5", "Q0.95")
#' @param extent The extent to crop the raster to (terra::ext object)
#' @param outdir The directory to save the downloaded raster
#' @param overwrite Whether to overwrite existing files (default: FALSE)
#' @param target_crs Optional. The target coordinate reference system (default: WGS1984)
#' @return The file path of the downloaded raster
#' @export
download_soilgrids_raster <- function(soil_property, depth, quantile, extent, outdir, overwrite = FALSE, target_crs = "EPSG:4326") {
soil_properties <- if (is.vector(soil_property)) soil_property else c(soil_property)

for (property in soil_properties) {
vrt_url <- sprintf(
"/vsicurl/https://files.isric.org/soilgrids/latest/data/%s/%s_%s_%s.vrt",
property, property, depth, quantile
)

raster <- terra::rast(vrt_url) |>
terra::crop(extent)

out_file <- file.path(outdir, sprintf("%s_%s_%s.tif", property, depth, quantile))

if (!file.exists(out_file) || overwrite) {
terra::writeRaster(raster, out_file, overwrite = TRUE)
}
}

return(out_file)
}

#' Query SoilGrids data
#'
#' @param points A spatial object (e.g., sf or terra::vect) containing the points to query
#' @param soil_property The soil property to query (e.g., "ocd", "clay", "sand", etc.)
#' @param depth The depth layer to query (e.g., "0-5cm", "5-15cm", etc.)
#' @param quantile The quantile to query (e.g., "mean", "Q0.05", "Q0.5", "Q0.95")
#' @param local_raster Optional. The file path of a local raster to query. If NULL, queries the remote VRT.
#' @param target_crs Optional. The target coordinate reference system (default: WGS1984)
#' @return A data frame containing the queried data
#' @export
query_soilgrids <- function(points, soil_property, depth, quantile, local_raster = NULL, target_crs = "EPSG:4326") {
soil_properties <- if (is.vector(soil_property)) soil_property else c(soil_property)

data_list <- lapply(soil_properties, function(property) {
if (is.null(local_raster)) {
vrt_url <- sprintf(
"/vsicurl/https://files.isric.org/soilgrids/latest/data/%s/%s_%s_%s.vrt",
property, property, depth, quantile
)
raster <- terra::rast(vrt_url)
} else {
raster <- terra::rast(local_raster)
}
terra::extract(raster, points)
})

return(do.call(rbind, data_list))
}

#' Estimate SoilGrids uncertainties
#'
#' @param data A data frame containing the SoilGrids data
#' @param quantiles A vector of quantiles to estimate uncertainties (default: c(0.05, 0.5, 0.95))
#' @return A data frame containing the uncertainties
#' @export
soilgrids_uncertainty <- function(data, quantiles = c(0.05, 0.5, 0.95)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the calculation of uncertainty here is correct or not. Our current calculations will first simulate the uncertainty of soil properties across the vertical profile using the mean and different quantiles from SoilGrids directly and then summarize the uncertainty once we have a good fit on that (See

cgamma <- function(theta, val, stat) {
).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @Qianyuxuan might have better opinions on that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about how you calculate sd here: "sd = sd(value, na.rm = TRUE)" with known quantiles and means. Usually, we estimate means and sds based on the best fits. For organic carbon density (ocd), it was found to follow gamma distribution best based on reported means and quantiles. But for soil texture data (sand, clay, and silt fractions), they follow Dirichlet distribution and estimate of corresponding parameters can be found in my pending PR here: https://github.com/PecanProject/pecan/pull/3406/files#diff-8b26bae8174687f7e2bf631c5188b979a5b8e4bdacc5c604da84e09a2abc0058

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading the current code correctly, it's grouping by site, and thus calculating a mean and sd over depth, which is wrong on many levels (pun intended). Sampling from rnorm is also wrong. This code is in no way a "refactor" of the existing code

uncertainties <- data %>%
dplyr::group_by(site_id) %>%
dplyr::summarize(
mean = mean(value, na.rm = TRUE),
sd = sd(value, na.rm = TRUE),
# Simulate uncertainty based on mean and quantiles
simulated_values = replicate(1000, rnorm(n(), mean, sd)), # Example simulation
quantiles = sapply(quantiles, function(q) quantile(simulated_values, probs = q, na.rm = TRUE))
)

return(uncertainties)
}

#' Calculate SoilGrids carbon stocks
#'
#' @param soc_data A data frame containing the SoilGrids SOC data
#' @param depths A vector of depths corresponding to the SOC data
#' @param thickness A vector of thicknesses for each depth layer
#' @param segments A list of depth segments (e.g., list(c(0, 30), c(30, 70), c(70, 150)))
#' @param coarse_fraction Optional. A vector of coarse fractions to include in the calculation
#' @return A data frame containing the calculated carbon stocks for specified segments
#' @export
soilgrids_carbonstocks <- function(soc_data, depths, thickness, segments, coarse_fraction = NULL) {
carbonstocks <- data.frame(site_id = unique(soc_data$site_id))

for (segment in segments) {
lower_bound <- segment[1]
upper_bound <- segment[2]

total_soc <- soc_data %>%
dplyr::filter(depths >= lower_bound & depths < upper_bound) %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is “soc_data” soil organic carbon content (SOC, in g/kg) or organic carbon density (OCD, in kg/m³)? I guess it is OCD if you multiply it by thickness (see https://www.isric.org/explore/soilgrids/faq-soilgrids). For total_SOC, it should be something like "total_SOC=sum(OCDthickness(1-coarse_fraction))" but can you explain your calculation here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The soc_data refers to organic carbon density (OCD) in kg/m³. The calculation for total_SOC is performed as follows:

[
\text{total_SOC} = \sum(\text{OCD} \times \text{thickness} \times (1 - \text{coarse_fraction}))
]

This accounts for the volume of soil represented by the thickness and adjusts for the coarse fraction to provide an accurate estimate of total SOC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot from 2025-02-26 23-51-42

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. your math seems right, but that math isn't what's implemented in the code
  2. I'm not sure the "if" will work as intended for vector data
  3. The "if" assumes that if the coarse fraction is NULL then it's 0. Setting aside that this might not be a valid assumption, it would be far simpler to do that substitution earlier in the pipe

dplyr::summarize(
total_soc = sum(value * thickness[depths >= lower_bound & depths < upper_bound], na.rm = TRUE) +
if (!is.null(coarse_fraction)) sum(coarse_fraction * thickness[depths >= lower_bound & depths < upper_bound], na.rm = TRUE) else 0
)

carbonstocks <- dplyr::bind_cols(carbonstocks, total_soc)
colnames(carbonstocks)[ncol(carbonstocks)] <- paste0("total_soc_", lower_bound, "_", upper_bound)
}

return(carbonstocks)
}
Loading