-
Notifications
You must be signed in to change notification settings - Fork 1
propagate_uncertainty
When comparing health indicators between populations or time periods, we often need to calculate differences or ratios between estimates while accounting for their uncertainty. For example, is life expectancy in Smallville significantly different from that in Megalopolis? Or, what is the ratio of age-adjusted mortality rates between these two communities?
The process of combining uncertainties when performing mathematical
operations is called uncertainty propagation or error
propagation. While simple formulas exist for basic cases, they make
assumptions that often don’t hold for common public health indicators.
The propagate_uncertainty() function provides a robust Monte Carlo
approach that works even when traditional methods fail.
This vignette will walk you through when and how to propagate uncertainty, starting with the traditional mathematical approaches and then showing when and why you need the more flexible Monte Carlo method.
Before diving into uncertainty propagation, it’s important to understand
when to use this function versus the related
multi_t_test() function:
-
Use
propagate_uncertainty()when you want to combine or compare two estimates (summations, differences, ratios, etc.) and at least one has asymmetric confidence intervals or comes from specialized methods (like age-adjusted rates, exponentiated regression coefficients, or life expectancy from small populations). -
Use
multi_t_test()when you want to compare multiple groups against a single reference group using summary statistics, and your estimates have roughly symmetric confidence intervals that can reasonably assume normality.
In essence, propagate_uncertainty() handles the complex uncertainty
propagation for two estimates, while multi_t_test() performs
statistical testing across multiple groups with traditional assumptions.
If you’re unsure which to use, propagate_uncertainty() is more robust
as it doesn’t rely on normality assumptions.
For simple cases, we can use familiar mathematical formulas to propagate uncertainty. These work well when estimates follow normal distributions and have symmetric confidence intervals.
For differences (X - Y):
For ratios (X / Y):
These formulas assume that X and Y are independent and normally distributed.
This works well in simple cases like comparing the means from two surveys since they would typically have symmetric confidence intervals based on a normal distribution (assuming samples with sufficient size).
For example:
# Load libraries
library(rads)
library(data.table)
library(ggplot2)
# Set the parameteres
smallville_mean <- 34.3 # Mean age in Smallville
smallville_se <- 0.2
megalopolis_mean <- 37.2 # Mean age in Megalopolis
megalopolis_se <- 0.15
# Traditional formulas for difference
diff_traditional <- smallville_mean - megalopolis_mean
se_diff_traditional <- sqrt(smallville_se^2 + megalopolis_se^2)
ci_lower_traditional <- diff_traditional - 1.96 * se_diff_traditional
ci_upper_traditional <- diff_traditional + 1.96 * se_diff_traditional
print(paste0("Traditional method - Difference: ", round(diff_traditional, 2),
" 95% CI: (", round(ci_lower_traditional, 2), ", ", round(ci_upper_traditional, 2), ")"))[1] "Traditional method - Difference: -2.9 95% CI: (-3.39, -2.41)"
Traditional formulas break down when dealing with:
When you have odds ratios, rates ratios, or hazard ratios from regression, the confidence intervals are calculated on the log scale and then exponentiated. This creates asymmetric confidence intervals on the original scale.
The confidence intervals for age-adjusted rates are often asymmetric because they:
- Use methods like Fay-Feuer that account for the Poisson nature of count data
- Combine rates from different age groups with varying sample sizes
- Reflect skewed distributions, especially when dealing with rare events
Life expectancy confidence intervals can be asymmetric, particularly when they:
- Come from small populations with sparse death counts
- Use bootstrap or resampling methods that preserve distributional properties
- Are calculated for populations with unusual mortality patterns
However, many life expectancy estimates (including those calculated by
the life_table() function in this package, which uses the WHO/Chiang
method) produce roughly symmetric confidence intervals with adequate
sample sizes and can use the normal distribution assumption.
If your confidence intervals aren’t roughly symmetric around the point estimate, traditional formulas will give incorrect results.
For cases described above, we need a method that doesn’t assume normality or symmetry. Monte Carlo simulation works by:
- Generating thousands of random draws from the uncertainty distributions of both estimates
- Applying your operation (difference, ratio, etc.) to each pair of draws
- Summarizing the resulting distribution to get the final estimate and confidence interval
This approach is valid regardless of the underlying distributions and automatically captures the correct uncertainty propagation.
Let’s see how Monte Carlo simulation works by comparing it with the familiar case of estimates from normal distributions. These are condition under which traditional parametric methods work.
Define the age distributions
smallville_mean <- 34.3
smallville_se <- 0.2
megalopolis_mean <- 37.2
megalopolis_se <- 0.15Traditional calculation of the age difference
traditional_diff <- megalopolis_mean - smallville_mean
traditional_se <- sqrt(smallville_se^2 + megalopolis_se^2)
traditional_lower <- traditional_diff - 1.96 * traditional_se
traditional_upper <- traditional_diff + 1.96 * traditional_seMonte Carlo simulation of normal distributions based on summary statistics
set.seed(98104)
n_draws <- 10000
smallville_draws <- rnorm(n_draws, smallville_mean, smallville_se)
megalopolis_draws <- rnorm(n_draws, megalopolis_mean, megalopolis_se)
difference_draws <- megalopolis_draws - smallville_drawsSummarize Monte Carlo simulations
mc_diff <- mean(difference_draws)
mc_lower <- quantile(difference_draws, 0.025)
mc_upper <- quantile(difference_draws, 0.975)Visualize the Monte Carlo simulations

Compare the results
| Traditional | Monte Carlo | |
|---|---|---|
| Difference | 2.90 | 2.901 |
| Lower | 2.41 | 2.417 |
| Upper | 3.39 | 3.396 |
As you can see, when the underlying distributions are normal, Monte Carlo and traditional methods give nearly identical results.
The propagate_uncertainty() function automates this Monte Carlo
approach, allowing you to apply it to estimates in data.tables.
Data and Column Specifications:
-
ph.estimates: Your data.table/data.frame with point estimates and uncertainty measures -
comp_mean_col: Column name for comparator group point estimates -
ref_mean_col: Column name for reference group point estimates -
comp_se_col/ref_se_col: Standard error columns (when provided, used preferentially over the CI) -
comp_lower_col&comp_upper_col/ref_lower_col&ref_upper_col: Confidence interval columns
Commonly Modified Parameters:
-
contrast_fn: Function defining your operation. Default:function(x, y) x - y -
dist: Distribution assumption -"normal"or"lognormal". Default:"normal" -
draws: Number of Monte Carlo draws. Default:10,000
Infrequently Modified Parameters:
-
alpha: Confidence level for OUTPUT (contrast) confidence interval width. Default:0.05for 95% CI -
input_ci_level: Confidence level of the INPUT confidence intervals, Default:0.95for 95% CIs -
convergence_check: Whether to assess Monte Carlo convergence. Default:FALSE -
h0_value: Null hypothesis value for testing. Default: auto-detected -
pvalue_method:"proportion"(robust) or"ttest"(assumes normality). Default:"proportion" -
use_futures: Enable parallel processing for large datasets. Default:FALSE -
seed: Random seed for reproducibility. Default:98104 -
se_scale: Whether standard errors are on"original"or"log"scale. Default:"original"
You can provide whatever contrast function you desire. For your convenience, here are the ones you’ll most likely want to use.
# Differences (default)
contrast_fn = function(x, y) x - y
# Ratios
contrast_fn = function(x, y) x / y
# Percent differences
contrast_fn = function(x, y) 100 * (x - y) / yUse dist = "normal" when comp_mean_col and ref_mean_col:
- Can theoretically be negative (means, differences, log-coefficients)
- Are approximately symmetric around their true value
- Come from linear models or are simple means or proportions
- Comparing two life expectancies from
life_table()(unless very small populations)
Use dist = "lognormal" when comp_mean_col and ref_mean_col:
- Must be positive (rates, counts, exponentiated coefficients)
- Have right-skewed sampling distributions
- Have asymmetric confidence intervals (check your specific estimates)
Note on Age-Adjusted Rates: The Fay-Feuer method actually uses the gamma distribution, but the lognormal approximation captures the right-skewed nature of the uncertainty and works well in practice when only confidence intervals are available.
Important Note: The function assumes all uncertainty can be reasonably approximated by either normal or lognormal distributions. While these cover most public health scenarios, this is still an approximation. For estimates with highly unusual uncertainty distributions (e.g., multimodal or extreme skewness), the function may not capture the true distributional complexity.
Let’s compare life expectancy between Smallville and Megalopolis using
the propagate_uncertainty() function:
# Create example data
life_expectancy_data <- data.table(
city_comparison = "Megalopolis vs Smallville",
megalopolis_le = 80.2,
megalopolis_se = 0.15,
smallville_le = 79.8,
smallville_se = 0.20
)
# Calculate the difference using propagate_uncertainty
le_result <- propagate_uncertainty(
ph.estimates = life_expectancy_data,
comp_mean_col = "megalopolis_le", # Megalopolis is comparator
comp_se_col = "megalopolis_se",
ref_mean_col = "smallville_le", # Smallville is reference
ref_se_col = "smallville_se",
contrast_fn = function(x, y) x - y, # Calculate difference
dist = "normal", # Life expectancy can use normal
draws = 10000,
seed = 98104
)| city_comparison | contrast | contrast_lower | contrast_upper | contrast_se | contrast_pvalue |
|---|---|---|---|---|---|
| Megalopolis vs Smallville | 0.4 | -0.09 | 0.88 | 0.249 | 0.109 |
Now let’s work with age-adjusted mortality rates that have asymmetric confidence intervals:
# Age-adjusted mortality rates (per 100,000) with asymmetric CIs
mortality_data <- data.table(
comparison = "City Mortality Comparison",
smallville_rate = 652,
smallville_lower = 618,
smallville_upper = 689, # right skewed
megalopolis_rate = 678,
megalopolis_lower = 651,
megalopolis_upper = 708 # right skewed
)
# Calculate the difference - note we're using lognormal distribution
mortality_result <- propagate_uncertainty(
ph.estimates = mortality_data,
comp_mean_col = "megalopolis_rate", # Megalopolis is comparator
comp_lower_col = "megalopolis_lower",
comp_upper_col = "megalopolis_upper",
ref_mean_col = "smallville_rate", # Smallville is reference
ref_lower_col = "smallville_lower",
ref_upper_col = "smallville_upper",
contrast_fn = function(x, y) x - y,
dist = "lognormal", # Better for rates with asymmetric CIs
draws = 10000,
seed = 98104
)| comparison | contrast | contrast_lower | contrast_upper | contrast_se | contrast_pvalue |
|---|---|---|---|---|---|
| City Mortality Comparison | 25.91 | -19.93 | 70.57 | 23.105 | 0.265 |
Often we want to calculate ratios rather than differences. To do so,
just change the contrast_fn parameter value.
# Calculate the mortality rate ratio
ratio_result <- propagate_uncertainty(
ph.estimates = mortality_data,
comp_mean_col = "megalopolis_rate", # Megalopolis is comparator
comp_lower_col = "megalopolis_lower",
comp_upper_col = "megalopolis_upper",
ref_mean_col = "smallville_rate", # Smallville is reference
ref_lower_col = "smallville_lower",
ref_upper_col = "smallville_upper",
contrast_fn = function(x, y) x / y, # Ratio instead of difference
dist = "lognormal",
draws = 10000,
seed = 98104
)| comparison | contrast | contrast_lower | contrast_upper | contrast_se | contrast_pvalue |
|---|---|---|---|---|---|
| City Mortality Comparison | 1.04 | 0.97 | 1.11 | 0.036 | 0.265 |
The real power of this function, besides not relying on parametric assumptions, is that you can easily batch process comparisons. For example, the following code will compare the mortality rates for four different demographics in Megalopolis compared to Smallville.
# Multiple demographic comparisons
multi_data <- data.table(
demographic = c("Age 65+", "Age 25-64", "Female", "Male"),
smallville_rate = c(2100, 420, 580, 720),
smallville_lower = c(1950, 390, 540, 680),
smallville_upper = c(2260, 455, 625, 765),
megalopolis_rate = c(2250, 445, 615, 750),
megalopolis_lower = c(2110, 415, 585, 715),
megalopolis_upper = c(2400, 480, 650, 790)
)| demographic | smallville_rate | smallville_lower | smallville_upper | megalopolis_rate | megalopolis_lower | megalopolis_upper |
|---|---|---|---|---|---|---|
| Age 65+ | 2100 | 1950 | 2260 | 2250 | 2110 | 2400 |
| Age 25-64 | 420 | 390 | 455 | 445 | 415 | 480 |
| Female | 580 | 540 | 625 | 615 | 585 | 650 |
| Male | 720 | 680 | 765 | 750 | 715 | 790 |
multi_result <- propagate_uncertainty(
ph.estimates = multi_data,
comp_mean_col = "megalopolis_rate",
comp_lower_col = "megalopolis_lower",
comp_upper_col = "megalopolis_upper",
ref_mean_col = "smallville_rate",
ref_lower_col = "smallville_lower",
ref_upper_col = "smallville_upper",
contrast_fn = function(x, y) x / y, # The critical change
dist = "lognormal",
draws = 10000,
seed = 98104
)| demographic | contrast | contrast_lower | contrast_upper | contrast_se | contrast_pvalue |
|---|---|---|---|---|---|
| Age 65+ | 1.07 | 0.97 | 1.18 | 0.053 | 0.167 |
| Age 25-64 | 1.06 | 0.95 | 1.18 | 0.057 | 0.290 |
| Female | 1.06 | 0.97 | 1.16 | 0.048 | 0.197 |
| Male | 1.04 | 0.96 | 1.13 | 0.041 | 0.305 |
As you’ll remember, when you have odds ratios or rate ratios from regression models, those estimates have been exponentiated from the log scale. This means their confidence intervals are no longer symmetric on the original scale, making traditional error propagation formulas inappropriate.
Imagine we ran a logistic regression predicting the odds of knowing how
to play saxophone across King County, using Seattle as the reference
group. We want to compare the East King County effect to the North King
County effect by calculating the ratio of their odds ratios. To do so,
we’ll compare a manual method against using propagate_uncertainty(),
showing that they produce nearly equivalent results.
# Odds ratios from logistic regression
sax_data <- data.table(
comparison = "East KC OR vs North KC OR",
east_kc_or = 1.85, # OR for East KC vs Seattle
east_kc_lower = 1.42,
east_kc_upper = 2.41,
north_kc_or = 1.34, # OR for North KC vs Seattle
north_kc_lower = 1.08,
north_kc_upper = 1.66
)| comparison | east_kc_or | east_kc_lower | east_kc_upper | north_kc_or | north_kc_lower | north_kc_upper |
|---|---|---|---|---|---|---|
| East KC OR vs North KC OR | 1.85 | 1.42 | 2.41 | 1.34 | 1.08 | 1.66 |
# Manual calculation: convert to log scale
east_log_or <- log(sax_data$east_kc_or)
north_log_or <- log(sax_data$north_kc_or)
# Calculate approximate SEs from CIs on log scale
east_se_log <- (log(sax_data$east_kc_upper) - log(sax_data$east_kc_lower)) / (2 * 1.96)
north_se_log <- (log(sax_data$north_kc_upper) - log(sax_data$north_kc_lower)) / (2 * 1.96)
# Now that everything is on log scale, can use traditional error propagation
log_diff <- east_log_or - north_log_or
se_log_diff <- sqrt(east_se_log^2 + north_se_log^2)
# Convert back to ratio scale
manual_ratio <- exp(log_diff)
manual_lower <- exp(log_diff - 1.96 * se_log_diff)
manual_upper <- exp(log_diff + 1.96 * se_log_diff)
# Calculate p-value manually (testing if log difference != 0)
manual_z <- log_diff / se_log_diff
manual_pvalue <- 2 * (1 - pnorm(abs(manual_z)))easy_way_result <- propagate_uncertainty(
ph.estimates = sax_data,
comp_mean_col = "east_kc_or", # East KC:Seattle OR
comp_lower_col = "east_kc_lower",
comp_upper_col = "east_kc_upper",
ref_mean_col = "north_kc_or", # North KC:Seattle OR
ref_lower_col = "north_kc_lower",
ref_upper_col = "north_kc_upper",
contrast_fn = function(x, y) x / y, # Ratio of the two ORs
dist = "lognormal" # Needed for ORs
)| Method | Ratio of ORs | Lower CI | Upper CI | P-value |
|---|---|---|---|---|
| Manual (Log Scale) | 1.38 | 0.98 | 1.94 | 0.064 |
| propagate_uncertainty() | 1.40 | 0.98 | 1.93 | 0.065 |
As you can see, both approaches are statistically equivalent and give
nearly identical results. The propagate_uncertainty() function is
primarily a convenience that reduces the risk of coding errors and
handles batch processing.
The propagate_uncertainty() function is particularly valuable for:
Comparing Pre-existing Estimates: When comparing health indicators between demographic groups or geographic areas from summary estimates such as those from CHI:
- Life expectancy differences between racial/ethnic groups
- Age-adjusted mortality rate comparisons
- Hospitalization rate ratios
Data Requests: When stakeholders want to know if differences between groups are statistically significant, particularly for indicators with asymmetric confidence intervals.
Death Reports and Special Analyses: When examining mortality trends or comparing rates across populations where traditional methods would give incorrect uncertainty estimates.
Any Analysis Involving Age-Adjusted Rates: Since age-adjusted rates typically have asymmetric confidence intervals due to the Fay-Feuer method or small counts in some age groups.
Uncertainty propagation is a crucial but often overlooked aspect of comparing health indicators. While traditional mathematical formulas work well for simple means and proportions, they fail for some of the complex indicators we commonly use in public health.
The propagate_uncertainty() function provides a robust Monte Carlo
solution that:
- Works regardless of the underlying distributions
- Properly handles asymmetric confidence intervals
- Automatically captures the correct uncertainty propagation
- Provides both point estimates and hypothesis tests
By using this function, you can confidently compare health indicators while properly accounting for uncertainty, leading to more accurate and defensible conclusions in your analyses.
Remember: if your confidence intervals aren’t roughly symmetric around
your point estimates, traditional formulas will give you incorrect
results. When in doubt, use propagate_uncertainty() - it will give you
the right answer whether your distributions are normal or not.
– Updated October 01, 2025 (rads v1.5.0)