propagate_uncertainty

propagate_uncertainty()

Introduction

When comparing health indicators between populations or time periods, we often need to calculate differences or ratios between estimates while accounting for their uncertainty. For example, is life expectancy in Smallville significantly different from that in Megalopolis? Or, what is the ratio of age-adjusted mortality rates between these two communities?

The process of combining uncertainties when performing mathematical operations is called uncertainty propagation or error propagation. While simple formulas exist for basic cases, they make assumptions that often don’t hold for common public health indicators. The propagate_uncertainty() function provides a robust Monte Carlo approach that works even when traditional methods fail.

This vignette will walk you through when and how to propagate uncertainty, starting with the traditional mathematical approaches and then showing when and why you need the more flexible Monte Carlo method.

❓ How does `propagate_uncertainty()` relate to `multi_t_test()`?

Before diving into uncertainty propagation, it’s important to understand when to use this function versus the related multi_t_test() function:

Use propagate_uncertainty() when you want to combine or compare two estimates (summations, differences, ratios, etc.) and at least one has asymmetric confidence intervals or comes from specialized methods (like age-adjusted rates, exponentiated regression coefficients, or life expectancy from small populations).
Use multi_t_test() when you want to compare multiple groups against a single reference group using summary statistics, and your estimates have roughly symmetric confidence intervals that can reasonably assume normality.

In essence, propagate_uncertainty() handles the complex uncertainty propagation for two estimates, while multi_t_test() performs statistical testing across multiple groups with traditional assumptions. If you’re unsure which to use, propagate_uncertainty() is more robust as it doesn’t rely on normality assumptions.

When Traditional Methods Work (and When They Don’t)

Traditional Error Propagation Formulas

For simple cases, we can use familiar mathematical formulas to propagate uncertainty. These work well when estimates follow normal distributions and have symmetric confidence intervals.

For differences (X - Y): $$SE_{difference} = \sqrt{SE_X^2 + SE_Y^2}$$

For ratios (X / Y): $$SE_{ratio} = \frac{X}{Y} \times \sqrt{\left(\frac{SE_X}{X}\right)^2 + \left(\frac{SE_Y}{Y}\right)^2}$$

These formulas assume that X and Y are independent and normally distributed.

This works well in simple cases like comparing the means from two surveys since they would typically have symmetric confidence intervals based on a normal distribution (assuming samples with sufficient size).

For example:

# Load libraries
library(rads)
library(data.table)
library(ggplot2)

# Set the parameteres
smallville_mean <- 34.3  # Mean age in Smallville
smallville_se <- 0.2
megalopolis_mean <- 37.2 # Mean age in Megalopolis
megalopolis_se <- 0.15

# Traditional formulas for difference
diff_traditional <- smallville_mean - megalopolis_mean
se_diff_traditional <- sqrt(smallville_se^2 + megalopolis_se^2)
ci_lower_traditional <- diff_traditional - 1.96 * se_diff_traditional
ci_upper_traditional <- diff_traditional + 1.96 * se_diff_traditional

print(paste0("Traditional method - Difference: ", round(diff_traditional, 2),
            " 95% CI: (", round(ci_lower_traditional, 2), ", ", round(ci_upper_traditional, 2), ")"))

[1] "Traditional method - Difference: -2.9 95% CI: (-3.39, -2.41)"

When Traditional Methods Fail

Traditional formulas break down when dealing with:

1. Exponentiated Results from Regression Models

When you have odds ratios, rates ratios, or hazard ratios from regression, the confidence intervals are calculated on the log scale and then exponentiated. This creates asymmetric confidence intervals on the original scale.

2. Age-Adjusted Rates

The confidence intervals for age-adjusted rates are often asymmetric because they:

Use methods like Fay-Feuer that account for the Poisson nature of count data
Combine rates from different age groups with varying sample sizes
Reflect skewed distributions, especially when dealing with rare events

3. Life Expectancy

Life expectancy confidence intervals can be asymmetric, particularly when they:

Come from small populations with sparse death counts
Use bootstrap or resampling methods that preserve distributional properties
Are calculated for populations with unusual mortality patterns

However, many life expectancy estimates (including those calculated by the life_table() function in this package, which uses the WHO/Chiang method) produce roughly symmetric confidence intervals with adequate sample sizes and can use the normal distribution assumption.

4. Any Indicator with Asymmetric Confidence Intervals

If your confidence intervals aren’t roughly symmetric around the point estimate, traditional formulas will give incorrect results.

A Better Approach: Monte Carlo Simulation

For cases described above, we need a method that doesn’t assume normality or symmetry. Monte Carlo simulation works by:

Generating thousands of random draws from the uncertainty distributions of both estimates
Applying your operation (difference, ratio, etc.) to each pair of draws
Summarizing the resulting distribution to get the final estimate and confidence interval

This approach is valid regardless of the underlying distributions and automatically captures the correct uncertainty propagation.

Visualizing the Monte Carlo Approach

Let’s see how Monte Carlo simulation works by comparing it with the familiar case of estimates from normal distributions. These are condition under which traditional parametric methods work.

Define the age distributions

smallville_mean <- 34.3
smallville_se <- 0.2

megalopolis_mean <- 37.2
megalopolis_se <- 0.15

Traditional calculation of the age difference

traditional_diff <- megalopolis_mean - smallville_mean
traditional_se <- sqrt(smallville_se^2 + megalopolis_se^2)
traditional_lower <- traditional_diff - 1.96 * traditional_se
traditional_upper <- traditional_diff + 1.96 * traditional_se

Monte Carlo simulation of normal distributions based on summary statistics

set.seed(98104)
n_draws <- 10000
smallville_draws <- rnorm(n_draws, smallville_mean, smallville_se)
megalopolis_draws <- rnorm(n_draws, megalopolis_mean, megalopolis_se)
difference_draws <- megalopolis_draws - smallville_draws

Summarize Monte Carlo simulations

mc_diff <- mean(difference_draws)
mc_lower <- quantile(difference_draws, 0.025)
mc_upper <- quantile(difference_draws, 0.975)

Visualize the Monte Carlo simulations

Compare the results

	Traditional	Monte Carlo
Difference	2.90	2.901
Lower	2.41	2.417
Upper	3.39	3.396

As you can see, when the underlying distributions are normal, Monte Carlo and traditional methods give nearly identical results.

The `propagate_uncertainty()` Function

The propagate_uncertainty() function automates this Monte Carlo approach, allowing you to apply it to estimates in data.tables.

Function Parameters

Data and Column Specifications:

ph.estimates: Your data.table/data.frame with point estimates and uncertainty measures
comp_mean_col: Column name for comparator group point estimates
ref_mean_col: Column name for reference group point estimates
comp_se_col / ref_se_col: Standard error columns (when provided, used preferentially over the CI)
comp_lower_col & comp_upper_col / ref_lower_col & ref_upper_col: Confidence interval columns

Commonly Modified Parameters:

contrast_fn: Function defining your operation. Default: function(x, y) x - y
dist: Distribution assumption - "normal" or "lognormal". Default: "normal"
draws: Number of Monte Carlo draws. Default: 10,000

Infrequently Modified Parameters:

alpha: Confidence level for OUTPUT (contrast) confidence interval width. Default: 0.05 for 95% CI
input_ci_level: Confidence level of the INPUT confidence intervals, Default: 0.95 for 95% CIs
convergence_check: Whether to assess Monte Carlo convergence. Default: FALSE
h0_value: Null hypothesis value for testing. Default: auto-detected
pvalue_method: "proportion" (robust) or "ttest" (assumes normality). Default: "proportion"
use_futures: Enable parallel processing for large datasets. Default: FALSE
seed: Random seed for reproducibility. Default: 98104
se_scale: Whether standard errors are on "original" or "log" scale. Default: "original"

🚨 Critical Parameters: `contrast_fn` and `dist`

Choosing the Contrast Function (`contrast_fn`)

You can provide whatever contrast function you desire. For your convenience, here are the ones you’ll most likely want to use.

# Differences (default)
contrast_fn = function(x, y) x - y

# Ratios  
contrast_fn = function(x, y) x / y

# Percent differences
contrast_fn = function(x, y) 100 * (x - y) / y

Choosing the Distribution (`dist`)

Use dist = "normal" when comp_mean_col and ref_mean_col:

Can theoretically be negative (means, differences, log-coefficients)
Are approximately symmetric around their true value
Come from linear models or are simple means or proportions
Comparing two life expectancies from life_table() (unless very small populations)

Use dist = "lognormal" when comp_mean_col and ref_mean_col:

Must be positive (rates, counts, exponentiated coefficients)
Have right-skewed sampling distributions
Have asymmetric confidence intervals (check your specific estimates)

Note on Age-Adjusted Rates: The Fay-Feuer method actually uses the gamma distribution, but the lognormal approximation captures the right-skewed nature of the uncertainty and works well in practice when only confidence intervals are available.

Important Note: The function assumes all uncertainty can be reasonably approximated by either normal or lognormal distributions. While these cover most public health scenarios, this is still an approximation. For estimates with highly unusual uncertainty distributions (e.g., multimodal or extreme skewness), the function may not capture the true distributional complexity.

Examples: Basic Usage

Example 1: Difference in Life Expectancy with Standard Errors

Let’s compare life expectancy between Smallville and Megalopolis using the propagate_uncertainty() function:

# Create example data
life_expectancy_data <- data.table(
  city_comparison = "Megalopolis vs Smallville",
  megalopolis_le = 80.2,
  megalopolis_se = 0.15,
  smallville_le = 79.8, 
  smallville_se = 0.20
)

# Calculate the difference using propagate_uncertainty
le_result <- propagate_uncertainty(
  ph.estimates = life_expectancy_data,
  comp_mean_col = "megalopolis_le",      # Megalopolis is comparator
  comp_se_col = "megalopolis_se",
  ref_mean_col = "smallville_le",        # Smallville is reference  
  ref_se_col = "smallville_se",
  contrast_fn = function(x, y) x - y,    # Calculate difference
  dist = "normal",                       # Life expectancy can use normal
  draws = 10000,
  seed = 98104
)

city_comparison	contrast	contrast_lower	contrast_upper	contrast_se	contrast_pvalue
Megalopolis vs Smallville	0.4	-0.09	0.88	0.249	0.109

Example 2: Difference in Age-Adjusted Mortality Rates with Confidence Intervals

Now let’s work with age-adjusted mortality rates that have asymmetric confidence intervals:

# Age-adjusted mortality rates (per 100,000) with asymmetric CIs
mortality_data <- data.table(
  comparison = "City Mortality Comparison", 
  smallville_rate = 652,
  smallville_lower = 618,
  smallville_upper = 689,  # right skewed
  megalopolis_rate = 678,
  megalopolis_lower = 651,
  megalopolis_upper = 708  # right skewed
)

# Calculate the difference - note we're using lognormal distribution
mortality_result <- propagate_uncertainty(
  ph.estimates = mortality_data,
  comp_mean_col = "megalopolis_rate",   # Megalopolis is comparator
  comp_lower_col = "megalopolis_lower", 
  comp_upper_col = "megalopolis_upper",
  ref_mean_col = "smallville_rate",     # Smallville is reference  
  ref_lower_col = "smallville_lower",
  ref_upper_col = "smallville_upper", 
  contrast_fn = function(x, y) x - y,
  dist = "lognormal",                   # Better for rates with asymmetric CIs
  draws = 10000,
  seed = 98104
)

comparison	contrast	contrast_lower	contrast_upper	contrast_se	contrast_pvalue
City Mortality Comparison	25.91	-19.93	70.57	23.105	0.265

Example 3: Calculating Ratios

Often we want to calculate ratios rather than differences. To do so, just change the contrast_fn parameter value.

# Calculate the mortality rate ratio
ratio_result <- propagate_uncertainty(
  ph.estimates = mortality_data,
  comp_mean_col = "megalopolis_rate",    # Megalopolis is comparator
  comp_lower_col = "megalopolis_lower",
  comp_upper_col = "megalopolis_upper", 
  ref_mean_col = "smallville_rate",      # Smallville is reference  
  ref_lower_col = "smallville_lower",
  ref_upper_col = "smallville_upper",
  contrast_fn = function(x, y) x / y,    # Ratio instead of difference
  dist = "lognormal", 
  draws = 10000,
  seed = 98104
)

comparison	contrast	contrast_lower	contrast_upper	contrast_se	contrast_pvalue
City Mortality Comparison	1.04	0.97	1.11	0.036	0.265

Advanced Examples

Working with Multiple Comparisons

The real power of this function, besides not relying on parametric assumptions, is that you can easily batch process comparisons. For example, the following code will compare the mortality rates for four different demographics in Megalopolis compared to Smallville.

Table of Mortality Rates per 100,000

# Multiple demographic comparisons
multi_data <- data.table(
  demographic = c("Age 65+", "Age 25-64", "Female", "Male"),
  smallville_rate = c(2100, 420, 580, 720),
  smallville_lower = c(1950, 390, 540, 680),
  smallville_upper = c(2260, 455, 625, 765),
  megalopolis_rate = c(2250, 445, 615, 750),
  megalopolis_lower = c(2110, 415, 585, 715),
  megalopolis_upper = c(2400, 480, 650, 790)
)

demographic	smallville_rate	smallville_lower	smallville_upper	megalopolis_rate	megalopolis_lower	megalopolis_upper
Age 65+	2100	1950	2260	2250	2110	2400
Age 25-64	420	390	455	445	415	480
Female	580	540	625	615	585	650
Male	720	680	765	750	715	790

Table of Mortality Rate Ratios

multi_result <- propagate_uncertainty(
  ph.estimates = multi_data,
  comp_mean_col = "megalopolis_rate",
  comp_lower_col = "megalopolis_lower",
  comp_upper_col = "megalopolis_upper",
  ref_mean_col = "smallville_rate", 
  ref_lower_col = "smallville_lower",
  ref_upper_col = "smallville_upper",
  contrast_fn = function(x, y) x / y, # The critical change
  dist = "lognormal",
  draws = 10000,
  seed = 98104
)

demographic	contrast	contrast_lower	contrast_upper	contrast_se	contrast_pvalue
Age 65+	1.07	0.97	1.18	0.053	0.167
Age 25-64	1.06	0.95	1.18	0.057	0.290
Female	1.06	0.97	1.16	0.048	0.197
Male	1.04	0.96	1.13	0.041	0.305

Handling Exponentiated Regression Results

As you’ll remember, when you have odds ratios or rate ratios from regression models, those estimates have been exponentiated from the log scale. This means their confidence intervals are no longer symmetric on the original scale, making traditional error propagation formulas inappropriate.

Imagine we ran a logistic regression predicting the odds of knowing how to play saxophone across King County, using Seattle as the reference group. We want to compare the East King County effect to the North King County effect by calculating the ratio of their odds ratios. To do so, we’ll compare a manual method against using propagate_uncertainty(), showing that they produce nearly equivalent results.

Create Regression Estimates

# Odds ratios from logistic regression
sax_data <- data.table(
  comparison = "East KC OR vs North KC OR",
  east_kc_or = 1.85,        # OR for East KC vs Seattle
  east_kc_lower = 1.42,     
  east_kc_upper = 2.41,
  north_kc_or = 1.34,       # OR for North KC vs Seattle  
  north_kc_lower = 1.08,    
  north_kc_upper = 1.66
)

comparison	east_kc_or	east_kc_lower	east_kc_upper	north_kc_or	north_kc_lower	north_kc_upper
East KC OR vs North KC OR	1.85	1.42	2.41	1.34	1.08	1.66

The Hard Way (Manual Log-Scale Calculations)

# Manual calculation: convert to log scale
east_log_or <- log(sax_data$east_kc_or)
north_log_or <- log(sax_data$north_kc_or)

# Calculate approximate SEs from CIs on log scale
east_se_log <- (log(sax_data$east_kc_upper) - log(sax_data$east_kc_lower)) / (2 * 1.96)
north_se_log <- (log(sax_data$north_kc_upper) - log(sax_data$north_kc_lower)) / (2 * 1.96)

# Now that everything is on log scale, can use traditional error propagation
log_diff <- east_log_or - north_log_or
se_log_diff <- sqrt(east_se_log^2 + north_se_log^2)

# Convert back to ratio scale
manual_ratio <- exp(log_diff)
manual_lower <- exp(log_diff - 1.96 * se_log_diff)
manual_upper <- exp(log_diff + 1.96 * se_log_diff)

# Calculate p-value manually (testing if log difference != 0)
manual_z <- log_diff / se_log_diff
manual_pvalue <- 2 * (1 - pnorm(abs(manual_z)))

The Easy Way (using `propagate_uncertainty`)

easy_way_result <- propagate_uncertainty(
  ph.estimates = sax_data,
  comp_mean_col = "east_kc_or",        # East KC:Seattle OR
  comp_lower_col = "east_kc_lower",
  comp_upper_col = "east_kc_upper",
  ref_mean_col = "north_kc_or",        # North KC:Seattle OR
  ref_lower_col = "north_kc_lower", 
  ref_upper_col = "north_kc_upper",
  contrast_fn = function(x, y) x / y,  # Ratio of the two ORs
  dist = "lognormal"                   # Needed for ORs
)

Comparison: Both Methods Give the Same Answer

Method	Ratio of ORs	Lower CI	Upper CI	P-value
Manual (Log Scale)	1.38	0.98	1.94	0.064
propagate_uncertainty()	1.40	0.98	1.93	0.065

As you can see, both approaches are statistically equivalent and give nearly identical results. The propagate_uncertainty() function is primarily a convenience that reduces the risk of coding errors and handles batch processing.

Practical Applications

The propagate_uncertainty() function is particularly valuable for:

Comparing Pre-existing Estimates: When comparing health indicators between demographic groups or geographic areas from summary estimates such as those from CHI:

Life expectancy differences between racial/ethnic groups
Age-adjusted mortality rate comparisons
Hospitalization rate ratios

Data Requests: When stakeholders want to know if differences between groups are statistically significant, particularly for indicators with asymmetric confidence intervals.

Death Reports and Special Analyses: When examining mortality trends or comparing rates across populations where traditional methods would give incorrect uncertainty estimates.

Any Analysis Involving Age-Adjusted Rates: Since age-adjusted rates typically have asymmetric confidence intervals due to the Fay-Feuer method or small counts in some age groups.

Conclusion

Uncertainty propagation is a crucial but often overlooked aspect of comparing health indicators. While traditional mathematical formulas work well for simple means and proportions, they fail for some of the complex indicators we commonly use in public health.

The propagate_uncertainty() function provides a robust Monte Carlo solution that:

Works regardless of the underlying distributions
Properly handles asymmetric confidence intervals
Automatically captures the correct uncertainty propagation
Provides both point estimates and hypothesis tests

By using this function, you can confidently compare health indicators while properly accounting for uncertainty, leading to more accurate and defensible conclusions in your analyses.

Remember: if your confidence intervals aren’t roughly symmetric around your point estimates, traditional formulas will give you incorrect results. When in doubt, use propagate_uncertainty() - it will give you the right answer whether your distributions are normal or not.

– Updated October 01, 2025 (rads v1.5.0)

propagate_uncertainty

propagate_uncertainty()

Introduction

❓ How does propagate_uncertainty() relate to multi_t_test()?

When Traditional Methods Work (and When They Don’t)

Traditional Error Propagation Formulas

When Traditional Methods Fail

1. Exponentiated Results from Regression Models

2. Age-Adjusted Rates

3. Life Expectancy

4. Any Indicator with Asymmetric Confidence Intervals

A Better Approach: Monte Carlo Simulation

Visualizing the Monte Carlo Approach

The propagate_uncertainty() Function

Function Parameters

🚨 Critical Parameters: contrast_fn and dist

Choosing the Contrast Function (contrast_fn)

Choosing the Distribution (dist)

Examples: Basic Usage

Example 1: Difference in Life Expectancy with Standard Errors

Example 2: Difference in Age-Adjusted Mortality Rates with Confidence Intervals

Example 3: Calculating Ratios

Advanced Examples

Working with Multiple Comparisons

Table of Mortality Rates per 100,000

Table of Mortality Rate Ratios

Handling Exponentiated Regression Results

Create Regression Estimates

The Hard Way (Manual Log-Scale Calculations)

The Easy Way (using propagate_uncertainty)

Comparison: Both Methods Give the Same Answer

Practical Applications

Conclusion

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

❓ How does `propagate_uncertainty()` relate to `multi_t_test()`?

The `propagate_uncertainty()` Function

🚨 Critical Parameters: `contrast_fn` and `dist`

Choosing the Contrast Function (`contrast_fn`)

Choosing the Distribution (`dist`)

The Easy Way (using `propagate_uncertainty`)