brfss_functions

Danny Colombara edited this page Nov 25, 2025 · 6 revisions

BRFSS Functions

⚠️ DEPRECATED: The functions in this vignette have been migrated to apde.data. Please use that package instead.

Introduction

The Behavioral Risk Factor Surveillance System (BRFSS) is a gold mine of public health data – but like any mine, you need the right tools to extract the value. Since BRFSS is a complex survey, analyses need to account for the survey design and weights to get accurate results. The survey design includes stratification and weighting to ensure the sample represents the full population, accounting for who was more or less likely to be included in the survey. When we want to analyze multiple years together (which we often do to increase precision), we need to adjust those survey weights to avoid overestimating our population.

This vignette will show you how to easily work with King County BRFSS data while properly handling all these survey design considerations. Don’t worry - the functions do the heavy lifting for you! We’ll cover everything from finding available variables to getting properly weighted estimates.

Note that the BRFSS ETL process has its own repository and questions regarding the data should be directed to the data steward.

Load essential packages

library(rads)
library(data.table)

Checking Variable Availability Across Years

One quirk of BRFSS data is that not all questions are asked every year. Before diving into analysis, it’s helpful to check which variables are available for your time period of interest. The list_dataset_columns() function makes this easy.

Since the Washington State and King County datasets are distinct, you can specify which one you want with the kingco argument. When kingco = TRUE, you will receive the list of columns in the King County dataset. When kingco = FALSE, you will receive the list of columns in the Washington State dataset. The default is kingco = TRUE.

King County 2023 Variable Availability

vars_2023 <- list_dataset_columns("brfss", year = 2023)
head(vars_2023)
nrow(vars_2023)
var.names year(s)
addepev3 2023
age 2023
age_f 2023
age_m 2023
age5_v1 2023
age5_v2 2023
[1] 209

King County 2019-2023 Variable Availability

# Check variables across multiple years
vars_2019_2023 <- list_dataset_columns("brfss", year = 2019:2023)
head(vars_2019_2023)
nrow(vars_2019_2023)
var.names year(s)
aceindx1 2019-2021
aceindx2 2019-2021
acescor1 2019-2021
acescor2 2019-2021
addepev3 2019-2023
age 2019-2023
[1] 299

Notice that the year(s) column is not constant because BRFSS does not ask every question in every year.
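To narrow the list to variables that were asked in every year of interest, you can filter on the year(s) column. The sketch below uses a small hand-built table (`vars_toy`, a hypothetical stand-in with values copied from the output above) so that it runs without data access. It assumes year(s) is stored as a character string in the format printed above.

```r
# Toy stand-in for the table returned by
# list_dataset_columns("brfss", year = 2019:2023).
# Values are copied from the printed output; the real table has many more rows.
vars_toy <- data.frame(
  var.names = c("aceindx1", "aceindx2", "addepev3", "age"),
  `year(s)` = c("2019-2021", "2019-2021", "2019-2023", "2019-2023"),
  check.names = FALSE,       # keep the literal column name "year(s)"
  stringsAsFactors = FALSE
)

# Keep only the variables available for the whole 2019-2023 period
# (assumes full coverage is printed as the single string "2019-2023")
all_years <- vars_toy[vars_toy[["year(s)"]] == "2019-2023", ]
print(all_years$var.names)
```

If a variable was asked in non-consecutive years, its year(s) string may look different, so inspect the column before relying on an exact match.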

Getting BRFSS Data

There are two equivalent ways to get BRFSS data: using get_data('brfss') or get_data_brfss(). Both functions will:

  1. Load the data you request into memory
  2. Automatically adjust weights if you’re analyzing multiple years
  3. Survey-set the data so it’s ready for analysis
  4. Return a dtsurvey object, which is a data.table-friendly survey object that can be analyzed with rads::calc

As with list_dataset_columns(), by default you will receive King County data. You can specify Washington State data with the kingco = FALSE argument.

Let’s see both methods in action:

Method 1: Using get_data()

This is the general interface that you can use to access any of APDE’s analytic-ready datasets.

brfss_full <- get_data(
  dataset = "brfss",
  cols = c("chi_year", "age", "race4", "chi_sex", "prediab1"),
  year = 2019:2023
)
Your data was survey set with the following parameters is ready for rads::calc():
 - valid years = 2019-2023
 - original survey weight = `finalwt1` 
 - adjusted survey weight = `default_wt` 
 - strata = `x_ststr`

Method 2: Using get_data_brfss()

brfss_full_alt <- get_data_brfss(
  cols = c("chi_year", "age", "race4", "chi_sex", "prediab1"),
  year = 2019:2023
)
Your data was survey set with the following parameters is ready for rads::calc():
 - valid years = 2019-2023
 - original survey weight = `finalwt1` 
 - adjusted survey weight = `default_wt` 
 - strata = `x_ststr`

Both methods return identical dtsurvey objects that are ready for analysis with calc().

Notice that the functions print an informative message describing the survey object parameters. This message is hidden in the examples below, but it is always produced when getting or survey setting BRFSS data.

Since BRFSS weights are designed to represent the population, we can verify that our multi-year weight adjustments are working properly by comparing population sizes. We expect the adjusted weights to represent an “average” population that falls between the earliest and latest years’ populations since King County’s population has been growing. Let’s verify our weight adjustments are working as expected:

Calculate the survey population at the beginning and end of the period

pop_2019 <- sum(brfss_full[chi_year == 2019]$finalwt1)
pop_2023 <- sum(brfss_full[chi_year == 2023]$finalwt1)

Calculate the adjusted population for the combined period

pop_adjusted <- sum(brfss_full$default_wt)

Is the value of the adjusted population between that for 2019 and 2023?

pop_2023 > pop_adjusted & pop_adjusted > pop_2019
[1] TRUE
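To see why the adjusted total should land between the single-year totals, here is a toy sketch of one way weights can be rescaled when pooling years: each year's weights are multiplied by that year's share of all observations. The data and the `toy_wt` column are made up, and this only illustrates the idea; it is not claimed to reproduce what get_data_brfss() does internally.

```r
# Made-up two-year survey: 3 respondents in 2019, 2 in 2023
toy <- data.frame(
  chi_year = c(2019, 2019, 2019, 2023, 2023),
  finalwt1 = c(100, 200, 300, 400, 500)
)

# Share of all observations contributed by each respondent's year
n_by_year <- table(toy$chi_year)
share <- as.numeric(n_by_year[as.character(toy$chi_year)]) / nrow(toy)

# Hypothetical pooled weight: original weight scaled by the year's share
toy$toy_wt <- toy$finalwt1 * share

pop_2019_toy <- sum(toy$finalwt1[toy$chi_year == 2019])  # 600
pop_2023_toy <- sum(toy$finalwt1[toy$chi_year == 2023])  # 900
pop_adj_toy  <- sum(toy$toy_wt)                          # 0.6 * 600 + 0.4 * 900

# The pooled total is a weighted average of the yearly totals,
# so it must fall between them
print(pop_2019_toy < pop_adj_toy & pop_adj_toy < pop_2023_toy)
```

Because the shares sum to one, the pooled total is always a weighted average of the yearly totals, which is the property the check above relies on.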

Working with HRAs and Regions

BRFSS data presents a unique challenge when analyzing Health Reporting Areas (HRAs) because it comes with ZIP codes rather than HRA assignments. Since ZIP codes don’t perfectly align with HRA boundaries, we need to account for this uncertainty in our analyses.

To handle this, we use a statistical technique called multiple imputation. When you request HRA-related columns (hra20_id, hra20_name, or chi_geo_region), the function returns an imputationList object containing 10 different versions of the data. Each version represents a different possible way that ZIP codes could be assigned to HRAs based on their overlap. This approach allows us to capture the uncertainty in our geographic assignments and incorporate it into our statistical estimates.

Note: APDE decided to use 10 imputations based on an extensive empirical assessment to balance between statistical accuracy and computational efficiency. This is fixed in the ETL process and is not configurable.
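For intuition on how estimates from several imputed datasets are typically combined, the sketch below applies the standard pooling rules for multiple imputation (Rubin's rules) to made-up numbers. The estimates and variances are invented, and this is not a description of rads internals; calc() does the pooling for you.

```r
# Made-up prevalence estimates from 10 imputed datasets
est <- c(0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11, 0.10, 0.12)
# Made-up sampling variances (standard error squared) for each estimate
var_each <- rep(0.03^2, 10)

m <- length(est)

pooled_est  <- mean(est)       # pooled point estimate
within_var  <- mean(var_each)  # average sampling variance
between_var <- var(est)        # variance of the estimates across imputations

# Total variance adds the between-imputation uncertainty
total_var <- within_var + (1 + 1 / m) * between_var
pooled_se <- sqrt(total_var)

print(c(estimate = pooled_est, se = pooled_se))
```

The pooled standard error is larger than any single imputation's standard error, which is how the uncertainty in the ZIP code to HRA assignment shows up in the final estimate.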

Get data including HRAs

brfss_hra <- get_data_brfss(
  cols = c("chi_year", "age", "race4", "chi_sex", "prediab1", "obese", "hra20_name"),
  year = 2019:2023
)

Confirm we generated an imputationList of 10 dtsurvey objects

inherits(brfss_hra, "imputationList") &
   length(brfss_hra$imputations) == 10 &
   inherits(brfss_hra$imputations[[1]], "dtsurvey")
[1] TRUE

Don’t worry if this seems complex - the calc() function automatically handles these imputationList objects.

Modifying BRFSS Data

There are times when you might need to modify BRFSS data. For example, you might want to create a new variable. Before making any modifications, first consider whether your changes should be standardized. If you’re creating variables that will be used across multiple projects (CHI, CHNA, Communities Count, etc.) or repeatedly year after year, contact the BRFSS ETL steward and politely request the addition of these changes to the analytic ready dataset.

For truly custom analyses, your modification approach will depend on whether you’re working with a simple dtsurvey object or an imputationList. Let’s look at each case:

Modifying a dtsurvey

You can modify a dtsurvey object using data.table commands without disrupting its survey settings. If you use dplyr commands instead, you may break the internals of the dtsurvey, so survey set it again following the instructions in the “Survey Setting and Creating Custom Weights” section below.

Regardless of whether you use data.table or dplyr commands, you are encouraged to create new variables as needed rather than overwriting and deleting existing ones.
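As an illustration of adding a variable rather than overwriting one, the sketch below creates a derived column with data.table's `:=`. It uses a plain toy data.table (`toy`, made up) instead of a real dtsurvey so that it runs without data access; the same `:=` syntax applies to a dtsurvey object.

```r
library(data.table)

# Made-up ages standing in for a BRFSS extract
toy <- data.table(age = c(25, 67, 70, 40))

# Add a new column by reference; the original 'age' column is left untouched
toy[, age_category := fifelse(age < 67, "working age", "retirement age")]

print(toy)
```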

Modifying an ImputationList

When working with HRA or region data, modifications become more complex since we need to maintain consistency across all 10 imputed datasets. Follow these steps:

1. Get a BRFSS ImputationList (by requesting HRA or region columns)

brfss <- get_data_brfss(
  cols = c("age", "hra20_id"),
  year = 2019:2023
)

2. Convert it to a regular dtsurvey/data.table

brfss <- as_table_brfss(brfss)
Successfully converted an imputationList to a single dtsurvey/data.table.
Remember to use as_imputed_brfss() after making modifications.

3. Create or modify variables

brfss[, age_category := fifelse(age < 67, 'working age', 'retirement age')]

4. Convert back to an ImputationList

brfss <- as_imputed_brfss(brfss)
Successfully created an imputationList with 10 imputed datasets.
Data is now ready for analysis with rads::calc().
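To make the consistency requirement concrete, the sketch below applies one transformation to every element of a toy list of imputed datasets. The list (`toy_imputations`) and helper (`add_age_category`) are hypothetical, and this is a conceptual illustration only: it is not how as_table_brfss() or as_imputed_brfss() work internally, and it is not a substitute for the workflow above.

```r
# Toy stand-in for an imputationList: 3 data frames (the real object has 10)
# that share respondents but differ in the imputed HRA assignment
toy_imputations <- lapply(1:3, function(i) {
  data.frame(age = c(30, 66, 67, 80),
             hra20_id = c(1, 2, 2, 3) + (i %% 2))
})

# Hypothetical helper that adds the same derived variable to one dataset
add_age_category <- function(d) {
  d$age_category <- ifelse(d$age < 67, "working age", "retirement age")
  d
}

# Apply the identical change to every imputed dataset
toy_imputations <- lapply(toy_imputations, add_age_category)

# Check that every dataset received the new column
has_col <- vapply(toy_imputations,
                  function(d) "age_category" %in% names(d),
                  logical(1))
print(all(has_col))
```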

Survey Setting and Creating Custom Weights

You might need to use pool_brfss_weights() in two scenarios:

  1. When analyzing specific years where certain questions were asked
  2. When you need to restore proper survey settings after using non-data.table commands for data manipulation

While get_data_brfss() automatically creates weights and survey sets imported data, you can create new weights and re-survey set the data using pool_brfss_weights(). Brief argument descriptions follow; see the pool_brfss_weights() help file for details:

  • ph.data: Your BRFSS dataset (can be a data.frame, data.table, dtsurvey, or imputationList)
  • years: Vector of years you want to analyze together
  • year_var: Name of the year column (defaults to ‘chi_year’)
  • old_wt_var: Name of the original weight variable (defaults to ‘finalwt1’)
  • new_wt_var: Name for your new weight variable
  • wt_method: Name of the method used to rescale your weights. Options include ‘obs’, ‘pop’, and ‘simple’ (defaults to ‘obs’)
  • strata: Name of the strata variable (defaults to ‘x_ststr’)

Let’s see it in action:

Create weights for two odd years only (2019 and 2021)

brfss_odd_years <- pool_brfss_weights(
  ph.data = brfss_full,
  years = c(2019, 2021),
  new_wt_var = "odd_year_wt"  # Name for the new weight variable
)

Confirm the adjusted weight is reasonable

pop_2019 <- sum(brfss_odd_years[chi_year == 2019]$finalwt1)
pop_2021 <- sum(brfss_odd_years[chi_year == 2021]$finalwt1)
pop_2019_2021 <- sum(brfss_odd_years$odd_year_wt)

pop_2019 < pop_2019_2021 & pop_2019_2021 < pop_2021
[1] TRUE

Analyzing BRFSS Data with calc()

Now for the fun part - analyzing our data! The calc() function handles all the survey design considerations for us. Let’s look at some examples:

Calculate prediabetes prevalence by sex and race (using a dtsurvey object)

prediab_by_group <- calc(
  ph.data = brfss_full,
  what = "prediab1",
  by = c("chi_sex", "race4"),
  metrics = c("mean", "rse"),
  proportion = TRUE  # Since prediab is binary
)
head(prediab_by_group)
chi_sex race4 variable mean level mean_se mean_lower mean_upper rse
Male NA prediab1 0.0503843 NA 0.0248165 0.0185461 0.1296589 49.25433
Male AIAN prediab1 0.2284831 NA 0.1021132 0.0839714 0.4889464 44.69182
Male Black prediab1 0.1187814 NA 0.0235966 0.0796585 0.1734967 19.86554
Male Asian prediab1 0.1548570 NA 0.0167577 0.1247503 0.1906468 10.82137
Male NHPI prediab1 0.2287657 NA 0.1063999 0.0810248 0.4994791 46.51043
Male Hispanic prediab1 0.1363339 NA 0.0177889 0.1050209 0.1751559 13.04807
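As a quick sanity check on the output above, the rse column matches 100 × mean_se / mean. The sketch below recomputes it from three of the printed rows; differences in the last digits come from rounding in the printed values.

```r
# Values copied from the printed rows for race4 = NA, Black, and Asian
est <- data.frame(
  race4       = c(NA, "Black", "Asian"),
  mean        = c(0.0503843, 0.1187814, 0.1548570),
  mean_se     = c(0.0248165, 0.0235966, 0.0167577),
  rse_printed = c(49.25433, 19.86554, 10.82137)
)

# Relative standard error: the standard error as a percentage of the estimate
est$rse_check <- 100 * est$mean_se / est$mean

# Agreement with the printed rse, allowing for rounding of the inputs
print(abs(est$rse_check - est$rse_printed) < 0.01)
```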

Calculate prediabetes prevalence by HRA20 (using an imputationList)

prediab_by_hra20 <- calc(
  ph.data = brfss_hra,
  what = "prediab1",
  by = c("hra20_name"),
  metrics = c("mean", "rse"),
  proportion = TRUE
)
head(prediab_by_hra20)
hra20_name variable level mean mean_se mean_lower mean_upper rse
Auburn - North prediab1 NA 0.1097051 0.0314104 0.0473378 0.1720724 23.87216
Auburn - South prediab1 NA 0.1084990 0.0449626 0.0181626 0.1988354 31.81374
Bear Creek and Greater Sammamish prediab1 NA 0.1392452 0.0439130 0.0511826 0.2273078 23.56366
Bellevue - Central prediab1 NA 0.1382798 0.0480590 0.0419377 0.2346218 26.44273
Bellevue - Northeast prediab1 NA 0.1117511 0.0400927 0.0318859 0.1916163 28.59449
Bellevue - South prediab1 NA 0.1624832 0.0359512 0.0919890 0.2329773 21.48270

As noted in the calc() wiki, when working with an imputationList, the proportion argument is ignored. However, we include it here to maintain consistent calc() usage regardless of whether you’re working with a dtsurvey object or an imputationList.

Calculate prediabetes & obesity prevalence by HRA20 & sex (using an imputationList)

We will do this in two parts since only one value of what can be specified when ph.data is an imputationList.

prediab_obese_hra20_sex <- rbind(
  calc(
    ph.data = brfss_hra,
    what = c("prediab1"),
    by = c("hra20_name", "chi_sex"),
    metrics = c("mean", "rse"),
    proportion = TRUE
  ), 
  calc(
    ph.data = brfss_hra,
    what = c("obese"),
    by = c("hra20_name", "chi_sex"),
    metrics = c("mean", "rse"),
    proportion = TRUE
  )
)

head(prediab_obese_hra20_sex)
hra20_name chi_sex variable level mean mean_se mean_lower mean_upper rse
Auburn - North Male prediab1 NA 0.0908839 0.0404166 0.0101633 0.1716045 36.37217
Auburn - North Female prediab1 NA 0.1275012 0.0521343 0.0230377 0.2319647 31.66348
Auburn - South Male prediab1 NA 0.0890185 0.0542901 -0.0206931 0.1987302 46.01217
Auburn - South Female prediab1 NA 0.1292806 0.0749252 -0.0219830 0.2805441 43.67390
Bear Creek and Greater Sammamish Male prediab1 NA 0.1473026 0.0460883 0.0566849 0.2379204 28.97528
Bear Creek and Greater Sammamish Female prediab1 NA 0.1317566 0.0646126 0.0020176 0.2614955 34.99476

Comparing calc(..., where = ...) with pool_brfss_weights

Those with experience using calc() might be wondering, “Why would we need to use pool_brfss_weights() to analyze a subset of years when we could just use the where argument in calc()?” The short answer is that the two methods give identical results, as long as you are only interested in the mean, standard error, RSE, and confidence intervals. However, if you want to know the survey weighted number of people within a given demographic or with a condition, you need to use pool_brfss_weights(). The following example analyzing data for 2022 compares the results from the two methods.

Setting Up the Comparison

Generate 2022 obesity prevalence: where method

brfss_where <- get_data_brfss(cols = c('chi_year', 'obese'), year = 2019:2023)

method_where <- calc(ph.data = brfss_where,
                    what = 'obese',
                    where = chi_year == 2022,
                    metrics = c("mean", "rse", "total"),
                    proportion = TRUE )

Generate 2022 obesity prevalence: pool_brfss_weights method

brfss_pooled <- get_data_brfss(cols = c('chi_year', 'obese'), year = 2019:2023)

brfss_pooled <- pool_brfss_weights(ph.data = brfss_pooled, years = 2022, new_wt_var = 'wt_2022')

method_pooled <- calc(ph.data = brfss_pooled,
                    what = 'obese',
                    metrics = c("mean", "rse", "total"),
                    proportion = TRUE )

Comparing the Results

The mean, standard error, RSE, and CI values are equal

all.equal(method_where[, .(variable, mean, mean_se, mean_lower, mean_upper, rse)],
          method_pooled[, .(variable, mean, mean_se, mean_lower, mean_upper, rse)])
[1] TRUE

The total values (i.e., the survey weighted populations) differ

all.equal(method_where[, .(variable, total, total_se, total_lower, total_upper)],
          method_pooled[, .(variable, total, total_se, total_lower, total_upper)])
[1] "Column 'total': Mean relative difference: 2.30656"

Key Takeaway

The mean, standard error, RSE, and CI are identical for the two methods, but the totals differ. Remember: to get the correct survey weighted population, you must use pool_brfss_weights().
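The reason the means agree while the totals do not can be seen in a toy example: multiplying every weight by the same constant leaves a weighted mean unchanged but rescales a weighted total. The data and the 0.2 scaling factor below are made up, and the sketch assumes that a multi-year weight is a constant multiple of the original weight within a year, which is a simplification.

```r
# Made-up binary outcome and single-year survey weights
obese          <- c(1, 0, 1, 1, 0)
wt_single_year <- c(100, 200, 300, 400, 500)

# Hypothetical multi-year weight: the same weights shrunk by a constant factor
wt_multi_year <- wt_single_year * 0.2

# Weighted means are identical because the constant cancels out
mean_single <- weighted.mean(obese, wt_single_year)
mean_multi  <- weighted.mean(obese, wt_multi_year)
print(all.equal(mean_single, mean_multi))

# Weighted totals differ by exactly the scaling factor
total_single <- sum(obese * wt_single_year)
total_multi  <- sum(obese * wt_multi_year)
print(c(single = total_single, multi = total_multi))
```

This mirrors the comparison above: filtering multi-year weights to one year keeps the shrunken weights, so the total understates the population for that year.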

Suppression & Reliability

Please refer to the APDE_SmallNumberUpdate.xlsx file on SharePoint for details.

Conclusion

Working with BRFSS data requires careful attention to survey weights and design, but the functions we’ve covered make this process straightforward. Remember:

  • Check variable availability with list_dataset_columns()
  • Get data with get_data_brfss() or get_data()
  • Modify dtsurvey objects using data.table syntax
  • Modify imputationList objects by first using as_table_brfss(), then modifying your object with data.table syntax, then converting it back with as_imputed_brfss()
  • Create custom weights if needed with pool_brfss_weights()
  • Analyze using calc()

Happy analyzing!

Updated April 16, 2025 (rads v1.3.5)
