
get_population

Danny Colombara edited this page Nov 25, 2025 · 20 revisions

⚠️ DEPRECATED: The functions in this vignette have been migrated to apde.data. Please use that package instead.

Introduction

This vignette will provide some examples of ways to pull population data into R from the Azure cloud.

The population numbers are estimated by the WA Office of Financial Management (OFM) population unit. OFM produces two sets of estimates: (1) April 1 official population estimates for cities and towns and (2) Small Area Estimates (SAE) for smaller geographies. The get_population() function pulls the SAE numbers and, when round = T, should be the same as those in CHAT.

NOTE!! To get the most out of this vignette, we highly recommend that you actually type each and every bit of code into R. Doing so will almost definitely help you learn the syntax much faster than just reading the vignette or copying and pasting the code.

get_population arguments

Arguments are the values that we send to a function when it is called. Generally, typing args(my_function_of_interest) will return the possible arguments including any defaults. For example,

args(get_population)
## function (kingco = T, years = NA, ages = c(0:100), genders = c("f", 
##     "m"), races = c("aian", "asian", "black", "hispanic", "multiple", 
##     "nhpi", "white"), race_type = c("race_eth"), geo_type = c("kc"), 
##     group_by = NULL, round = FALSE, mykey = "hhsaw", census_vintage = 2020, 
##     geo_vintage = 2020, schema = "ref", table_prefix = "pop_geo_", 
##     return_query = FALSE) 
## NULL

The standard arguments for get_population() are documented in its help file (?get_population) and summarized here for your convenience:

  1. kingco << Logical vector of length 1. Identifies whether you want population estimates limited to King County. Only impacts results for geo_type in c('blk', 'blkgrp', 'lgd', 'scd', 'tract', 'zip'). Default == TRUE.

  2. years << Numeric vector. Identifies which year(s) of data should be pulled. Default == NA, which pulls the most recent year available.

  3. ages << Numeric vector. Identifies which age(s) should be pulled. Default == c(0:100), with 100 being the top coded value for 100:120.

  4. genders << Character vector of length 1 or 2. Identifies which gender(s) should be pulled. The acceptable values are 'f', 'female', 'm', and 'male'. Default == c('f', 'm').

  5. races << Character vector of length 1 to 7. Identifies which race(s) or ethnicity should be pulled. The acceptable values are "aian", "asian", "black", "hispanic", "multiple", "nhpi", and "white". Default == all the possible values.

  6. race_type << Character vector of length 1. Identifies whether to pull race data with Hispanic as an ethnicity ("race") or Hispanic as a race ("race_eth"). Default == c("race_eth").

  7. geo_type << Character vector of length 1. Identifies the geographic level for which you want population estimates. The acceptable values are: 'blk', 'blkgrp', 'county', 'hra', 'kc', 'lgd' (WA State legislative districts), 'region', 'seattle', 'scd' (school districts), 'tract', 'wa', and 'zip'. Default == "kc".

  8. group_by << Character vector of length 0 to 7. Identifies how you would like the data 'grouped' (i.e., stratified). Valid options are limited to: "years", "ages", "genders", "race", "race_eth", "fips_co", and "geo_id". Default == NULL, i.e., estimates are only grouped / aggregated by geography (geo_id is always included).

  9. round << Logical vector of length 1. Identifies whether or not population estimates should be returned as whole numbers. Default == FALSE.

  10. mykey << Character vector of length 1 with the name of the keyring:: key that provides access to the Health and Human Services Analytic Workspace (HHSAW). If you have never set your keyring before and/or do not know what this refers to, just type keyring::key_set('hhsaw', username = 'ALastname@kingcounty.gov') into your R console (making sure to replace the username). Default == 'hhsaw'. Note that it can also take a live database connection.

  11. census_vintage << Either 2010 or 2020. Specifies the anchor census of the desired estimates. Default == 2020.

  12. geo_vintage << Either 2010 or 2020. Specifies the anchor census for geographies. For example, 2020 will return geographies based on 2020 blocks. Default == 2020.

  13. schema << Unless you are a power user, don't mess with this

  14. table_prefix << Unless you are a power user, don't mess with this

  15. return_query << Logical vector of length 1. When TRUE, the query/queries used to fetch the results are returned instead of the results themselves. Default == FALSE.

There is no need to specify any or all of the arguments listed above. As the following example shows, the default arguments for get_population provide the overall most recent year's estimated King County population.

get_population()[]
pop geo_type geo_id year age gender race_eth
2378100 kc King County 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
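Because ages are top coded at 100 (see the ages argument above), ages above 100 in your own data may need to be recoded before they can be matched to these population estimates. The following is a minimal base R sketch of that recoding. The vector my_ages is hypothetical and is not part of rads.

```r
# Hypothetical ages from an analyst's own dataset (not returned by get_population)
my_ages <- c(0, 34, 99, 100, 104, 117)

# Collapse every age above 100 into the top-coded value of 100
my_ages_topcoded <- pmin(my_ages, 100)

print(my_ages_topcoded)
```

After this step, the recoded ages fall within 0:100, the same range used by the default ages argument.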

Example analyses

Note 1: The use of head() below is not necessary. It is a convenience function that displays the first 6 rows of data and was used to keep the output in this vignette tidy.

Note 2: The [] after get_population() prints the output to the console. Typically, you would not print the results but would save them as an object, e.g., my.pop.est <- get_population().

Geographic estimates

WA

get_population(geo_type = 'wa', 
               round = TRUE)[]
pop geo_type geo_id year age gender race_eth geo_id_code
8035700 wa Washington State 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 53

King County

get_population(round = TRUE)[]
pop geo_type geo_id year age gender race_eth
2378100 kc King County 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County Regions

get_population(geo_type = c("region"),
               group_by = c("geo_id"),
               round = TRUE)[]
pop geo_type geo_id year age gender race_eth geo_id_code
614564 region East 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 1
821706 region South 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 4
797700 region Seattle 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 3
144130 region North 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 2

King County Regions with round=FALSE

Turn off rounding to get the exact (fractional) number of people estimated.

rads::get_population(geo_type = 'region', 
                     round = FALSE)[]
pop geo_type geo_id year age gender race_eth geo_id_code
614563.9 region East 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 1
821706.1 region South 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 4
797700.0 region Seattle 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 3
144130.0 region North 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 2
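As a quick sanity check, the region estimates can be compared with the King County total. The sketch below uses only the values printed above (typed in by hand, so no database connection is needed). It suggests that rounding each region reproduces the round = TRUE output and that the four regions add up to the county total.

```r
# Region estimates copied from the round = FALSE output above
region_pop <- c(East = 614563.9, South = 821706.1,
                Seattle = 797700.0, North = 144130.0)

# Rounding each region individually gives the round = TRUE values
region_rounded <- round(region_pop)
print(region_rounded)

# The regions sum to the King County total reported earlier (2378100)
print(sum(region_rounded))
```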

King County HRAs

head(get_population(geo_type = c("hra"), 
                    group_by = c("geo_id"))[])  
pop geo_type geo_id year age gender race_eth geo_id_code
40640.25 hra Bellevue - Central 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 36
40896.51 hra Bellevue - Northeast 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 40
35054.98 hra Seattle - South Beacon Hill, Georgetown, and South Park 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 30
22000.00 hra Covington 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 11
61910.00 hra Shoreline 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 61
50566.25 hra Bear Creek and Greater Sammamish 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White 54

King County Zip codes

head(get_population(geo_type = c("zip"), 
                    group_by = c("geo_id"))[])  
pop geo_type geo_id year age gender race_eth
24569.014 zip 98178 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
29092.852 zip 98011 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
26185.765 zip 98075 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
44086.439 zip 98125 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
28503.065 zip 98102 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
3981.577 zip 98051 2024 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County Census Tracts

head(get_population(geo_type = c("tract"), 
                    group_by = c("geo_id"), 
                    ages = 18, 
                    census_vintage = 2020, 
                    geo_vintage = 2020)[])  
pop geo_type geo_id year age gender race_eth
27.97442 tract 53033001300 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
102.56269 tract 53033030005 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
32.93070 tract 53033020401 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
34.35188 tract 53033025806 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
57.21224 tract 53033024001 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
34.70830 tract 53033011500 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County Census Block Groups

head(get_population(geo_type = c("blkgrp"), 
                    group_by = c("geo_id"), 
                    ages = 18,
                    census_vintage = 2020, 
                    geo_vintage = 2020)[])  
pop geo_type geo_id year age gender race_eth
31.445280 blkgrp 530330309024 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
12.074794 blkgrp 530330106011 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
22.764878 blkgrp 530330255003 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
12.027419 blkgrp 530330252012 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
3.713155 blkgrp 530330061004 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
12.242667 blkgrp 530330227031 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County Census Blocks

# ages = 18 limits the query to a single age so that it runs faster
head(get_population(geo_type = c("blk"), 
                    group_by = c("geo_id"), 
                    ages = 18, 
                    census_vintage = 2020, 
                    geo_vintage = 2020)[])  
pop geo_type geo_id year age gender race_eth
0.6037123 blk 530330254023018 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
5.1092879 blk 530330044021004 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
0.6744307 blk 530330268012000 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
1.4181219 blk 530330279011007 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
2.7855590 blk 530330312025016 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
0.1404601 blk 530330053051010 2024 18 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

Other simple arguments

King County multiple years combined

get_population(years = 2017:2019)[]
pop geo_type geo_id year age gender race_eth
6565125 kc King County 2017-2019 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County multiple years stratified

get_population(years = 2017:2019, 
               group_by = "years")[]
pop geo_type geo_id year age gender race_eth
2149910 kc King County 2017 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
2227755 kc King County 2019 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
2187460 kc King County 2018 0-100 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
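Judging from the two outputs above, the combined 2017-2019 value is the sum of the three single-year estimates, so it is best read as person-years rather than as a population at one point in time. The base R sketch below uses the printed values (typed in by hand) to show the relationship and to derive an average annual population.

```r
# Single-year estimates copied from the stratified output above
pop_by_year <- c("2017" = 2149910, "2018" = 2187460, "2019" = 2227755)

# Summing the years reproduces the combined 2017-2019 value
combined <- sum(pop_by_year)
print(combined)

# Dividing by the number of years gives an average annual population
avg_annual <- combined / length(pop_by_year)
print(avg_annual)
```

Which of the two numbers is the right denominator depends on whether the numerator counts events over all three years or over a typical year.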

King County multiple ages combined

get_population(ages = 65:70)[]
pop geo_type geo_id year age gender race_eth
132119.3 kc King County 2024 65-70 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County multiple ages stratified

get_population(ages = 65:70, group_by = "ages")[]
pop geo_type geo_id year age gender race_eth
21677.85 kc King County 2024 68 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
23299.48 kc King County 2024 66 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
24062.47 kc King County 2024 65 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
20725.29 kc King County 2024 69 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
19775.33 kc King County 2024 70 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
22578.88 kc King County 2024 67 Female, Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
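Note that the stratified rows are not returned in age order. The sketch below, which uses a hand-typed base R data.frame of the values above rather than a live query, sorts the rows by age and confirms that the single ages add up to the combined 65-70 estimate. With the object actually returned by get_population(), setorder() (used later in this vignette) does the same job.

```r
# Age-specific estimates copied from the output above, in the order returned
age_pop <- data.frame(
  age = c(68, 66, 65, 69, 70, 67),
  pop = c(21677.85, 23299.48, 24062.47, 20725.29, 19775.33, 22578.88)
)

# Sort the rows by age
age_pop <- age_pop[order(age_pop$age), ]
print(age_pop)

# The single ages sum to the combined 65-70 estimate (132119.3)
print(sum(age_pop$pop))
```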

King County female only

get_population(genders = "F")[]
pop geo_type geo_id year age gender race_eth
1182773 kc King County 2024 0-100 Female AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County gender stratified

get_population(group_by = "genders")[]
pop geo_type geo_id year age gender race_eth
1195327 kc King County 2024 0-100 Male AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White
1182773 kc King County 2024 0-100 Female AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White

King County AIAN (not Hispanic)

get_population(races = "aian", 
               race_type = "race_eth")[]
pop geo_type geo_id year age gender race_eth
11284 kc King County 2024 0-100 Female, Male AIAN

King County AIAN (regardless of Hispanic ethnicity)

get_population(races = "aian", 
               race_type = "race", 
               group_by = 'race')[]
pop geo_type geo_id year age gender race
27123 kc King County 2024 0-100 Female, Male AIAN

King County stratified by Hispanic as race

get_population(race_type = "race_eth", 
               group_by = "race_eth")[]
pop geo_type geo_id year age gender race_eth
1230939 kc King County 2024 0-100 Female, Male White
11284 kc King County 2024 0-100 Female, Male AIAN
165584 kc King County 2024 0-100 Female, Male Multiple race
161584 kc King County 2024 0-100 Female, Male Black
22478 kc King County 2024 0-100 Female, Male NHPI
272996 kc King County 2024 0-100 Female, Male Hispanic
513235 kc King County 2024 0-100 Female, Male Asian

King County stratified by race (Hispanic as ethnicity)

get_population(race_type = "race", 
               group_by = "race")[]
pop geo_type geo_id year age gender race
1391318 kc King County 2024 0-100 Female, Male White
27123 kc King County 2024 0-100 Female, Male AIAN
244489 kc King County 2024 0-100 Female, Male Multiple race
171318 kc King County 2024 0-100 Female, Male Black
24185 kc King County 2024 0-100 Female, Male NHPI
519667 kc King County 2024 0-100 Female, Male Asian

Complex arguments

King County regions stratified by year and gender

reg_yr_gen <- get_population(geo_type = "region",
                            years = 2017:2019, 
                            group_by = c("geo_id", "years", "genders"))
reg_yr_gen <- reg_yr_gen[, .(region = geo_id, year, gender, pop)]
print(setorder(reg_yr_gen, region, year, gender)[1:12])
region year gender pop
East 2017 Female 277919.83
East 2017 Male 277718.20
East 2018 Female 282789.50
East 2018 Male 282818.46
East 2019 Female 288267.60
East 2019 Male 288640.84
North 2017 Female 66011.00
North 2017 Male 63750.36
North 2018 Female 67053.90
North 2018 Male 64811.35
North 2019 Female 68616.25
North 2019 Male 66350.64

King County regions stratified by year -- Female Hispanic and Asian-NH residents aged 16-25 only -- not rounded

get_population(ages = 16:25, 
               genders = "F", 
               years = 2017:2019, 
               races = c("hispanic", "asian"), 
               geo_type = "region", 
               race_type = "race_eth", 
               group_by = c("geo_id", "years", "race_eth"), 
               round = F)[1:12]
pop geo_type geo_id year age gender race_eth geo_id_code
3301.5919 region East 2019 16-25 Female Hispanic 1
3132.0506 region East 2018 16-25 Female Hispanic 1
7696.0761 region South 2017 16-25 Female Asian 4
6089.2159 region East 2017 16-25 Female Asian 1
1142.0371 region North 2017 16-25 Female Asian 2
2967.4452 region East 2017 16-25 Female Hispanic 1
5515.6825 region Seattle 2017 16-25 Female Hispanic 3
12905.5073 region Seattle 2019 16-25 Female Asian 3
12301.1721 region Seattle 2018 16-25 Female Asian 3
1013.2724 region North 2019 16-25 Female Hispanic 2
7849.3768 region South 2018 16-25 Female Asian 4
967.1974 region North 2018 16-25 Female Hispanic 2

'hispanic' as a group_by value

Sometimes a user might want to access population data by Hispanic ethnicity. To get population values by race x ethnicity, include 'hispanic' in the group_by argument. This option only works when race_type = 'race_eth'. Some combinations (e.g., also adding 'hispanic' to the races argument) will not work and will throw (hopefully) informative errors. Other options (as demonstrated above) will continue to work.

King County regions stratified by Hispanic/Non-Hispanic

# pull in data stratified by Hispanic ethnicity and region
reg_hisp_nonhisp <- get_population(geo_type = 'region', 
                                   group_by = 'hispanic')

# print select columns 
reg_hisp_nonhisp <- reg_hisp_nonhisp[, .(region = geo_id, hispanic, pop)]
print(setorder(reg_hisp_nonhisp, region, hispanic))
region hispanic pop
East Hispanic 49317.01
East Not Hispanic 565246.86
North Hispanic 15176.94
North Not Hispanic 128953.06
Seattle Hispanic 78327.60
Seattle Not Hispanic 719372.40
South Hispanic 130174.44
South Not Hispanic 691531.69
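A common next step is to turn these counts into the percent of each region's population that is Hispanic. The following is a rough base R sketch using the values printed above, typed in by hand. The object names (reg, region_total, pct_hisp) are illustrative only.

```r
# Counts copied from the output above
reg <- data.frame(
  region   = rep(c("East", "North", "Seattle", "South"), each = 2),
  hispanic = rep(c("Hispanic", "Not Hispanic"), times = 4),
  pop      = c(49317.01, 565246.86, 15176.94, 128953.06,
               78327.60, 719372.40, 130174.44, 691531.69),
  stringsAsFactors = FALSE
)

# Total population per region (Hispanic + Not Hispanic)
region_total <- tapply(reg$pop, reg$region, sum)

# Percent Hispanic = Hispanic count / region total * 100
hisp <- reg[reg$hispanic == "Hispanic", ]
pct_hisp <- 100 * hisp$pop / as.numeric(region_total[hisp$region])
names(pct_hisp) <- hisp$region

print(round(pct_hisp, 1))
```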

Return all race x Hispanic ethnicity combinations

race_x_eth <- get_population(race_type = 'race_eth', 
                             group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))
year race_eth hispanic pop
2024 AIAN Hispanic 15839
2024 AIAN Not Hispanic 11284
2024 Asian Hispanic 6432
2024 Asian Not Hispanic 513235
2024 Black Hispanic 9734
2024 Black Not Hispanic 161584
2024 Multiple race Hispanic 78905
2024 Multiple race Not Hispanic 165584
2024 NHPI Hispanic 1707
2024 NHPI Not Hispanic 22478
2024 White Hispanic 160379
2024 White Not Hispanic 1230939
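This table ties together the two race_type outputs shown earlier. Based on the printed values, summing over Hispanic ethnicity within each race reproduces the race_type = "race" counts, while summing all the Hispanic rows reproduces the Hispanic row of the race_type = "race_eth" output. The base R sketch below checks this with hand-typed values and no live query.

```r
# Race x Hispanic ethnicity counts copied from the output above
race_x_eth <- data.frame(
  race_eth = rep(c("AIAN", "Asian", "Black", "Multiple race", "NHPI", "White"),
                 each = 2),
  hispanic = rep(c("Hispanic", "Not Hispanic"), times = 6),
  pop      = c(15839, 11284, 6432, 513235, 9734, 161584,
               78905, 165584, 1707, 22478, 160379, 1230939),
  stringsAsFactors = FALSE
)

# Collapse over ethnicity: matches race_type = "race" (e.g., AIAN = 27123)
by_race <- tapply(race_x_eth$pop, race_x_eth$race_eth, sum)
print(by_race)

# Sum the Hispanic rows: matches the Hispanic row of race_type = "race_eth"
hispanic_total <- sum(race_x_eth$pop[race_x_eth$hispanic == "Hispanic"])
print(hispanic_total)
```

The "Not Hispanic" rows, taken on their own, are the remaining rows of the race_type = "race_eth" output.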

Return population of White residents by Hispanic ethnicity

race_x_eth <- get_population(race_type = 'race_eth', 
                             races = 'white', 
                             group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))
year race_eth hispanic pop
2024 White Hispanic 160379
2024 White Not Hispanic 1230939

Using get_population with a pre-existing connection to hhsaw

Some users may not need/want to rely on get_population's auto-connection to HHSAW via keyring. Users can instead pass an existing database connection through the mykey argument. The example below still uses keyring (since most get_population users are on the PH domain), but it can be replaced by ActiveDirectoryIntegrated type authentications to HHSAW for the KC lucky ducks.

# Via autoconnect
r1 = get_population()

# Via a pre-existing connection passed through mykey
mycon <- DBI::dbConnect(
  odbc::odbc(), 
  driver = getOption("rads.odbc_version"), 
  server = "kcitazrhpasqlprp16.azds.kingcounty.gov", 
  database = "hhs_analytics_workspace",
  uid = keyring::key_list('hhsaw')[["username"]], 
  pwd = keyring::key_get('hhsaw', keyring::key_list('hhsaw')[["username"]]), 
  Encrypt = "yes", 
  TrustServerCertificate = "yes", 
  Authentication = "ActiveDirectoryPassword")

r2 = get_population(mykey = mycon)

print(all.equal(r1,r2))
## [1] TRUE

-- Updated April 16, 2025 (rads v1.3.5)
