get_population
⚠️ DEPRECATED: The functions in this vignette have been migrated to apde.data. Please use that package instead.
This vignette provides examples of how to pull population data into R from the Azure cloud.
The population numbers are estimated by the WA Office of Financial Management (OFM) population unit. OFM produces two sets of estimates: (1) April 1 official population estimates for cities and towns and (2) Small Area Estimates (SAE) for smaller geographies. The get_population() function pulls the SAE numbers and, when round = T, should be the same as those in CHAT.
NOTE: To get the most out of this vignette, we strongly recommend typing each bit of code into R yourself. Doing so will help you learn the syntax much faster than just reading the vignette or copying and pasting the code.
Arguments are the values that we send to a function when it is called. Generally, typing args(my_function_of_interest) will return the possible arguments including any defaults. For example,
args(get_population)
## function (kingco = T, years = NA, ages = c(0:100), genders = c("f",
## "m"), races = c("aian", "asian", "black", "hispanic", "multiple",
## "nhpi", "white"), race_type = c("race_eth"), geo_type = c("kc"),
## group_by = NULL, round = FALSE, mykey = "hhsaw", census_vintage = 2020,
## geo_vintage = 2020, schema = "ref", table_prefix = "pop_geo_",
## return_query = FALSE)
## NULL
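
args() prints a function's signature. Its close relative formals() returns the same arguments and defaults as a named list, which is handy when you want to inspect a default programmatically. The sketch below uses a hypothetical toy function, toy_pop(), rather than get_population() so that it runs without rads or a database connection; with rads loaded, formals(get_population) should work the same way.

```r
# toy_pop() is a hypothetical stand-in for get_population(); it does nothing
toy_pop <- function(years = NA, geo_type = "kc", round = FALSE) NULL

# args() prints the signature, including any defaults
args(toy_pop)

# formals() returns the arguments and their defaults as a named list
defaults <- formals(toy_pop)
names(defaults)     # "years" "geo_type" "round"
defaults$geo_type   # "kc"
```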
The standard arguments for get_population() are documented in its help file (?get_population) and summarized here for your convenience:

- kingco << Logical vector of length 1. Identifies whether you want population estimates limited to King County. Only affects results for geo_type in c('blk', 'blkgrp', 'lgd', 'scd', 'tract', 'zip'). Default == TRUE.
- years << Numeric vector. Identifies which year(s) of data should be pulled. Default == NA, which returns the most recent year available.
- ages << Numeric vector. Identifies which age(s) should be pulled. Default == c(0:100), with 100 being the top-coded value for ages 100 to 120.
- genders << Character vector of length 1 or 2. Identifies which gender(s) should be pulled. The acceptable values are 'f', 'female', 'm', and 'male'. Default == c('f', 'm').
- races << Character vector of length 1 to 7. Identifies which race(s) or ethnicity should be pulled. The acceptable values are "aian", "asian", "black", "hispanic", "multiple", "nhpi", and "white". Default == all the possible values.
- race_type << Character vector of length 1. Identifies whether to pull race data with Hispanic as an ethnicity ("race") or Hispanic as a race ("race_eth"). Default == c("race_eth").
- geo_type << Character vector of length 1. Identifies the geographic level for which you want population estimates. The acceptable values are: 'blk', 'blkgrp', 'county', 'hra', 'kc', 'lgd' (WA State legislative districts), 'region', 'seattle', 'scd' (school districts), 'tract', 'wa' (Washington State), and 'zip'. Default == "kc".
- group_by << Character vector of length 0 to 7. Identifies how you would like the data 'grouped' (i.e., stratified). Valid options are limited to: "years", "ages", "genders", "race", "race_eth", "fips_co", and "geo_id". "hispanic" is also accepted when race_type = "race_eth", as shown later in this vignette. Default == NULL, i.e., estimates are grouped only by geography (geo_id is always included).
- round << Logical vector of length 1. Identifies whether population estimates should be returned as whole numbers. Default == FALSE.
- mykey << Character vector with the name of the keyring::key that provides access to the Health and Human Services Analytic Workspace (HHSAW). If you have never set your keyring before and/or do not know what this refers to, just type keyring::key_set('hhsaw', username = 'ALastname@kingcounty.gov') into your R console (replacing the username with your own). Default == 'hhsaw'. It can also take a live database connection instead of a key name.
- census_vintage << Either 2010 or 2020. Specifies the anchor census of the desired estimates. Default == 2020.
- geo_vintage << Either 2010 or 2020. Specifies the anchor census for geographies. For example, 2020 returns geographies based on 2020 census blocks. Default == 2020.
- schema << Database schema that holds the population tables. Default == 'ref'. Unless you are a power user, leave this at its default.
- table_prefix << Prefix used to identify the population tables. Default == 'pop_geo_'. Unless you are a power user, leave this at its default.
- return_query << Logical. If TRUE, the query/queries used to fetch the results are returned instead of the results themselves. Default == FALSE.
You do not need to specify any of the arguments listed above. As the following example shows, the defaults return the estimated total King County population for the most recent year available.
get_population()[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 2378100 | kc | King County | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
Note 1: head() is used in some examples below only to display the first 6 rows of output and keep this vignette tidy; it is not necessary.
Note 2: The [] after get_population() prints the output to the console. Typically, you would not print the results but would save them as an object, e.g., my.pop.est <- get_population().
WA
get_population(geo_type = 'wa',
round = TRUE)[]

| pop | geo_type | geo_id | year | age | gender | race_eth | geo_id_code |
|---|---|---|---|---|---|---|---|
| 8035700 | wa | Washington State | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 53 |
King County
get_population(round = TRUE)[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 2378100 | kc | King County | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County Regions
get_population(geo_type = c("region"),
group_by = c("geo_id"),
round = TRUE)[]

| pop | geo_type | geo_id | year | age | gender | race_eth | geo_id_code |
|---|---|---|---|---|---|---|---|
| 614564 | region | East | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 1 |
| 821706 | region | South | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 4 |
| 797700 | region | Seattle | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 3 |
| 144130 | region | North | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 2 |
King County Regions with round=FALSE
Set round = FALSE to get the unrounded (fractional) population estimates.
rads::get_population(geo_type = 'region',
round = FALSE)[]

| pop | geo_type | geo_id | year | age | gender | race_eth | geo_id_code |
|---|---|---|---|---|---|---|---|
| 614563.9 | region | East | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 1 |
| 821706.1 | region | South | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 4 |
| 797700.0 | region | Seattle | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 3 |
| 144130.0 | region | North | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 2 |
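
The unrounded values are what round = TRUE converts to whole numbers. The sketch below applies base R's round() to the fractional region estimates printed above (themselves displayed to one decimal place) and recovers the whole numbers from the round = TRUE example. Whether get_population() rounds before or after aggregating is not described here, so treat this as an illustration of the relationship, not of the function's internals.

```r
# Unrounded 2024 region estimates, as printed above (one decimal place)
unrounded <- c(East = 614563.9, South = 821706.1, Seattle = 797700.0, North = 144130.0)

# Round to whole people, mirroring the round = TRUE output
rounded <- round(unrounded)
rounded

# The rounding adjustment for each region is at most half a person
rounded - unrounded
```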
King County HRAs
head(get_population(geo_type = c("hra"),
group_by = c("geo_id"))[])

| pop | geo_type | geo_id | year | age | gender | race_eth | geo_id_code |
|---|---|---|---|---|---|---|---|
| 40640.25 | hra | Bellevue - Central | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 36 |
| 40896.51 | hra | Bellevue - Northeast | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 40 |
| 35054.98 | hra | Seattle - South Beacon Hill, Georgetown, and South Park | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 30 |
| 22000.00 | hra | Covington | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 11 |
| 61910.00 | hra | Shoreline | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 61 |
| 50566.25 | hra | Bear Creek and Greater Sammamish | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White | 54 |
King County Zip codes
head(get_population(geo_type = c("zip"),
group_by = c("geo_id"))[])

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 24569.014 | zip | 98178 | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 29092.852 | zip | 98011 | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 26185.765 | zip | 98075 | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 44086.439 | zip | 98125 | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 28503.065 | zip | 98102 | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 3981.577 | zip | 98051 | 2024 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County Census Tracts
head(get_population(geo_type = c("tract"),
group_by = c("geo_id"),
ages = 18,
census_vintage = 2020,
geo_vintage = 2020)[])

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 27.97442 | tract | 53033001300 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 102.56269 | tract | 53033030005 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 32.93070 | tract | 53033020401 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 34.35188 | tract | 53033025806 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 57.21224 | tract | 53033024001 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 34.70830 | tract | 53033011500 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County Census Block Groups
head(get_population(geo_type = c("blkgrp"),
group_by = c("geo_id"),
ages = 18,
census_vintage = 2020,
geo_vintage = 2020)[])

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 31.445280 | blkgrp | 530330309024 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 12.074794 | blkgrp | 530330106011 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 22.764878 | blkgrp | 530330255003 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 12.027419 | blkgrp | 530330252012 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 3.713155 | blkgrp | 530330061004 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 12.242667 | blkgrp | 530330227031 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County Census Blocks
# ages = 18 is specified to make the query run faster
head(get_population(geo_type = c("blk"),
group_by = c("geo_id"),
ages = 18,
census_vintage = 2020,
geo_vintage = 2020)[])

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 0.6037123 | blk | 530330254023018 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 5.1092879 | blk | 530330044021004 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 0.6744307 | blk | 530330268012000 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 1.4181219 | blk | 530330279011007 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 2.7855590 | blk | 530330312025016 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 0.1404601 | blk | 530330053051010 | 2024 | 18 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County multiple years combined
get_population(years = 2017:2019)[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 6565125 | kc | King County | 2017-2019 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County multiple years stratified
get_population(years = 2017:2019,
group_by = "years")[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 2149910 | kc | King County | 2017 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 2227755 | kc | King County | 2019 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 2187460 | kc | King County | 2018 | 0-100 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
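
Comparing the two outputs above shows that the combined 2017-2019 estimate is the sum of the three annual estimates (a pooled count that can serve as a person-years denominator), not an annual average. The base R check below uses the stratified values printed above; dividing by the number of years gives the average annual population if that is what you need.

```r
# Stratified annual estimates from the output above
by_year <- data.frame(
  year = c(2017, 2019, 2018),
  pop  = c(2149910, 2227755, 2187460)
)

# The combined estimate (years = 2017:2019, no group_by) is the sum across years
pooled <- sum(by_year$pop)
pooled                    # 6565125

# Average annual population over the period
pooled / nrow(by_year)    # 2188375
```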
King County multiple ages combined
get_population(ages = 65:70)[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 132119.3 | kc | King County | 2024 | 65-70 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County multiple ages stratified
get_population(ages = 65:70, group_by = "ages")[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 21677.85 | kc | King County | 2024 | 68 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 23299.48 | kc | King County | 2024 | 66 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 24062.47 | kc | King County | 2024 | 65 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 20725.29 | kc | King County | 2024 | 69 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 19775.33 | kc | King County | 2024 | 70 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 22578.88 | kc | King County | 2024 | 67 | Female, Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
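
The stratified rows are not returned in age order, and, as with years, the combined 65-70 estimate is the sum of the single-year-of-age rows. The base R sketch below sorts the values printed above and checks that sum to floating-point tolerance, since the estimates are fractional. With the data.table that get_population() returns, setorder() (used later in this vignette) does the sorting in place.

```r
# Single-year-of-age estimates from the output above (unsorted)
by_age <- data.frame(
  age = c(68, 66, 65, 69, 70, 67),
  pop = c(21677.85, 23299.48, 24062.47, 20725.29, 19775.33, 22578.88)
)

# Sort rows by age
by_age <- by_age[order(by_age$age), ]

# The combined ages = 65:70 estimate equals the sum of the age strata
total_65_70 <- sum(by_age$pop)
all.equal(total_65_70, 132119.3)   # TRUE
```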
King County female only
get_population(genders = "F")[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 1182773 | kc | King County | 2024 | 0-100 | Female | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County gender stratified
get_population(group_by = "genders")[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 1195327 | kc | King County | 2024 | 0-100 | Male | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
| 1182773 | kc | King County | 2024 | 0-100 | Female | AIAN, Asian, Black, Hispanic, Multiple race, NHPI, White |
King County AIAN (not Hispanic)
get_population(races = "aian",
race_type = "race_eth")[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 11284 | kc | King County | 2024 | 0-100 | Female, Male | AIAN |
King County AIAN (regardless of Hispanic ethnicity)
get_population(races = "aian",
race_type = "race",
group_by = 'race')[]

| pop | geo_type | geo_id | year | age | gender | race |
|---|---|---|---|---|---|---|
| 27123 | kc | King County | 2024 | 0-100 | Female, Male | AIAN |
King County stratified by Hispanic as race
get_population(race_type = "race_eth",
group_by = "race_eth")[]

| pop | geo_type | geo_id | year | age | gender | race_eth |
|---|---|---|---|---|---|---|
| 1230939 | kc | King County | 2024 | 0-100 | Female, Male | White |
| 11284 | kc | King County | 2024 | 0-100 | Female, Male | AIAN |
| 165584 | kc | King County | 2024 | 0-100 | Female, Male | Multiple race |
| 161584 | kc | King County | 2024 | 0-100 | Female, Male | Black |
| 22478 | kc | King County | 2024 | 0-100 | Female, Male | NHPI |
| 272996 | kc | King County | 2024 | 0-100 | Female, Male | Hispanic |
| 513235 | kc | King County | 2024 | 0-100 | Female, Male | Asian |
King County stratified by race (Hispanic as ethnicity)
get_population(race_type = "race",
group_by = "race")[]

| pop | geo_type | geo_id | year | age | gender | race |
|---|---|---|---|---|---|---|
| 1391318 | kc | King County | 2024 | 0-100 | Female, Male | White |
| 27123 | kc | King County | 2024 | 0-100 | Female, Male | AIAN |
| 244489 | kc | King County | 2024 | 0-100 | Female, Male | Multiple race |
| 171318 | kc | King County | 2024 | 0-100 | Female, Male | Black |
| 24185 | kc | King County | 2024 | 0-100 | Female, Male | NHPI |
| 519667 | kc | King County | 2024 | 0-100 | Female, Male | Asian |
King County regions stratified by year and gender
reg_yr_gen <- get_population(geo_type = "region",
years = 2017:2019,
group_by = c("geo_id", "years", "genders"))
reg_yr_gen <- reg_yr_gen[, .(region = geo_id, year, gender, pop)]
print(setorder(reg_yr_gen, region, year, gender)[1:12])

| region | year | gender | pop |
|---|---|---|---|
| East | 2017 | Female | 277919.83 |
| East | 2017 | Male | 277718.20 |
| East | 2018 | Female | 282789.50 |
| East | 2018 | Male | 282818.46 |
| East | 2019 | Female | 288267.60 |
| East | 2019 | Male | 288640.84 |
| North | 2017 | Female | 66011.00 |
| North | 2017 | Male | 63750.36 |
| North | 2018 | Female | 67053.90 |
| North | 2018 | Male | 64811.35 |
| North | 2019 | Female | 68616.25 |
| North | 2019 | Male | 66350.64 |
King County regions stratified by year -- Female Hispanic and Asian-NH residents aged 16-25 only -- not rounded
get_population(ages = 16:25,
genders = "F",
years = 2017:2019,
races = c("hispanic", "asian"),
geo_type = "region",
race_type = "race_eth",
group_by = c("geo_id", "years", "race_eth"),
round = F)[1:12]

| pop | geo_type | geo_id | year | age | gender | race_eth | geo_id_code |
|---|---|---|---|---|---|---|---|
| 3301.5919 | region | East | 2019 | 16-25 | Female | Hispanic | 1 |
| 3132.0506 | region | East | 2018 | 16-25 | Female | Hispanic | 1 |
| 7696.0761 | region | South | 2017 | 16-25 | Female | Asian | 4 |
| 6089.2159 | region | East | 2017 | 16-25 | Female | Asian | 1 |
| 1142.0371 | region | North | 2017 | 16-25 | Female | Asian | 2 |
| 2967.4452 | region | East | 2017 | 16-25 | Female | Hispanic | 1 |
| 5515.6825 | region | Seattle | 2017 | 16-25 | Female | Hispanic | 3 |
| 12905.5073 | region | Seattle | 2019 | 16-25 | Female | Asian | 3 |
| 12301.1721 | region | Seattle | 2018 | 16-25 | Female | Asian | 3 |
| 1013.2724 | region | North | 2019 | 16-25 | Female | Hispanic | 2 |
| 7849.3768 | region | South | 2018 | 16-25 | Female | Asian | 4 |
| 967.1974 | region | North | 2018 | 16-25 | Female | Hispanic | 2 |
Some users may want population data by Hispanic ethnicity. To get population values by race X ethnicity, include 'hispanic' in the group_by argument. This option only works when race_type = 'race_eth'. Some combinations (e.g., adding 'hispanic' to the races argument) are not supported and will throw (hopefully) informative errors. The other options demonstrated above continue to work as before.
King County regions stratified by Hispanic/Non-Hispanic
# pull in data stratified by Hispanic ethnicity and region
reg_hisp_nonhisp <- get_population(geo_type = 'region',
group_by = 'hispanic')
# keep select columns, renaming geo_id to region
reg_hisp_nonhisp <- reg_hisp_nonhisp[, .(region = geo_id, hispanic, pop)]
print(setorder(reg_hisp_nonhisp, region, hispanic))

| region | hispanic | pop |
|---|---|---|
| East | Hispanic | 49317.01 |
| East | Not Hispanic | 565246.86 |
| North | Hispanic | 15176.94 |
| North | Not Hispanic | 128953.06 |
| Seattle | Hispanic | 78327.60 |
| Seattle | Not Hispanic | 719372.40 |
| South | Hispanic | 130174.44 |
| South | Not Hispanic | 691531.69 |
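
A natural next step with group_by = 'hispanic' output is the Hispanic share of each region's population. The minimal base R sketch below uses the unrounded values printed above: it sums within region to get each total (which should match the round = FALSE region table) and then divides the Hispanic estimate by that total.

```r
# Region x Hispanic ethnicity estimates from the output above
reg_hisp <- data.frame(
  region   = rep(c("East", "North", "Seattle", "South"), each = 2),
  hispanic = rep(c("Hispanic", "Not Hispanic"), times = 4),
  pop      = c(49317.01, 565246.86, 15176.94, 128953.06,
               78327.60, 719372.40, 130174.44, 691531.69)
)

# Total population per region
region_total <- tapply(reg_hisp$pop, reg_hisp$region, sum)

# Hispanic share of each region's population
hisp <- reg_hisp[reg_hisp$hispanic == "Hispanic", ]
hisp$share <- hisp$pop / as.numeric(region_total[hisp$region])
hisp[, c("region", "share")]
```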
Return all race x Hispanic ethnicity combinations
race_x_eth <- get_population(race_type = 'race_eth',
group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))

| year | race_eth | hispanic | pop |
|---|---|---|---|
| 2024 | AIAN | Hispanic | 15839 |
| 2024 | AIAN | Not Hispanic | 11284 |
| 2024 | Asian | Hispanic | 6432 |
| 2024 | Asian | Not Hispanic | 513235 |
| 2024 | Black | Hispanic | 9734 |
| 2024 | Black | Not Hispanic | 161584 |
| 2024 | Multiple race | Hispanic | 78905 |
| 2024 | Multiple race | Not Hispanic | 165584 |
| 2024 | NHPI | Hispanic | 1707 |
| 2024 | NHPI | Not Hispanic | 22478 |
| 2024 | White | Hispanic | 160379 |
| 2024 | White | Not Hispanic | 1230939 |
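
This table ties the two race_type options together. Under race_type = 'race', each race count includes Hispanic members of that race; under 'race_eth', those residents are moved into the single Hispanic category. The base R sketch below uses the 2024 values printed above to check both relationships: each race total (Hispanic as ethnicity) equals its Hispanic plus Not Hispanic rows, and the race_eth Hispanic count equals the sum of the Hispanic rows.

```r
# Race x Hispanic ethnicity values from the output above
race_x_eth <- data.frame(
  race     = rep(c("AIAN", "Asian", "Black", "Multiple race", "NHPI", "White"), each = 2),
  hispanic = rep(c("Hispanic", "Not Hispanic"), times = 6),
  pop      = c(15839, 11284, 6432, 513235, 9734, 161584,
               78905, 165584, 1707, 22478, 160379, 1230939)
)

# Summing over ethnicity gives the race_type = "race" totals
race_totals <- tapply(race_x_eth$pop, race_x_eth$race, sum)
race_totals

# Summing the Hispanic rows gives the race_type = "race_eth" Hispanic total
hisp_total <- sum(race_x_eth$pop[race_x_eth$hispanic == "Hispanic"])
hisp_total  # 272996
```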
Return population of White residents by Hispanic ethnicity
race_x_eth <- get_population(race_type = 'race_eth',
races = 'white',
group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))

| year | race_eth | hispanic | pop |
|---|---|---|---|
| 2024 | White | Hispanic | 160379 |
| 2024 | White | Not Hispanic | 1230939 |
Some users may not want to rely on get_population's automatic connection to HHSAW via keyring. Instead, they can pass an existing database connection through the mykey argument. The example below still uses keyring to build that connection (since most get_population users are on the PH domain), but the KC lucky ducks can replace it with ActiveDirectoryIntegrated authentication to HHSAW.
# Via autoconnect
r1 = get_population()

# Via an existing connection passed to mykey
mycon <- DBI::dbConnect(
odbc::odbc(),
driver = getOption("rads.odbc_version"),
server = "kcitazrhpasqlprp16.azds.kingcounty.gov",
database = "hhs_analytics_workspace",
uid = keyring::key_list('hhsaw')[["username"]],
pwd = keyring::key_get('hhsaw', keyring::key_list('hhsaw')[["username"]]),
Encrypt = "yes",
TrustServerCertificate = "yes",
Authentication = "ActiveDirectoryPassword")
r2 = get_population(mykey = mycon)
print(all.equal(r1,r2))
## [1] TRUE

# close the connection when you are done with it
DBI::dbDisconnect(mycon)
-- Updated April 16, 2025 (rads v1.3.5)