chars_functions
The rads package has a suite of
tools designed to facilitate and accelerate the analysis of standardized
CHARS (Comprehensive Hospital Abstract Reporting System) data. Combining
the rads functions below with the clean CHARS data (or with practice
data from rads.data, which is installed automatically with rads)
should allow analysts to conduct custom analyses with relative ease. The
core rads CHARS functions are:
- chars_icd_ccs(): view available CHARS ICD-9-CM and ICD-10-CM descriptions, as well as ‘superlevel’, ‘broad’, ‘midlevel’, and ‘detailed’ aggregations derived from AHRQ’s HCUP CCSR, that can be used with chars_icd_ccs_count()
- chars_icd_ccs_count(): generate counts of CHARS hospitalizations using ICD-9-CM or ICD-10-CM descriptions or ‘superlevel’, ‘broad’, ‘midlevel’, and ‘detailed’ categories
- chars_injury_matrix(): view all available intents and mechanisms that can be used with chars_injury_matrix_count() (2012+)
- chars_injury_matrix_count(): generate counts of injury-related hospitalizations by intent and mechanism (2012+)
- chars_validate_data(): validate that your CHARS dataset has the proper structure and columns needed for the injury and ICD-CM analysis functions
Additionally, if you have APDE credentials and are working within King
County’s infrastructure, you can easily download standardized CHARS data
from SQL into R (2012+) using apde.data::chars().
All of these functions have detailed help files that are accessible by
typing ?function_name, e.g. ?chars_injury_matrix_count. Some
examples for how to use these functions are given below.
A few quick notes before we begin …
- apde.data::chars() can provide you with ICD-9-CM data (2012-2015) as well as ICD-10-CM data (2016+).
- chars_injury_matrix() and chars_injury_matrix_count() are agnostic as to whether the underlying data are ICD-9-CM or ICD-10-CM.
- chars_icd_ccs() & chars_icd_ccs_count() need you to specify which ICD-CM version you have in your data. This means you can analyze 2012-2015 data or 2016+ data, but not both at the same time in a single command.
- If you want to create age-adjusted rates, we recommend you read the age_standardize and calculating_rates_with_rads vignettes after working through this one.
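Because chars_icd_ccs_count() works with one ICD-CM version per call, a pull spanning both coding eras has to be split before counting. The base R sketch below shows one way to do that on a small made-up data frame. The helper icd_version_for_year() is hypothetical (not part of rads), the chi_year column is assumed to exist as it does when requested from apde.data::chars(), and the year cutoffs simply follow the ranges stated above.

```r
# Hypothetical helper (not part of rads): map a CHARS data year to the
# ICD-CM version, following the year ranges given in this vignette
icd_version_for_year <- function(year) {
  ifelse(year <= 2015, 9L, 10L)
}

# Small made-up dataset spanning both coding eras
toy <- data.frame(
  seq_no   = 1:6,
  chi_year = c(2014, 2015, 2015, 2016, 2020, 2023),
  diag1    = c("4019", "25000", "4280", "I10", "E119", "I110"),
  stringsAsFactors = FALSE
)

toy$icdcm_version <- icd_version_for_year(toy$chi_year)

# One subset per ICD-CM version; each subset would need its own
# chars_icd_ccs_count() call with the matching icdcm_version argument
by_version <- split(toy, toy$icdcm_version)
names(by_version)
sapply(by_version, nrow)
```

Each element of by_version could then be analyzed separately, provided the real data contain the columns that chars_icd_ccs_count() expects.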
rm(list=ls())
library(rads)
library(data.table)
If you’re learning to work with CHARS data or developing new analysis
code, we provide a synthetic dataset that mimics the structure of real
CHARS data. This privacy-safe dataset is automatically available when
you install rads and contains over 250,000 rows of synthetic
hospitalization records, including the injury-related columns.
# Load the synthetic CHARS data
data(synthetic_chars, package = "rads.data")
# View the structure
str(synthetic_chars)
Classes 'data.table' and 'data.frame': 288326 obs. of 9 variables:
$ seq_no : int 1 2 3 4 5 6 7 8 9 10 ...
$ diag1 : chr "K8051" "Z3801" "K5651" "N136" ...
$ injury_nature_broad : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ injury_nature_narrow: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ injury_intent : chr NA NA NA NA ...
$ injury_mechanism : chr NA NA NA NA ...
$ chi_geo_kc : chr "King County" "King County" NA NA ...
$ temperament : chr "Calm" "Moderate" "Calm" "Moderate" ...
$ creation_date : Date, format: "2026-01-07" "2026-01-07" ...
- attr(*, ".internal.selfref")=<externalptr>
The synthetic dataset includes the key columns needed for injury and ICD-CM analyses:
- seq_no: unique patient-visit identifier
- diag1: ICD-10-CM primary diagnosis code
- injury_nature_broad & injury_nature_narrow: logical flags corresponding to the two injury definitions. ‘narrow’ follows the CDC recommendation and ‘broad’ follows a more expansive definition. See ?rads.data::synthetic_chars for details
- injury_intent: injury intent classification (e.g., ‘assault’, ‘unintentional’)
- injury_mechanism: injury mechanism classification (e.g., ‘fall’, ‘firearm’)
- chi_geo_kc: King County indicator
- temperament: a categorical ‘demographic’ indicator for practice stratification
This synthetic dataset is perfect for testing code before running analyses on real data.
Note: This section is only relevant for analysts working within King County’s APDE infrastructure who have the necessary credentials.
apde.data::chars() takes nine potential arguments, eight of which are described here:
- cols: character. The names of the columns that you want to download. Limiting this list to only the variables you truly need can significantly improve download speed. In most cases, this includes any demographic stratifiers of interest along with “seq_no”, “diag1”, “injury_nature_broad”, “injury_nature_narrow”, “injury_intent”, and “injury_mechanism”.
- year: the year(s) of interest, from 2012 to the present.
- kingco: logical (T|F). Specifies whether to limit the download to King County, based on truncated ZIP codes (980## and 981##).
- version: character. Either 'final' or 'stage'.
- wastate: logical (T|F). When false, data will include Oregon.
- inpatient: logical (T|F). When false, data will include observation patients (i.e., outpatients).
- deaths: logical (T|F). When true, the data will include those who died while in the hospital.
- topcode: logical (T|F). When true, chi_age will be top coded to 100 to match population data top coding.
If you do not specify any of the arguments, you will get all CHARS data columns, for the latest year, for King County (defined by truncated ZIP codes), limited to inpatients, including those who died while hospitalized, with ages top coded to 100.
charsDT <- apde.data::chars(cols = c("seq_no", "diag1", "injury_nature_broad",
"injury_nature_narrow", "injury_intent",
"injury_mechanism", "chi_geo_kc", "chi_year",
"chi_age"), year = 2023)
unique(charsDT$chi_geo_kc) # confirm data is limited to King County
[1] "King County"
unique(charsDT$chi_year) # check the year
[1] 2023
max(charsDT$chi_age, na.rm = TRUE) # check top coding
[1] 100
For the remainder of this vignette, we’ll use the synthetic dataset so that everyone can follow along:
charsDT <- rads.data::synthetic_chars
Important note: Since King County in CHARS data is defined by
truncated ZIP codes (980## and 981##), the correct denominator when
calculating rates should be defined the same way. To be clear, this
means you should NOT use apde.data::population(kingco = T) for
King County CHARS denominators. Instead, you should get ZIP code
population data and aggregate it for King County.
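As a toy illustration of the ZIP-based King County definition, the base R sketch below filters made-up ZIP-level population counts to ZIP codes beginning with 980 or 981 and sums them into a single denominator. The ZIP codes and population values are invented for illustration; the real workflow, using apde.data::population() and data.table, is shown in the next example.

```r
# Made-up ZIP-level population counts (illustrative only)
zip_pop <- data.frame(
  geo_id = c("98001", "98104", "98115", "98201", "98101", "99201"),
  pop    = c(30000, 15000, 50000, 40000, 12000, 20000),
  stringsAsFactors = FALSE
)

# Keep ZIP codes that start with 980 or 981, mirroring how
# King County is defined in CHARS
kc_zips <- zip_pop[grepl("^980|^981", zip_pop$geo_id), ]

# Collapse the remaining ZIP codes into one King County denominator
kc_denominator <- data.frame(geo_id = "King County",
                             pop = sum(kc_zips$pop),
                             stringsAsFactors = FALSE)
kc_denominator
```

The 98201 and 99201 rows are dropped because they do not start with 980 or 981, so only four ZIP codes contribute to the total.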
Here’s an example of how to obtain a King County population denominator by age, gender, and race (with Hispanic as a race):
# Get ZIP code level population data
denominator <- apde.data::population(kingco = FALSE,
geo_type = 'zip',
group_by = c('ages', 'genders', 'race_eth'))
# Subset to ZIP that begin with 980/981
denominator <- denominator[grepl('^980|^981', geo_id)]
# Sum the population across all these ZIP codes by gender, race/eth, and age
denominator <- denominator[, .(pop = sum(pop)), .(gender, race_eth, age)]
# Label it as King County
denominator[, geo_id := 'King County']
Before diving into analysis, it’s good practice to validate that your
CHARS dataset has the proper structure. The chars_validate_data()
function checks that your data contains all the required columns with
appropriate data types and values.
validated_chars <- chars_validate_data(ph.data = charsDT,
icdcol = 'diag1',
icdcm_version = 10)
This function validates:
- Required columns exist: seq_no, injury_nature_broad, injury_nature_narrow, injury_intent, injury_mechanism, and the ICD column
- seq_no contains unique values (one per patient-visit)
- Injury columns have appropriate data types (logical for nature columns, character for intent/mechanism)
- ICD codes are properly formatted
- Standard injury intent and mechanism categories are present (with informative messages if any are missing)
The function returns the validated data (with any necessary cleaning applied to ICD codes) and is especially useful when working with custom or external CHARS datasets.
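To make a few of the checks listed above concrete (required columns, unique seq_no, and column types), here is a rough base R approximation. It is only a sketch: check_chars_structure() is a hypothetical helper, it does not check ICD code formatting or injury categories, and it is not how chars_validate_data() is implemented. For real analyses, use chars_validate_data().

```r
# Hypothetical helper (NOT part of rads): a partial structural check
check_chars_structure <- function(dt, icdcol = "diag1") {
  required <- c("seq_no", "injury_nature_broad", "injury_nature_narrow",
                "injury_intent", "injury_mechanism", icdcol)
  missing_cols <- setdiff(required, names(dt))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # One row per patient-visit
  if (anyDuplicated(dt$seq_no) > 0) stop("seq_no must be unique")
  # Nature columns are logical flags
  if (!is.logical(dt$injury_nature_broad) || !is.logical(dt$injury_nature_narrow)) {
    stop("injury_nature_* columns must be logical")
  }
  # Intent and mechanism are character labels
  if (!is.character(dt$injury_intent) || !is.character(dt$injury_mechanism)) {
    stop("injury_intent and injury_mechanism must be character")
  }
  invisible(TRUE)
}

good <- data.frame(
  seq_no = 1:3,
  diag1 = c("K8051", "S72001A", "I110"),
  injury_nature_broad = c(FALSE, TRUE, FALSE),
  injury_nature_narrow = c(FALSE, TRUE, FALSE),
  injury_intent = c(NA, "unintentional", NA),
  injury_mechanism = c(NA, "fall", NA),
  stringsAsFactors = FALSE
)
check_chars_structure(good)  # passes silently

# Duplicate seq_no values should be flagged
bad <- good
bad$seq_no <- c(1L, 1L, 2L)
result <- tryCatch(check_chars_structure(bad), error = conditionMessage)
result
```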
chars_icd_ccs() takes three arguments, two of which are described here:
- ref_type: specifies the hospital diagnosis descriptions that are of interest to you. Acceptable options include: ‘all’, ‘icdcm’, ‘superlevel’, ‘broad’, ‘midlevel’, & ‘detailed’.
- icdcm_version: specifies the ICD-CM version that you want to reference. Acceptable options include: 9 & 10, with 10 being the default.
Do not attempt to manually browse through chars_icd_ccs() … you will
lose your mind because it has more than 100,000 rows! Rather, use it to
identify the type of non-injury hospitalization of interest. The
structure is simple and (hopefully!) self-explanatory. Let’s take a look
at the first three rows as an example by typing chars_icd_ccs()[1:3]:
| icdcm_code | icdcm | superlevel | broad | midlevel | detailed | icdcm_version |
|---|---|---|---|---|---|---|
| A00 | Cholera | Infectious diseases | Diseases of the digestive system | Intestinal infection | Intestinal infection | 10 |
| A000 | Cholera due to Vibrio cholerae 01, biovar cholerae | Infectious diseases | Diseases of the digestive system | Intestinal infection | Intestinal infection | 10 |
| A001 | Cholera due to Vibrio cholerae 01, biovar eltor | Infectious diseases | Diseases of the digestive system | Intestinal infection | Intestinal infection | 10 |
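Here is a minimal base R sketch of that kind of search, using a tiny made-up stand-in for the chars_icd_ccs() output (same column names as the table above). The ICD-10-CM codes and descriptions are real, but the ‘detailed’ labels are illustrative and may not match the actual reference table. With real data you would search chars_icd_ccs() instead of ref.

```r
# Tiny, made-up stand-in for the chars_icd_ccs() reference table
ref <- data.frame(
  icdcm_code = c("I10", "I110", "I119", "I200", "J189"),
  icdcm = c("Essential (primary) hypertension",
            "Hypertensive heart disease with heart failure",
            "Hypertensive heart disease without heart failure",
            "Unstable angina",
            "Pneumonia, unspecified organism"),
  detailed = c("Hypertension", "Hypertension", "Hypertension",
               "Coronary atherosclerosis and other heart disease",
               "Pneumonia (except that caused by tuberculosis)"),
  stringsAsFactors = FALSE
)

# Case-insensitive, partial-string search of the descriptions
hits <- ref[grepl("hypertensi", ref$icdcm, ignore.case = TRUE), ]
hits$icdcm_code

# Which 'detailed' categories do those codes roll up to?
unique(hits$detailed)
```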
Teaching regular expressions (a.k.a. regex) and filtering is outside
the scope of this vignette. However, since you will usually want to
work with aggregated hospitalization data, I encourage you to look at
the unique values of the superlevel, broad, midlevel, and detailed
columns. For example, let’s examine the CCSR broad categories with
chars_icd_ccs(ref_type = 'broad'):
| broad | icdcm_version |
|---|---|
| Diseases of the digestive system | 10 |
| Certain infectious and parasitic diseases | 10 |
| Diseases of the genitourinary system | 10 |
| Diseases of the eye and adnexa | 10 |
| Diseases of the ear and mastoid process | 10 |
| Endocrine, nutritional and metabolic diseases | 10 |
| Diseases of the circulatory system | 10 |
| Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism | 10 |
| Dental diseases | 10 |
| Neoplasms | 10 |
| NA | 10 |
| Diseases of the nervous system | 10 |
| Mental, behavioral and neurodevelopmental disorders | 10 |
| Factors influencing health status and contact with health services | 10 |
| Injury, poisoning and certain other consequences of external causes | 10 |
| Diseases of the musculoskeletal system and connective tissue | 10 |
| Diseases of the respiratory system | 10 |
| Diseases of the skin and subcutaneous tissue | 10 |
| Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified | 10 |
| Pregnancy, childbirth and the puerperium | 10 |
| Certain conditions originating in the perinatal period | 10 |
| Congenital malformations, deformations and chromosomal abnormalities | 10 |
| External causes of morbidity | 10 |
chars_icd_ccs_count() allows the user to get CHARS counts by ICD-CM
code, ICD-CM description, or the superlevel, broad, midlevel, and
detailed categories. I provide examples of each of these below, in order
of decreasing granularity/specificity, using hypertensive heart disease
as a case study.
| icdcm_code | icdcm | superlevel | broad | midlevel | detailed | icdcm_version |
|---|---|---|---|---|---|---|
| I110 | Hypertensive heart disease with heart failure | Chronic diseases | Diseases of the circulatory system | Hypertension | Hypertension | 10 |
However, before we begin, let’s review the possible arguments used by
chars_icd_ccs_count():
- ph.data: the name of a person-level data.table/data.frame of CHARS data with ICD-CM codes
- icdcm_version: specifies the ICD-CM version that you want to reference. Acceptable options include: 9 & 10, with 10 being the default.
- icdcm: the ICD-CM code of interest OR its description. It is case insensitive and partial strings are allowed.
- superlevel: ‘superlevel’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
- broad: CCSR-derived ‘broad’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
- midlevel: ‘midlevel’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
- detailed: CCSR-derived ‘detailed’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
- icdcol: the name of the column in ph.data that contains the ICD-CM codes. Default is diag1, which is provided when you use apde.data::chars().
- group_by: identifies the variables by which you want to group (a.k.a., stratify) the results.
- kingco: logical (T|F) specifying whether to limit the data analysis to King County. Only works if ph.data still has the chi_geo_kc column.
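To picture what group_by does, the base R sketch below counts made-up hospitalizations for a single ICD-10-CM code within each level of a stratifier (temperament, borrowed from the synthetic data). It only illustrates the shape of a stratified count; it is not how chars_icd_ccs_count() computes its results, and the real output columns may differ.

```r
# Made-up hospitalization records: one row per patient-visit
hosp <- data.frame(
  seq_no      = 1:8,
  diag1       = c("I110", "I110", "J189", "I110", "I10", "I110", "J189", "I110"),
  temperament = c("Calm", "Moderate", "Calm", "Calm",
                  "Moderate", "Moderate", "Calm", "Calm"),
  stringsAsFactors = FALSE
)

# Flag the records of interest (here, an exact ICD-10-CM code match)
hosp$is_case <- hosp$diag1 == "I110"

# Count flagged hospitalizations within each level of the stratifier
counts <- aggregate(is_case ~ temperament, data = hosp, FUN = sum)
names(counts)[2] <- "hospitalizations"
counts
```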
mycode <- chars_icd_ccs_count(ph.data = charsDT,
icdcm = 'I110')
| icdcm_desc | hospitalizations |
|---|---|
| Hypertensive heart disease with heart failure | 1,886 |
mydesc <- chars_icd_ccs_count(ph.data = charsDT,
icdcm = 'Hypertensive heart disease with heart failure')
| icdcm_desc | hospitalizations |
|---|---|
| Hypertensive heart disease with heart failure | 1,886 |
These results are identical to searching by code because, for hypertensive heart disease with heart failure, there is a one-to-one match of description to ICD-10-CM code.
mydetailed <- chars_icd_ccs_count(ph.data = charsDT,
detailed = '^hypertension$')
| detailed_desc | hospitalizations |
|---|---|
| Hypertension | 5,984 |
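The ‘^’ and ‘$’ in the detailed and midlevel examples are regular-expression anchors. Because these arguments accept partial strings, an unanchored pattern can also match longer category names that merely contain the word. The base R sketch below shows the difference with grepl(); the category strings are illustrative, and it assumes (consistent with the examples here) that rads treats these arguments as case-insensitive regular expressions.

```r
# Illustrative category labels (not pulled from the real reference table)
categories <- c("Hypertension",
                "Hypertension with complications and secondary hypertension",
                "Hypertensive heart disease",
                "Pulmonary hypertension")

# Unanchored: matches 'hypertension' anywhere in the string
grepl("hypertension", categories, ignore.case = TRUE)

# Anchored: ^ (start) and $ (end) require the whole string to match
grepl("^hypertension$", categories, ignore.case = TRUE)
```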
mymidlevel <- chars_icd_ccs_count(ph.data = charsDT,
midlevel = '^Hypertension$')
| midlevel_desc | hospitalizations |
|---|---|
| Hypertension | 6,261 |
mybroad <- chars_icd_ccs_count(ph.data = charsDT,
broad = 'Diseases of the circulatory system')
| broad_desc | hospitalizations |
|---|---|
| Diseases of the circulatory system | 22,379 |
mysuperlevel <- chars_icd_ccs_count(ph.data = charsDT,
superlevel = 'Chronic diseases')
| superlevel_desc | hospitalizations |
|---|---|
| Chronic diseases | 56,708 |
The chars_injury_matrix() function provides a handy reference of all
the mechanism and intent combinations that can be used with
chars_injury_matrix_count(). Here are the first 10 rows:
| mechanism | intent |
|---|---|
| any | any |
| any | assault |
| any | intentional |
| any | legal |
| any | undetermined |
| any | unintentional |
| bites_stings | any |
| bites_stings | assault |
| bites_stings | intentional |
| bites_stings | legal |
If you just want to see a list of the available intents, type
unique(chars_injury_matrix()[]$intent):
[1] "any" "assault" "intentional" "legal"
[5] "undetermined" "unintentional"
Similarly, to see the available mechanisms, type
unique(chars_injury_matrix()[]$mechanism):
[1] "any" "bites_stings"
[3] "cut_pierce" "drowning"
[5] "fall" "fire_burn"
[7] "firearm" "machinery"
[9] "motor_vehicle_nontraffic" "motor_vehicle_traffic"
[11] "mvt_motorcyclist" "mvt_occupant"
[13] "mvt_other" "mvt_pedal_cyclist"
[15] "mvt_pedestrian" "mvt_unspecified"
[17] "natural_environmental" "other_land_transport"
[19] "other_specified" "other_transport"
[21] "overexertion" "pedal_cyclist"
[23] "pedestrian" "poisoning"
[25] "poisoning_drug" "poisoning_nondrug"
[27] "struck_by_against" "suffocation"
[29] "unspecified"
The chars_injury_matrix_count() function is similar to the
chars_icd_ccs_count() function above, except that it counts
injury-related hospitalizations. chars_injury_matrix_count() takes seven
potential arguments:
- ph.data: the name of a person-level data.table/data.frame of CHARS data downloaded with apde.data::chars() or structured like rads.data::synthetic_chars. Note that the intents and mechanisms are pre-calculated, so you will need to ensure ph.data has the relevant injury_mechanism and injury_intent columns. The easiest way to do this with real data is to have apde.data::chars() download all the columns.
- intent: the injury intent of interest. Partial strings are allowed. Use 'none' or 'any' to ignore intent and return “Any intent”. Use '*' (the default wildcard) to return all possible intents.
- mechanism: the injury mechanism of interest. Partial strings are allowed. Use 'none' or 'any' to ignore mechanism and return “Any mechanism”. Use '*' (the default wildcard) to return all possible mechanisms.
- group_by: identifies the variables by which you want to group (a.k.a., stratify) the results.
- def: acceptable values are ‘narrow’ or ‘broad’. It specifies whether you want to use the CDC’s recommended ‘narrow’ approach, which requires that the principal diagnosis of an injury hospitalization be a nature-of-injury ICD-10-CM code, or the ‘broad’ definition, which searches all available diagnosis fields on the hospital discharge record. See this document for details.
- primary_ecode: logical (T|F) specifying whether to limit the analysis to using just the primary ecode (i.e., the injury_ecode variable), rather than all available ecodes. The vast majority of the time you will want to keep the default setting.
- kingco: logical (T|F) specifying whether to limit the data analysis to King County. Only works if ph.data still has the chi_geo_kc column.
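Allowing partial strings has a side effect worth knowing about. The base R sketch below uses the intent values listed earlier to show that a pipe-separated pattern and a collapsed character vector select the same values, and that a partial match on ‘intentional’ also picks up ‘unintentional’. It assumes intent and mechanism are matched as regular expressions, which the 'assault|undetermined' example below suggests but this vignette does not state outright; always check which rows your output table contains.

```r
# Intent values as listed by unique(chars_injury_matrix()[]$intent)
intents <- c("any", "assault", "intentional", "legal",
             "undetermined", "unintentional")

# A pipe-separated pattern ...
pattern_a <- "assault|undetermined"
# ... and a character vector collapsed into the same pattern
pattern_b <- paste(c("assault", "undetermined"), collapse = "|")

intents[grepl(pattern_a, intents)]
identical(pattern_a, pattern_b)

# Partial matching: 'intentional' is a substring of 'unintentional'
intents[grepl("intentional", intents)]

# An anchored pattern avoids the overlap
intents[grepl("^intentional$", intents)]
```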
mat1 <- chars_injury_matrix_count(ph.data = charsDT,
intent = 'assault',
mechanism = 'none')
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | assault | 166 |
mat2 <- chars_injury_matrix_count(ph.data = charsDT,
intent = 'assault|undetermined',
mechanism = 'none')
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | assault | 166 |
| Any mechanism | undetermined | 50 |
Note that you can also specify more than one intent or mechanism with a character vector instead of a pipe-separated string.
mat2.alt <- chars_injury_matrix_count(ph.data = charsDT,
intent = c('assault', 'undetermined'),
mechanism = 'none')
mat3 <- chars_injury_matrix_count(ph.data = charsDT,
intent = 'none',
mechanism = 'motor_vehicle_traffic')
| mechanism | intent | hospitalizations |
|---|---|---|
| motor_vehicle_traffic | Any intent | 781 |
Setting both intent and mechanism to 'none' returns hospitalizations due to any injury.
mat4 <- chars_injury_matrix_count(ph.data = charsDT,
intent = 'none',
mechanism = 'none')
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | Any intent | 7,174 |
Leaving intent and mechanism at their default wildcard ('*') returns every possible combination of mechanism and intent. Let’s look at just the first 10 rows for convenience.
mat5 <- chars_injury_matrix_count(ph.data = charsDT)[1:10]
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | Any intent | 7,174 |
| Any mechanism | assault | 166 |
| Any mechanism | intentional | 519 |
| Any mechanism | legal | 7 |
| Any mechanism | undetermined | 50 |
| Any mechanism | unintentional | 6,432 |
| bites_stings | Any intent | 33 |
| bites_stings | assault | 0 |
| bites_stings | intentional | 0 |
| bites_stings | legal | 0 |
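In this synthetic output, the intent-specific counts for ‘Any mechanism’ add up exactly to the ‘Any intent’ total, which suggests each injury hospitalization carries a single intent. The quick base R check below confirms the arithmetic using the numbers from the table above; treat the additivity as an observation about these results rather than a documented guarantee, and don’t assume the same holds for mechanisms.

```r
# Intent-specific counts for 'Any mechanism', copied from the table above
any_mech <- c(assault = 166, intentional = 519, legal = 7,
              undetermined = 50, unintentional = 6432)

total <- sum(any_mech)
total            # 7174
total == 7174    # matches the 'Any mechanism / Any intent' row

# Share of injury hospitalizations by intent, in percent
round(100 * any_mech / total, 1)
```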
mat6 <- chars_injury_matrix_count(ph.data = charsDT,
intent = 'none',
mechanism = 'none',
def = 'narrow')
mat7 <- chars_injury_matrix_count(ph.data = charsDT,
intent = 'none',
mechanism = 'none',
def = 'broad')
deftable <- rbind(cbind(def = 'narrow', mat6),
cbind(def = 'broad', mat7))
| def | mechanism | intent | hospitalizations |
|---|---|---|---|
| narrow | Any mechanism | Any intent | 7,174 |
| broad | Any mechanism | Any intent | 11,392 |
This table shows that the number of hospitalizations differs
substantially depending on the definition you use. Unless you have a
specific rationale for changing it, please use the default in your
analyses (i.e., def = 'narrow').
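For a sense of scale, the base R lines below quantify the gap between the two definitions using the counts reported in the table above.

```r
narrow <- 7174
broad  <- 11392

extra <- broad - narrow              # 4218 additional hospitalizations
pct_increase <- 100 * extra / narrow
round(pct_increase, 1)               # 58.8
```

In this synthetic data, the broad definition counts roughly 59% more hospitalizations than the narrow one.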
We know this was a lot to process. The good news is that this vignette isn’t going anywhere. If you remember (a) that this vignette exists and (b) where to find it, you’ll be in good shape to take on standard CHARS analyses in the future.
If you’ve read through this vignette and the corresponding help files and are still confused, please feel free to reach out for assistance. You may have found a bug, who knows? Good luck!
– Updated January 20, 2026 (rads v1.5.3)