chars_functions

CHARS Functions

Introduction

The rads package has a suite of tools designed to facilitate and accelerate the analysis of standardized CHARS (Comprehensive Hospital Abstract Reporting System) data. Combining the rads functions below with the clean CHARS data (or with practice data from rads.data, which is installed automatically with rads) should allow analysts to conduct custom analyses with relative ease. The core rads CHARS functions are:

chars_icd_ccs(): view available CHARS ICD-9-CM and ICD-10-CM descriptions as well as ‘superlevel’, ‘broad’, ‘midlevel’, and ‘detailed’ aggregations derived from AHRQ’s HCUP CCSR that can be used with chars_icd_ccs_count()
chars_icd_ccs_count(): generate counts of CHARS hospitalizations using ICD-9-CM or ICD-10-CM descriptions or ‘superlevel’, ‘broad’, ‘midlevel’, and ‘detailed’ categories.
chars_injury_matrix(): view all available intents and mechanisms that can be used with chars_injury_matrix_count() (2012+)
chars_injury_matrix_count(): generate counts of injury related hospitalizations by intent and mechanism (2012+)
chars_validate_data(): validate that your CHARS dataset has the proper structure and columns needed for the injury and ICD-CM analysis functions

Additionally, if you have APDE credentials and are working within King County’s infrastructure, you can easily download standardized CHARS data from SQL into R (2012+) using apde.data::chars().

All of these functions have detailed help files that are accessible by typing ?function_name, e.g. ?chars_injury_matrix_count. Some examples for how to use these functions are given below.

A few quick notes before we begin …

apde.data::chars() can provide you with ICD-9-CM data (2012-2015) as well as ICD-10-CM data (2016+).
chars_injury_matrix() and chars_injury_matrix_count() are agnostic as to whether the underlying data are ICD-9-CM or ICD-10-CM.
chars_icd_ccs() & chars_icd_ccs_count() need you to specify which ICD-CM version you have in your data. This means you can analyze 2012-2015 data or 2016+ data, but not both at the same time in a single command.
If you want to create age-adjusted rates, we recommend you read the age_standardize and calculating_rates_with_rads vignettes after working through this one.
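
Because the ICD-CM version follows the discharge year (2012-2015 vs. 2016+), it can help to split your years of interest before calling chars_icd_ccs_count(). Below is a minimal base R sketch of that idea. The helper icd_version_for_year() is hypothetical and is not part of rads.

```r
# Hypothetical helper (NOT part of rads): map discharge years to the ICD-CM
# version described in the notes above (2012-2015 = ICD-9-CM, 2016+ = ICD-10-CM)
icd_version_for_year <- function(year) {
  stopifnot(is.numeric(year), all(year >= 2012))
  ifelse(year <= 2015, 9L, 10L)
}

years <- 2014:2017
versions <- icd_version_for_year(years)

# Group the years so each group can be analyzed in its own command
year_groups <- split(years, versions)
year_groups
```

Each element of year_groups could then be analyzed separately, with the element name passed along as the icdcm_version.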

Set up the environment

rm(list=ls())
library(rads)
library(data.table)

Getting CHARS data

For practice and development: `rads.data::synthetic_chars`

If you’re learning to work with CHARS data or developing new analysis code, we provide a synthetic dataset that mimics the structure of real CHARS data. This privacy-safe dataset is automatically available when you install rads and contains over 250,000 rows of injury-related diagnosis data.

# Load the synthetic CHARS data
data(synthetic_chars, package = "rads.data")

# View the structure
str(synthetic_chars)

Classes 'data.table' and 'data.frame':  288326 obs. of  9 variables:
 $ seq_no              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ diag1               : chr  "K8051" "Z3801" "K5651" "N136" ...
 $ injury_nature_broad : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ injury_nature_narrow: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ injury_intent       : chr  NA NA NA NA ...
 $ injury_mechanism    : chr  NA NA NA NA ...
 $ chi_geo_kc          : chr  "King County" "King County" NA NA ...
 $ temperament         : chr  "Calm" "Moderate" "Calm" "Moderate" ...
 $ creation_date       : Date, format: "2026-01-07" "2026-01-07" ...
 - attr(*, ".internal.selfref")=<externalptr>

The synthetic dataset includes the key columns needed for injury and ICD-CM analyses:

seq_no: unique patient-visit identifier
diag1: ICD-10-CM primary diagnosis code
injury_nature_broad & injury_nature_narrow: logical flags for whether a hospitalization meets each injury definition. ‘narrow’ follows the CDC recommendation and ‘broad’ follows a more expansive definition. See ?rads.data::synthetic_chars for details
injury_intent: injury intent classification (e.g., ‘assault’, ‘unintentional’)
injury_mechanism: injury mechanism classification (e.g., ‘fall’, ‘firearm’)
chi_geo_kc: King County indicator
temperament: a categorical ‘demographic’ indicator for practice stratification

This synthetic dataset is perfect for testing code before running analyses on real data.

For APDE analysts: `apde.data::chars()`

Note: This section is only relevant for analysts working within King County’s APDE infrastructure who have the necessary credentials.

apde.data::chars() takes the following arguments:

cols: character. the names of the columns that you want to download. Limiting this list to only the variables you truly need can significantly improve download speed. In most cases, this includes any demographic stratifiers of interest along with “seq_no”, “diag1”, “injury_nature_broad”, “injury_nature_narrow”, “injury_intent”, and “injury_mechanism”.
year: the year(s) of interest, from 2012 to the present.
kingco: logical (T|F). Specifies whether to limit the download to King County, based on truncated ZIP codes (980## and 981##).
version: character. Either 'final' or 'stage'.
wastate: logical (T|F). When false, data will include Oregon.
inpatient: logical (T|F). When false, data will include observation patients (i.e., outpatients).
deaths: logical (T|F). When true, the data will include those who died while in the hospital.
topcode: logical (T|F). When true, chi_age will be top coded to 100 to match population data top coding.

If you do not specify any of the arguments, you will get all CHARS data columns, for the latest year, for King County (defined by truncated ZIP codes), limited to inpatients, including those who died while hospitalized, with ages top coded to 100.

charsDT <- apde.data::chars(cols = c("seq_no", "diag1", "injury_nature_broad", 
                                     "injury_nature_narrow", "injury_intent", 
                                     "injury_mechanism", "chi_geo_kc", "chi_year", 
                                     "chi_age"), year = 2023)
unique(charsDT$chi_geo_kc) # confirm data is limited to King County

[1] "King County"

unique(charsDT$chi_year) # check the year

[1] 2023

max(charsDT$chi_age, na.rm = T) # check top coding

[1] 100

For the remainder of this vignette, we’ll use the synthetic dataset so that everyone can follow along:

charsDT <- rads.data::synthetic_chars

⚠️ A note about King County population denominators

Important note: Since King County in CHARS data is defined by truncated ZIP codes (980## and 981##), the correct denominator when calculating rates should be defined the same way. To be clear, this means you should NOT use apde.data::population(kingco = T) for King County CHARS denominators. Instead, you should get ZIP code population data and aggregate it for King County.

Here’s an example of how to obtain a King County population denominator by age, gender, and race (with Hispanic as a race):

# Get ZIP code level population data
denominator <- apde.data::population(kingco = FALSE, 
                             geo_type = 'zip', 
                             group_by = c('ages', 'genders', 'race_eth'))

# Subset to ZIP codes that begin with 980 or 981
denominator <- denominator[grepl('^980|^981', geo_id)]

# Sum the population across all these ZIP codes by gender, race/eth, and age
denominator <- denominator[, .(pop = sum(pop)), .(gender, race_eth, age)]

# Label it as King County
denominator[, geo_id := 'King County']
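
To make the logic concrete without APDE credentials, here is a toy sketch in base R. All of the numbers are made up for illustration. It applies the same ZIP prefix filter, sums the population, and then computes a crude rate per 100,000.

```r
# Toy ZIP-level populations (made-up numbers, for illustration only)
zip_pop <- data.frame(
  geo_id = c("98001", "98101", "98201", "98101"),
  gender = c("Female", "Female", "Female", "Male"),
  pop    = c(1000, 2000, 4000, 3000)
)

# Keep ZIP codes that begin with 980 or 981, mirroring the filter above
kc_pop <- zip_pop[grepl("^980|^981", zip_pop$geo_id), ]

# Sum the population across the retained ZIP codes by gender
toy_denominator <- aggregate(pop ~ gender, data = kc_pop, FUN = sum)

# Toy hospitalization counts (made-up) merged onto the denominator
toy_counts <- data.frame(gender = c("Female", "Male"),
                         hospitalizations = c(6, 3))
toy_rates <- merge(toy_counts, toy_denominator, by = "gender")

# Crude rate per 100,000 population
toy_rates$rate_per_100k <- toy_rates$hospitalizations / toy_rates$pop * 1e5
toy_rates
```

Note that ZIP code 98201 is dropped from the denominator because it does not begin with 980 or 981.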

chars_validate_data()

Before diving into analysis, it’s good practice to validate that your CHARS dataset has the proper structure. The chars_validate_data() function checks that your data contains all the required columns with appropriate data types and values.

validated_chars <- chars_validate_data(ph.data = charsDT, 
                                       icdcol = 'diag1',
                                       icdcm_version = 10)

This function validates:

Required columns exist: seq_no, injury_nature_broad, injury_nature_narrow, injury_intent, injury_mechanism, and the ICD column
seq_no contains unique values (one per patient-visit)
Injury columns have appropriate data types (logical for nature columns, character for intent/mechanism)
ICD codes are properly formatted
Standard injury intent and mechanism categories are present (with informative messages if any are missing)

The function returns the validated data (with any necessary cleaning applied to ICD codes) and is especially useful when working with custom or external CHARS datasets.
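
If you are curious what checks like these look like, here is a rough base R sketch on a tiny made-up table. It is only an illustration of the ideas listed above. It is not how chars_validate_data() is implemented, and you should use the real function on your own data.

```r
# Made-up rows for illustration; seq_no 2 is deliberately duplicated
toy <- data.frame(
  seq_no               = c(1L, 2L, 2L),
  diag1                = c("I110", "S72001A", "K8051"),
  injury_nature_broad  = c(FALSE, TRUE, FALSE),
  injury_nature_narrow = c(FALSE, TRUE, FALSE),
  injury_intent        = c(NA, "unintentional", NA),
  injury_mechanism     = c(NA, "fall", NA),
  stringsAsFactors     = FALSE
)

required <- c("seq_no", "diag1", "injury_nature_broad",
              "injury_nature_narrow", "injury_intent", "injury_mechanism")

# Each element is TRUE when the corresponding check passes
checks <- c(
  has_required_cols = all(required %in% names(toy)),
  seq_no_unique     = !anyDuplicated(toy$seq_no),
  nature_is_logical = is.logical(toy$injury_nature_broad) &&
    is.logical(toy$injury_nature_narrow),
  intent_is_char    = is.character(toy$injury_intent) &&
    is.character(toy$injury_mechanism)
)
checks
```

In this toy example every check passes except seq_no_unique, because seq_no 2 appears twice.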

chars_icd_ccs()

chars_icd_ccs() takes the following arguments:

ref_type: specifies the hospital diagnosis descriptions that are of interest to you. Acceptable options include: ‘all’, ‘icdcm’, ‘superlevel’, ‘broad’, ‘midlevel’, & ‘detailed’.
icdcm_version: specifies the ICD-CM version that you want to reference. Acceptable options include: 9 & 10, with 10 being the default.

Do not attempt to manually browse through chars_icd_ccs() … you will lose your mind because it has more than 100,000 rows! Rather, use it to identify the type of non-injury hospitalization of interest. The structure is simple and (hopefully!) self-explanatory. Let’s take a look at the first three rows as an example by typing chars_icd_ccs()[1:3]:

icdcm_code	icdcm	superlevel	broad	midlevel	detailed	icdcm_version
A00	Cholera	Infectious diseases	Diseases of the digestive system	Intestinal infection	Intestinal infection	10
A000	Cholera due to Vibrio cholerae 01, biovar cholerae	Infectious diseases	Diseases of the digestive system	Intestinal infection	Intestinal infection	10
A001	Cholera due to Vibrio cholerae 01, biovar eltor	Infectious diseases	Diseases of the digestive system	Intestinal infection	Intestinal infection	10

Teaching regular expressions (a.k.a. regex) and filtering is beyond the scope of this vignette. However, I imagine you will usually want to use aggregated hospitalization data, so I encourage you to look at the unique values of the superlevel, broad, midlevel, and detailed columns. For example, let’s examine the CCSR broad categories with chars_icd_ccs(ref_type = 'broad'):

broad	icdcm_version
Diseases of the digestive system	10
Certain infectious and parasitic diseases	10
Diseases of the genitourinary system	10
Diseases of the eye and adnexa	10
Diseases of the ear and mastoid process	10
Endocrine, nutritional and metabolic diseases	10
Diseases of the circulatory system	10
Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism	10
Dental diseases	10
Neoplasms	10
NA	10
Diseases of the nervous system	10
Mental, behavioral and neurodevelopmental disorders	10
Factors influencing health status and contact with health services	10
Injury, poisoning and certain other consequences of external causes	10
Diseases of the musculoskeletal system and connective tissue	10
Diseases of the respiratory system	10
Diseases of the skin and subcutaneous tissue	10
Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified	10
Pregnancy, childbirth and the puerperium	10
Certain conditions originating in the perinatal period	10
Congenital malformations, deformations and chromosomal abnormalities	10
External causes of morbidity	10

chars_icd_ccs_count()

chars_icd_ccs_count() allows the user to get CHARS counts by ICD-CM code, ICD-CM description, or the superlevel, broad, midlevel, and detailed categories. I provide examples of each of these below, in order of decreasing granularity / specificity, using hypertensive heart disease as a case study.

icdcm_code	icdcm	superlevel	broad	midlevel	detailed	icdcm_version
I110	Hypertensive heart disease with heart failure	Chronic diseases	Diseases of the circulatory system	Hypertension	Hypertension	10

However, before we begin, let’s review the possible arguments used by chars_icd_ccs_count():

ph.data: the name of a person level data.table/data.frame of CHARS data with ICD-CM codes
icdcm_version: specifies the ICD-CM version that you want to reference. Acceptable options include: 9 & 10, with 10 being the default.
icdcm: the ICD-CM code of interest OR its description. It is case insensitive and partial strings are allowed.
superlevel: ‘superlevel’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
broad: CCSR derived ‘broad’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
midlevel: ‘midlevel’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
detailed: CCSR derived ‘detailed’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
icdcol: the name of the column in ph.data that contains the ICD-CM codes. Default is diag1, which is provided when you use apde.data::chars().
group_by: identifies the variables by which you want to group (a.k.a., stratify) the results.
kingco: logical (T|F) specifying whether to limit the data analysis to King County. Only works if ph.data still has the chi_geo_kc column.
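
Since partial strings are allowed, it is worth knowing how a partial pattern differs from an anchored one such as '^hypertension$'. The base R sketch below uses grepl() with ignore.case = TRUE on a few descriptions from this vignette. That the rads functions match in exactly this way is our assumption, so treat this as an illustration of the regex behavior rather than of the package internals.

```r
# Descriptions taken from the examples in this vignette
descriptions <- c("Hypertension",
                  "Hypertensive heart disease with heart failure",
                  "Intestinal infection")

# Partial, case-insensitive match: any description containing "hypertens"
partial_match <- grepl("hypertens", descriptions, ignore.case = TRUE)
descriptions[partial_match]

# Anchored match: ^ and $ require the whole string to be "hypertension"
exact_match <- grepl("^hypertension$", descriptions, ignore.case = TRUE)
descriptions[exact_match]
```

The partial pattern picks up both hypertension descriptions, while the anchored pattern keeps only "Hypertension".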

getting CHARS counts by ICD-10-CM code

  mycode <- chars_icd_ccs_count(ph.data = charsDT, 
                                icdcm = 'I110')

icdcm_desc	hospitalizations
Hypertensive heart disease with heart failure	1,886

getting CHARS counts by ICD-10-CM description

  mydesc <- chars_icd_ccs_count(ph.data = charsDT, 
                                icdcm = 'Hypertensive heart disease with heart failure')

icdcm_desc	hospitalizations
Hypertensive heart disease with heart failure	1,886

These results are identical to searching by code because, for hypertensive heart disease with heart failure, there is a one-to-one match of description to ICD-10-CM code.

getting CHARS counts by `detailed` ICD-CM category

  mydetailed <- chars_icd_ccs_count(ph.data = charsDT, 
                                detailed = '^hypertension$')

detailed_desc	hospitalizations
Hypertension	5,984

getting CHARS counts by `midlevel` ICD-CM category

  mymidlevel <- chars_icd_ccs_count(ph.data = charsDT, 
                                midlevel = '^Hypertension$')

midlevel_desc	hospitalizations
Hypertension	6,261

getting CHARS counts by `broad` ICD-CM category

  mybroad <- chars_icd_ccs_count(ph.data = charsDT, 
                                broad = 'Diseases of the circulatory system')

broad_desc	hospitalizations
Diseases of the circulatory system	22,379

getting CHARS counts by `superlevel` ICD-CM category

  mysuperlevel <- chars_icd_ccs_count(ph.data = charsDT, 
                                superlevel = 'Chronic diseases')

superlevel_desc	hospitalizations
Chronic diseases	56,708
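
stratifying CHARS counts with group_by

None of the examples above use the group_by argument. Here is a minimal sketch, assuming group_by accepts the name of a column in ph.data (we use temperament, the practice stratifier in the synthetic data). The requireNamespace() guard is our addition so the snippet is simply skipped when rads and rads.data are not installed. Output is omitted.

```r
# Runs only when rads and rads.data are installed
if (requireNamespace("rads", quietly = TRUE) &&
    requireNamespace("rads.data", quietly = TRUE)) {

  charsDT <- rads.data::synthetic_chars

  # ASSUMPTION: group_by takes a character vector of column names in ph.data
  by_temperament <- rads::chars_icd_ccs_count(ph.data = charsDT,
                                              icdcm = 'I110',
                                              group_by = 'temperament')
  print(by_temperament)
}
```

If the assumption holds, you should get one row per temperament value rather than a single overall count.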

chars_injury_matrix()

The chars_injury_matrix() function provides a handy reference of all the mechanism and intent combinations that can be used with chars_injury_matrix_count(). Here are the first 10 rows:

mechanism	intent
any	any
any	assault
any	intentional
any	legal
any	undetermined
any	unintentional
bites_stings	any
bites_stings	assault
bites_stings	intentional
bites_stings	legal

If you just want to see a list of the available intents, type unique(chars_injury_matrix()[]$intent):

[1] "any"           "assault"       "intentional"   "legal"        
[5] "undetermined"  "unintentional"

Similarly, to see the available mechanisms, type unique(chars_injury_matrix()[]$mechanism):

 [1] "any"                      "bites_stings"            
 [3] "cut_pierce"               "drowning"                
 [5] "fall"                     "fire_burn"               
 [7] "firearm"                  "machinery"               
 [9] "motor_vehicle_nontraffic" "motor_vehicle_traffic"   
[11] "mvt_motorcyclist"         "mvt_occupant"            
[13] "mvt_other"                "mvt_pedal_cyclist"       
[15] "mvt_pedestrian"           "mvt_unspecified"         
[17] "natural_environmental"    "other_land_transport"    
[19] "other_specified"          "other_transport"         
[21] "overexertion"             "pedal_cyclist"           
[23] "pedestrian"               "poisoning"               
[25] "poisoning_drug"           "poisoning_nondrug"       
[27] "struck_by_against"        "suffocation"             
[29] "unspecified"

chars_injury_matrix_count()

The chars_injury_matrix_count() function is similar to the chars_icd_ccs_count() function above, except that it counts injury related hospitalizations. chars_injury_matrix_count() takes seven potential arguments:

ph.data: the name of a person level data.table/data.frame of CHARS data downloaded with apde.data::chars() or structured like rads.data::synthetic_chars. Note that the intents and mechanisms are pre-calculated so you will need to ensure ph.data has the relevant injury_mechanism and injury_intent columns. The easiest way to do this with real data is to have apde.data::chars() download all the columns.
intent: the injury intent of interest. Partial strings are allowed. Use 'none' or 'any' to ignore intent and return “Any intent”. Use '*' (the default wildcard) to return all possible intents.
mechanism: the injury mechanism of interest. Partial strings are allowed. Use 'none' or 'any' to ignore mechanism and return “Any mechanism”. Use '*' (the default wildcard) to return all possible mechanisms.
group_by: identifies the variables by which you want to group (a.k.a., stratify) the results.
def: acceptable values are ‘narrow’ or ‘broad’. ‘narrow’ is the CDC’s recommended approach, which requires that the principal diagnosis of an injury hospitalization be a nature-of-injury ICD-10-CM code. ‘broad’ searches all available diagnosis fields on the hospital discharge record. See this document for details.
primary_ecode: logical (T|F) specifying whether to limit the analysis to using just the primary ecode (i.e., the injury_ecode variable), rather than all available ecodes. The vast majority of the time you will want to keep the default setting.
kingco: logical (T|F) specifying whether to limit the data analysis to King County. Only works if ph.data still has the chi_geo_kc column.
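
Because partial strings are allowed for intent and mechanism, a short string can match more categories than you expect. The base R sketch below checks a pattern against some of the mechanisms listed by chars_injury_matrix() before you use it. We assume the matching behaves like a regular expression search, so please confirm by inspecting the mechanism column of your results.

```r
# A few of the mechanisms listed by chars_injury_matrix()
mechanisms <- c("motor_vehicle_traffic", "mvt_pedestrian", "pedal_cyclist",
                "pedestrian", "poisoning", "poisoning_drug",
                "poisoning_nondrug")

# ASSUMPTION: a partial string is matched anywhere within the mechanism name
poison_matches <- mechanisms[grepl("poisoning", mechanisms)]
poison_matches

# "pedestrian" also appears inside "mvt_pedestrian"
ped_matches <- mechanisms[grepl("pedestrian", mechanisms)]
ped_matches
```

Here 'poisoning' matches three mechanisms and 'pedestrian' matches two, so it pays to preview a pattern like this when you only want one category.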

Specifying a single intent and ignoring the mechanism

  mat1 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'assault', 
                              mechanism = 'none')

mechanism	intent	hospitalizations
Any mechanism	assault	166

Specifying more than one intent and ignoring the mechanism

  mat2 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'assault|undetermined', 
                              mechanism = 'none')

mechanism	intent	hospitalizations
Any mechanism	assault	166
Any mechanism	undetermined	50

Note that you can also specify more than one intent or mechanism using a character vector of separate values.

  mat2.alt <- chars_injury_matrix_count(ph.data = charsDT, 
                                        intent = c('assault', 'undetermined'), 
                                        mechanism = 'none')

Specifying a single mechanism and ignoring the intent

  mat3 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'motor_vehicle_traffic')

mechanism	intent	hospitalizations
motor_vehicle_traffic	Any intent	781

What happens if you specify ‘none’ or ‘any’ for both the mechanism and intent?

You get hospitalizations due to any injury.

  mat4 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'none')

mechanism	intent	hospitalizations
Any mechanism	Any intent	7,174

What happens if you don’t specify the mechanism and intent?

You get every possible combination of mechanism and intent. Let’s look at just the top 10 for convenience.

  mat5 <- chars_injury_matrix_count(ph.data = charsDT)[1:10]

mechanism	intent	hospitalizations
Any mechanism	Any intent	7,174
Any mechanism	assault	166
Any mechanism	intentional	519
Any mechanism	legal	7
Any mechanism	undetermined	50
Any mechanism	unintentional	6,432
bites_stings	Any intent	33
bites_stings	assault	0
bites_stings	intentional	0
bites_stings	legal	0
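
As a quick sanity check on the table above, the intent-specific counts for ‘Any mechanism’ should add up to the ‘Any intent’ total. Here is the arithmetic in base R, using numbers copied from the table:

```r
# Counts copied from the 'Any mechanism' rows of the table above
any_intent <- 7174
by_intent <- c(assault = 166, intentional = 519, legal = 7,
               undetermined = 50, unintentional = 6432)

# The specific intents sum to the 'Any intent' total
intent_total <- sum(by_intent)
intent_total                 # 7174
intent_total == any_intent   # TRUE

# Share of all injury hospitalizations by intent (percent)
round(100 * by_intent / any_intent, 1)
```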

How different are the `narrow` and `broad` definitions?

  mat6 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'none', 
                              def = 'narrow')

  mat7 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'none', 
                              def = 'broad')
  
  deftable <- rbind(cbind(def = 'narrow', mat6),
                    cbind(def = 'broad', mat7))

def	mechanism	intent	hospitalizations
narrow	Any mechanism	Any intent	7,174
broad	Any mechanism	Any intent	11,392

This table shows that the number of hospitalizations differs substantially depending on the definition that you use. Unless you have a specific rationale for changing it, please use the default in your analyses (i.e., def = 'narrow').
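
To put a number on the difference in the synthetic data, here is the arithmetic using the two counts from the table above:

```r
# Counts copied from the narrow vs. broad comparison table
narrow <- 7174
broad  <- 11392

# Ratio of broad to narrow counts
def_ratio <- round(broad / narrow, 2)
def_ratio      # 1.59

# Percent increase when moving from narrow to broad
def_pct_more <- round(100 * (broad - narrow) / narrow, 1)
def_pct_more   # 58.8
```

In other words, the broad definition yields roughly 59% more injury hospitalizations than the narrow definition in this practice dataset.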

Conclusion

We know this was a lot to process. The good news is that this vignette isn’t going anywhere. If you remember (a) that this vignette exists and (b) where to find it, you’ll be in good shape to take on standard CHARS analyses in the future.

If you’ve read through this vignette and the corresponding help files and are still confused, please feel free to reach out for assistance. You may have found a bug, who knows? Good luck!

– Updated January 20, 2026 (rads v1.5.3)
