

Danny Colombara edited this page Jan 20, 2026 · 10 revisions

CHARS Functions

Introduction

The rads package has a suite of tools designed to facilitate and accelerate the analysis of standardized CHARS (Comprehensive Hospital Abstract Reporting System) data. Combining the rads functions below with the clean CHARS data (or with practice data from rads.data, which is installed automatically with rads) should allow analysts to conduct custom analyses with relative ease. The core rads CHARS functions are:

  • chars_icd_ccs(): view available CHARS ICD-9-CM and ICD-10-CM descriptions as well as ‘superlevel’, ‘broad’, ‘midlevel’, and ‘detailed’ aggregations derived from AHRQ’s HCUP CCSR that can be used with chars_icd_ccs_count()
  • chars_icd_ccs_count(): generate counts of CHARS hospitalizations using ICD-9-CM or ICD-10-CM descriptions or ‘superlevel’, ‘broad’, ‘midlevel’, and ‘detailed’ categories.
  • chars_injury_matrix(): view all available intents and mechanisms that can be used with chars_injury_matrix_count() (2012+)
  • chars_injury_matrix_count(): generate counts of injury related hospitalizations by intent and mechanism (2012+)
  • chars_validate_data(): validate that your CHARS dataset has the proper structure and columns needed for the injury and ICD-CM analysis functions

Additionally, if you have APDE credentials and are working within King County’s infrastructure, you can easily download standardized CHARS data from SQL into R (2012+) using apde.data::chars().

All of these functions have detailed help files that are accessible by typing ?function_name, e.g. ?chars_injury_matrix_count. Some examples for how to use these functions are given below.

A few quick notes before we begin …

  • apde.data::chars() can provide you with ICD-9-CM data (2012-2015) as well as ICD-10-CM data (2016+).
  • chars_injury_matrix() and chars_injury_matrix_count() are agnostic as to whether the underlying data are ICD-9-CM or ICD-10-CM.
  • chars_icd_ccs() & chars_icd_ccs_count() need you to specify which ICD-CM version you have in your data. This means you can analyze 2012-2015 data or 2016+ data, but not both at the same time in a single command.
  • If you want to create age-adjusted rates, we recommend you read the age_standardize and calculating_rates_with_rads vignettes after working through this one.
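Because chars_icd_ccs() and chars_icd_ccs_count() handle one ICD-CM version at a time, a multi-year extract that spans 2015 and 2016 has to be split before counting. The sketch below is one possible way to do that with data.table. The toy table and its values are hypothetical, and the year cut-off simply follows the note above (2012-2015 = ICD-9-CM, 2016+ = ICD-10-CM).

```r
library(data.table)

# Hypothetical toy data: one row per hospitalization (not real CHARS records)
toy <- data.table(
  seq_no   = 1:6,
  chi_year = c(2013, 2015, 2016, 2016, 2019, 2023),
  diag1    = c("4019", "25000", "I10", "E119", "I110", "K8051")
)

# Tag each row with the ICD-CM version implied by its year
toy[, icdcm_version := fifelse(chi_year <= 2015, 9, 10)]

# Split into one table per version; each piece could then be passed to
# chars_icd_ccs_count() together with the matching icdcm_version
by_version <- split(toy, by = "icdcm_version")

# Number of rows in each piece
sapply(by_version, nrow)
```

Each element of by_version can then be analyzed on its own and the results stacked afterwards, for example with rbind().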

Set up the environment

rm(list=ls())
library(rads)
library(data.table)

Getting CHARS data

For practice and development: rads.data::synthetic_chars

If you’re learning to work with CHARS data or developing new analysis code, we provide a synthetic dataset that mimics the structure of real CHARS data. This privacy-safe dataset is automatically available when you install rads and contains over 250,000 rows of injury-related diagnosis data.

# Load the synthetic CHARS data
data(synthetic_chars, package = "rads.data")

# View the structure
str(synthetic_chars)
Classes 'data.table' and 'data.frame':  288326 obs. of  9 variables:
 $ seq_no              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ diag1               : chr  "K8051" "Z3801" "K5651" "N136" ...
 $ injury_nature_broad : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ injury_nature_narrow: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ injury_intent       : chr  NA NA NA NA ...
 $ injury_mechanism    : chr  NA NA NA NA ...
 $ chi_geo_kc          : chr  "King County" "King County" NA NA ...
 $ temperament         : chr  "Calm" "Moderate" "Calm" "Moderate" ...
 $ creation_date       : Date, format: "2026-01-07" "2026-01-07" ...
 - attr(*, ".internal.selfref")=<externalptr> 

The synthetic dataset includes the key columns needed for injury and ICD-CM analyses:

  • seq_no: unique patient-visit identifier
  • diag1: ICD-10-CM primary diagnosis code
  • injury_nature_broad & injury_nature_narrow: logical flags indicating whether the hospitalization counts as an injury under each definition. ‘narrow’ follows the CDC recommendation and ‘broad’ follows a more expansive definition. See ?rads.data::synthetic_chars for details.
  • injury_intent: injury intent classification (e.g., ‘assault’, ‘unintentional’)
  • injury_mechanism: injury mechanism classification (e.g., ‘fall’, ‘firearm’)
  • chi_geo_kc: King County indicator
  • temperament: a categorical ‘demographic’ indicator for practice stratification

This synthetic dataset is perfect for testing code before running analyses on real data.

For APDE analysts: apde.data::chars()

Note: This section is only relevant for analysts working within King County’s APDE infrastructure who have the necessary credentials.

apde.data::chars() takes the following arguments:

  • cols: character. the names of the columns that you want to download. Limiting this list to only the variables you truly need can significantly improve download speed. In most cases, this includes any demographic stratifiers of interest along with “seq_no”, “diag1”, “injury_nature_broad”, “injury_nature_narrow”, “injury_intent”, and “injury_mechanism”.
  • year: the year(s) of interest, from 2012 to the present.
  • kingco: logical (T|F). Specifies whether to limit the download to King County, based on truncated ZIP codes (980## and 981##).
  • version: character. Either 'final' or 'stage'.
  • wastate: logical (T|F). When false, data will include Oregon.
  • inpatient: logical (T|F). When false, data will include observation patients (i.e., outpatients).
  • deaths: logical (T|F). When true, the data will include those who died while in the hospital.
  • topcode: logical (T|F). When true, chi_age will be top coded to 100 to match population data top coding.

If you do not specify any of the arguments, you will get all CHARS data columns, for the latest year, for King County (defined by truncated ZIP codes), limited to inpatients, including those who died while hospitalized, with ages top coded to 100.

charsDT <- apde.data::chars(cols = c("seq_no", "diag1", "injury_nature_broad", 
                                     "injury_nature_narrow", "injury_intent", 
                                     "injury_mechanism", "chi_geo_kc", "chi_year", 
                                     "chi_age"), year = 2023)
unique(charsDT$chi_geo_kc) # confirm data is limited to King County
[1] "King County"
unique(charsDT$chi_year) # check the year
[1] 2023
max(charsDT$chi_age, na.rm = T) # check top coding
[1] 100

For the remainder of this vignette, we’ll use the synthetic dataset so that everyone can follow along:

charsDT <- rads.data::synthetic_chars

⚠️ A note about King County population denominators

Important note: Since King County in CHARS data is defined by truncated ZIP codes (980## and 981##), the correct denominator when calculating rates should be defined the same way. To be clear, this means you should NOT use apde.data::population(kingco = T) for King County CHARS denominators. Instead, you should get ZIP code population data and aggregate it for King County.

Here’s an example of how to obtain a King County population denominator by age, gender, and race (with Hispanic as a race):

# Get ZIP code level population data
denominator <- apde.data::population(kingco = FALSE, 
                             geo_type = 'zip', 
                             group_by = c('ages', 'genders', 'race_eth'))

# Subset to ZIP codes that begin with 980 or 981
denominator <- denominator[grepl('^980|^981', geo_id)]

# Sum the population across all these ZIP codes by gender, race/eth, and age
denominator <- denominator[, .(pop = sum(pop)), .(gender, race_eth, age)]

# Label it as King County
denominator[, geo_id := 'King County']
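Once you have a ZIP-based denominator like the one above, turning counts into crude rates is a merge followed by a division. The sketch below uses made-up counts and populations purely to show the mechanics; the column names hospitalizations and pop mirror those used in this vignette, and rate_per_100k is a name introduced here for illustration. For age-adjusted rates, the vignettes mentioned in the introduction are the better guide.

```r
library(data.table)

# Hypothetical counts and populations, for illustration only
counts <- data.table(
  gender           = c("Female", "Male"),
  hospitalizations = c(120, 150)
)
denominator <- data.table(
  geo_id = "King County",
  gender = c("Female", "Male"),
  pop    = c(1100000, 1120000)
)

# Join the counts to the denominator on the shared stratifier
rates <- merge(counts, denominator, by = "gender")

# Crude rate per 100,000 population
rates[, rate_per_100k := 100000 * hospitalizations / pop]
rates[]
```

The key point is that the numerator and the denominator must be stratified by the same variables and defined over the same geography.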

chars_validate_data()

Before diving into analysis, it’s good practice to validate that your CHARS dataset has the proper structure. The chars_validate_data() function checks that your data contains all the required columns with appropriate data types and values.

validated_chars <- chars_validate_data(ph.data = charsDT, 
                                       icdcol = 'diag1',
                                       icdcm_version = 10)

This function validates:

  • Required columns exist: seq_no, injury_nature_broad, injury_nature_narrow, injury_intent, injury_mechanism, and the ICD column
  • seq_no contains unique values (one per patient-visit)
  • Injury columns have appropriate data types (logical for nature columns, character for intent/mechanism)
  • ICD codes are properly formatted
  • Standard injury intent and mechanism categories are present (with informative messages if any are missing)

The function returns the validated data (with any necessary cleaning applied to ICD codes) and is especially useful when working with custom or external CHARS datasets.
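To make the checklist above more concrete, here is a rough, hand-rolled version of a few of those checks on a tiny toy table. This is not the implementation of chars_validate_data(), just a sketch of the kinds of conditions it describes; the toy data are hypothetical.

```r
library(data.table)

# Hypothetical toy data shaped like the required columns
toy <- data.table(
  seq_no               = 1:3,
  diag1                = c("K8051", "S72001A", "I110"),
  injury_nature_broad  = c(FALSE, TRUE, FALSE),
  injury_nature_narrow = c(FALSE, TRUE, FALSE),
  injury_intent        = c(NA, "unintentional", NA),
  injury_mechanism     = c(NA, "fall", NA)
)

required <- c("seq_no", "diag1", "injury_nature_broad", "injury_nature_narrow",
              "injury_intent", "injury_mechanism")

# Each element is TRUE when the corresponding condition holds
checks <- c(
  has_required_cols = all(required %in% names(toy)),
  seq_no_is_unique  = !anyDuplicated(toy$seq_no),
  nature_is_logical = is.logical(toy$injury_nature_broad) &&
                      is.logical(toy$injury_nature_narrow),
  intent_is_char    = is.character(toy$injury_intent) &&
                      is.character(toy$injury_mechanism)
)
checks
```

In practice you would rely on chars_validate_data() itself, since it also cleans ICD codes and reports missing intent and mechanism categories.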

chars_icd_ccs()

chars_icd_ccs() takes the following arguments:

  • ref_type: specifies the hospital diagnosis descriptions that are of interest to you. Acceptable options include: ‘all’, ‘icdcm’, ‘superlevel’, ‘broad’, ‘midlevel’, & ‘detailed’.
  • icdcm_version: specifies the ICD-CM version that you want to reference. Acceptable options include: 9 & 10, with 10 being the default.

Do not attempt to manually browse through chars_icd_ccs() … you will lose your mind because it has more than 100,000 rows! Rather, use it to identify the type of non-injury hospitalization of interest. The structure is simple and (hopefully!) self-explanatory. Let’s take a look at the first three rows as an example by typing chars_icd_ccs()[1:3]:

| icdcm_code | icdcm | superlevel | broad | midlevel | detailed | icdcm_version |
|---|---|---|---|---|---|---|
| A00 | Cholera | Infectious diseases | Diseases of the digestive system | Intestinal infection | Intestinal infection | 10 |
| A000 | Cholera due to Vibrio cholerae 01, biovar cholerae | Infectious diseases | Diseases of the digestive system | Intestinal infection | Intestinal infection | 10 |
| A001 | Cholera due to Vibrio cholerae 01, biovar eltor | Infectious diseases | Diseases of the digestive system | Intestinal infection | Intestinal infection | 10 |

Teaching regular expressions (a.k.a. regex) and filtering is beyond the scope of this vignette. However, I imagine you will usually want to use aggregated hospitalization data, so I encourage you to look at the unique values of the superlevel, broad, midlevel, and detailed columns. For example, let’s examine the CCSR broad categories with chars_icd_ccs(ref_type = 'broad'):

| broad | icdcm_version |
|---|---|
| Diseases of the digestive system | 10 |
| Certain infectious and parasitic diseases | 10 |
| Diseases of the genitourinary system | 10 |
| Diseases of the eye and adnexa | 10 |
| Diseases of the ear and mastoid process | 10 |
| Endocrine, nutritional and metabolic diseases | 10 |
| Diseases of the circulatory system | 10 |
| Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism | 10 |
| Dental diseases | 10 |
| Neoplasms | 10 |
| NA | 10 |
| Diseases of the nervous system | 10 |
| Mental, behavioral and neurodevelopmental disorders | 10 |
| Factors influencing health status and contact with health services | 10 |
| Injury, poisoning and certain other consequences of external causes | 10 |
| Diseases of the musculoskeletal system and connective tissue | 10 |
| Diseases of the respiratory system | 10 |
| Diseases of the skin and subcutaneous tissue | 10 |
| Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified | 10 |
| Pregnancy, childbirth and the puerperium | 10 |
| Certain conditions originating in the perinatal period | 10 |
| Congenital malformations, deformations and chromosomal abnormalities | 10 |
| External causes of morbidity | 10 |

chars_icd_ccs_count()

chars_icd_ccs_count() allows the user to get CHARS counts by ICD-CM code, ICD-CM description, or the superlevel, broad, midlevel, and detailed categories. I provide examples of each of these below, in order of decreasing granularity / specificity, using hypertensive heart disease as a case study.

| icdcm_code | icdcm | superlevel | broad | midlevel | detailed | icdcm_version |
|---|---|---|---|---|---|---|
| I110 | Hypertensive heart disease with heart failure | Chronic diseases | Diseases of the circulatory system | Hypertension | Hypertension | 10 |

However, before we begin, let’s review the possible arguments used by chars_icd_ccs_count():

  • ph.data: the name of a person level data.table/data.frame of CHARS data with ICD-CM codes
  • icdcm_version: specifies the ICD-CM version that you want to reference. Acceptable options include: 9 & 10, with 10 being the default.
  • icdcm: the ICD-CM code of interest OR its description. It is case insensitive and partial strings are allowed.
  • superlevel: ‘superlevel’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
  • broad: CCSR derived ‘broad’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
  • midlevel: ‘midlevel’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
  • detailed: CCSR derived ‘detailed’ level descriptions that are of interest. Case insensitive and partial strings are allowed.
  • icdcol: the name of the column in ph.data that contains the ICD-CM codes. Default is diag1, which is provided when you use apde.data::chars().
  • group_by: identifies the variables by which you want to group (a.k.a., stratify) the results.
  • kingco: logical (T|F) specifying whether to limit the data analysis to King County. Only works if ph.data still has the chi_geo_kc column.
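Since partial, case insensitive strings are allowed, a short pattern can match more descriptions than you intend. The base R sketch below illustrates the difference between a partial pattern and one anchored with ^ and $. The description strings are made up for illustration, and the sketch assumes the matching behaves like grepl(..., ignore.case = TRUE); check the help file for how chars_icd_ccs_count() actually resolves patterns.

```r
# Illustrative description strings (hypothetical, not pulled from chars_icd_ccs())
descriptions <- c("Hypertension",
                  "Hypertension in pregnancy",
                  "Pulmonary hypertension")

# A partial, case insensitive pattern matches all three descriptions
partial <- grepl("hypertension", descriptions, ignore.case = TRUE)

# Anchoring with ^ and $ keeps only the description that matches in full
exact <- grepl("^hypertension$", descriptions, ignore.case = TRUE)

data.frame(descriptions, partial, exact)
```

If a count looks larger than expected, an unanchored pattern is a good first thing to check.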

getting CHARS counts by ICD-10-CM code

  mycode <- chars_icd_ccs_count(ph.data = charsDT, 
                                icdcm = 'I110')
| icdcm_desc | hospitalizations |
|---|---|
| Hypertensive heart disease with heart failure | 1,886 |

getting CHARS counts by ICD-10-CM description

  mydesc <- chars_icd_ccs_count(ph.data = charsDT, 
                                icdcm = 'Hypertensive heart disease with heart failure')
| icdcm_desc | hospitalizations |
|---|---|
| Hypertensive heart disease with heart failure | 1,886 |

These results are identical to searching by code because, for hypertensive heart disease with heart failure, there is a one-to-one match of description to ICD-10-CM code.

getting CHARS counts by detailed ICD-CM category

  mydetailed <- chars_icd_ccs_count(ph.data = charsDT, 
                                detailed = '^hypertension$')
| detailed_desc | hospitalizations |
|---|---|
| Hypertension | 5,984 |

getting CHARS counts by midlevel ICD-CM category

  mymidlevel <- chars_icd_ccs_count(ph.data = charsDT, 
                                midlevel = '^Hypertension$')
| midlevel_desc | hospitalizations |
|---|---|
| Hypertension | 6,261 |

getting CHARS counts by broad ICD-CM category

  mybroad <- chars_icd_ccs_count(ph.data = charsDT, 
                                broad = 'Diseases of the circulatory system')
| broad_desc | hospitalizations |
|---|---|
| Diseases of the circulatory system | 22,379 |

getting CHARS counts by superlevel ICD-CM category

  mysuperlevel <- chars_icd_ccs_count(ph.data = charsDT, 
                                superlevel = 'Chronic diseases')
| superlevel_desc | hospitalizations |
|---|---|
| Chronic diseases | 56,708 |
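None of the examples above use group_by, so here is a hedged sketch of stratifying counts by the practice temperament column in the synthetic data. It assumes that group_by accepts a character vector of column names found in ph.data (please confirm with ?chars_icd_ccs_count), and it only runs when rads and rads.data are installed.

```r
# Runs only when rads and rads.data are installed
if (requireNamespace("rads", quietly = TRUE) &&
    requireNamespace("rads.data", quietly = TRUE)) {

  charsDT <- rads.data::synthetic_chars

  # Assumption: group_by takes a character vector of column names in ph.data
  by_temperament <- rads::chars_icd_ccs_count(
    ph.data  = charsDT,
    broad    = 'Diseases of the circulatory system',
    group_by = 'temperament'
  )

  print(by_temperament)
}
```

With real data, you would swap temperament for the demographic stratifiers you downloaded, such as age or geography columns.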

chars_injury_matrix()

The chars_injury_matrix() function provides a handy reference of all the mechanism and intent combinations that can be used with chars_injury_matrix_count(). Here are the first 10 rows:

| mechanism | intent |
|---|---|
| any | any |
| any | assault |
| any | intentional |
| any | legal |
| any | undetermined |
| any | unintentional |
| bites_stings | any |
| bites_stings | assault |
| bites_stings | intentional |
| bites_stings | legal |

If you just want to see a list of the available intents, type unique(chars_injury_matrix()[]$intent):

[1] "any"           "assault"       "intentional"   "legal"        
[5] "undetermined"  "unintentional"

Similarly, to see the available mechanisms, type unique(chars_injury_matrix()[]$mechanism):

 [1] "any"                      "bites_stings"            
 [3] "cut_pierce"               "drowning"                
 [5] "fall"                     "fire_burn"               
 [7] "firearm"                  "machinery"               
 [9] "motor_vehicle_nontraffic" "motor_vehicle_traffic"   
[11] "mvt_motorcyclist"         "mvt_occupant"            
[13] "mvt_other"                "mvt_pedal_cyclist"       
[15] "mvt_pedestrian"           "mvt_unspecified"         
[17] "natural_environmental"    "other_land_transport"    
[19] "other_specified"          "other_transport"         
[21] "overexertion"             "pedal_cyclist"           
[23] "pedestrian"               "poisoning"               
[25] "poisoning_drug"           "poisoning_nondrug"       
[27] "struck_by_against"        "suffocation"             
[29] "unspecified"             

chars_injury_matrix_count()

The chars_injury_matrix_count() function is similar to the chars_icd_ccs_count() function above, except that it counts injury related hospitalizations. chars_injury_matrix_count() takes seven potential arguments:

  • ph.data: the name of a person level data.table/data.frame of CHARS data downloaded with apde.data::chars() or structured like rads.data::synthetic_chars. Note that the intents and mechanisms are pre-calculated so you will need to ensure ph.data has the relevant injury_mechanism and injury_intent columns. The easiest way to do this with real data is to have apde.data::chars() download all the columns.
  • intent: the injury intent of interest. Partial strings are allowed. Use 'none' or 'any' to ignore intent and return “Any intent”. Use '*' (the default wildcard) to return all possible intents.
  • mechanism: the injury mechanism of interest. Partial strings are allowed. Use 'none' or 'any' to ignore mechanism and return “Any mechanism”. Use '*' (the default wildcard) to return all possible mechanisms.
  • group_by: identifies the variables by which you want to group (a.k.a., stratify) the results.
  • def: acceptable values are ‘narrow’ or ‘broad’. ‘narrow’ is the CDC’s recommended approach, which requires that the principal diagnosis of an injury hospitalization be a nature-of-injury ICD-10-CM code. ‘broad’ instead searches all available diagnosis fields on the hospital discharge record. See this document for details.
  • primary_ecode: logical (T|F) specifying whether to limit the analysis to using just the primary ecode (i.e., the injury_ecode variable), rather than all available ecodes. The vast majority of the time you will want to keep the default setting.
  • kingco: logical (T|F) specifying whether to limit the data analysis to King County. Only works if ph.data still has the chi_geo_kc column.

Specifying a single intent and ignoring the mechanism

  mat1 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'assault', 
                              mechanism = 'none')
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | assault | 166 |

Specifying more than one intent and ignoring the mechanism

  mat2 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'assault|undetermined', 
                              mechanism = 'none')
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | assault | 166 |
| Any mechanism | undetermined | 50 |

Note that you can also specify more than one intent or mechanism by passing a character vector with one value per element.

  mat2.alt <- chars_injury_matrix_count(ph.data = charsDT, 
                                        intent = c('assault', 'undetermined'), 
                                        mechanism = 'none')
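The '|' syntax is ordinary regex alternation, so the two forms above describe the same set of intents. The base R sketch below shows that equivalence and also flags a pitfall with partial strings: 'intentional' is a substring of 'unintentional'. This is only an illustration of plain regex matching on the intent names listed earlier; how chars_injury_matrix_count() resolves such cases internally is not shown here, so check your output when you request 'intentional'.

```r
# Intents listed by chars_injury_matrix(), minus 'any'
intents <- c("assault", "intentional", "legal", "undetermined", "unintentional")

# A '|' pattern and a collapsed character vector describe the same alternation
pattern <- paste(c("assault", "undetermined"), collapse = "|")
two_intents <- intents[grepl(pattern, intents)]
two_intents

# Caution with partial strings: 'intentional' also sits inside 'unintentional'
loose <- intents[grepl("intentional", intents)]
loose

# Under plain regex matching, anchoring removes the ambiguity
strict <- intents[grepl("^intentional$", intents)]
strict
```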

Specifying a single mechanism and ignoring the intent

  mat3 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'motor_vehicle_traffic')
| mechanism | intent | hospitalizations |
|---|---|---|
| motor_vehicle_traffic | Any intent | 781 |

What happens if you specify ‘none’ or ‘any’ for both the mechanism and intent?

You get hospitalizations due to any injury.

  mat4 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'none')
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | Any intent | 7,174 |

What happens if you don’t specify the mechanism and intent?

You get every possible combination of mechanism and intent. Let’s look at just the first 10 rows for convenience.

  mat5 <- chars_injury_matrix_count(ph.data = charsDT)[1:10]
| mechanism | intent | hospitalizations |
|---|---|---|
| Any mechanism | Any intent | 7,174 |
| Any mechanism | assault | 166 |
| Any mechanism | intentional | 519 |
| Any mechanism | legal | 7 |
| Any mechanism | undetermined | 50 |
| Any mechanism | unintentional | 6,432 |
| bites_stings | Any intent | 33 |
| bites_stings | assault | 0 |
| bites_stings | intentional | 0 |
| bites_stings | legal | 0 |

How different are the narrow and broad definitions?

  mat6 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'none', 
                              def = 'narrow')

  mat7 <- chars_injury_matrix_count(ph.data = charsDT, 
                              intent = 'none', 
                              mechanism = 'none', 
                              def = 'broad')
  
  deftable <- rbind(cbind(def = 'narrow', mat6),
                    cbind(def = 'broad', mat7))
| def | mechanism | intent | hospitalizations |
|---|---|---|---|
| narrow | Any mechanism | Any intent | 7,174 |
| broad | Any mechanism | Any intent | 11,392 |

This table shows that the definition matters a great deal: in the synthetic data, the broad definition yields 11,392 hospitalizations versus 7,174 under the narrow definition, roughly 59% more. Unless you have a specific rationale for changing it, please use the default in your analyses (i.e., def = 'narrow').
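If you want to quantify the gap between the two definitions, the arithmetic is simple. The sketch below uses the two counts from the synthetic data table above; the variable names are introduced here for illustration.

```r
narrow <- 7174   # def = 'narrow' count from the table above
broad  <- 11392  # def = 'broad' count from the table above

# Ratio of broad to narrow, and the percent increase when moving to broad
ratio    <- broad / narrow
pct_more <- 100 * (broad - narrow) / narrow

round(c(ratio = ratio, pct_more = pct_more), 1)
```

Keep in mind that these figures describe the synthetic practice data, so the size of the gap in real CHARS data may differ.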

Conclusion

We know this was a lot to process. The good news is that this vignette isn’t going anywhere. If you remember (a) that this vignette exists and (b) where to find it, you’ll be in good shape to take on standard CHARS analyses in the future.

If you’ve read through this vignette and the corresponding help files and are still confused, please feel free to reach out for assistance. You may have found a bug, who knows? Good luck!

Updated January 20, 2026 (rads v1.5.3)
