Getting Started with HCUPtools

Vikrant Dev Rathore

2025-12-04

HCUPtools is an R package for accessing and working with resources from the Agency for Healthcare Research and Quality (AHRQ) Healthcare Cost and Utilization Project (HCUP). This vignette walks through the package's main tasks: downloading CCSR mapping files, mapping ICD-10 codes to CCSR categories, reading HCUP Summary Trend Tables, viewing CCSR change logs, and generating citations.

Installation and Setup

# Install from CRAN
install.packages("HCUPtools")

# Load the package
library(HCUPtools)
library(dplyr)  # For data manipulation examples

Part 1: Downloading CCSR Mapping Files

The Clinical Classifications Software Refined (CCSR) is a tool developed by AHRQ/HCUP to categorize ICD-10-CM diagnosis codes and ICD-10-PCS procedure codes into clinically meaningful categories. The download_ccsr() function provides direct access to these mapping files.

Download Latest Version

# Download the latest diagnosis CCSR mapping file
dx_map <- download_ccsr("diagnosis")

# Download the latest procedure CCSR mapping file
pr_map <- download_ccsr("procedure")

Download Specific Version

# Download a specific version (useful for reproducibility)
dx_map_v2025 <- download_ccsr("diagnosis", version = "v2025.1")
pr_map_v2025 <- download_ccsr("procedure", version = "v2025.1")
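
For reproducible analyses it can also help to keep a local copy of the exact mapping table that was used. The following is a minimal sketch, not part of HCUPtools: the helper name cached_mapping() and the toy table are hypothetical, and a stand-in function replaces the real download so the example runs offline.

```r
# Hypothetical helper: return the cached table if it exists,
# otherwise call `fetch()` once and store the result as an RDS file.
cached_mapping <- function(cache_file, fetch) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  mapping <- fetch()
  saveRDS(mapping, cache_file)
  mapping
}

# Stand-in for a real download; the codes and categories are toy values.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(code = c("A01", "B02"), category = c("XXX001", "YYY002"))
}

cache_file <- tempfile(fileext = ".rds")
map_first  <- cached_mapping(cache_file, fake_fetch)  # calls fake_fetch()
map_second <- cached_mapping(cache_file, fake_fetch)  # reads the cache
```

In real use, `fetch` could wrap the call shown above, for example `function() download_ccsr("diagnosis", version = "v2025.1")`.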

List Available Versions

# List all available versions
all_versions <- list_ccsr_versions()
print(all_versions)

# List only diagnosis versions
dx_versions <- list_ccsr_versions("diagnosis")

# List only procedure versions
pr_versions <- list_ccsr_versions("procedure")
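
If a script needs to pick the newest version programmatically, version strings can be compared numerically rather than alphabetically. This sketch assumes the versions are available as strings such as "v2025.1" (the form passed to download_ccsr() above); the actual return type of list_ccsr_versions() may differ, so a toy vector stands in for it.

```r
# Toy vector standing in for a list of version labels (assumed format).
toy_versions <- c("v2023.1", "v2025.1", "v2019.1", "v2024.1")

# Drop the leading "v" and parse the rest as a version number.
parsed <- numeric_version(sub("^v", "", toy_versions))

# Keep the label whose parsed version is the largest.
latest <- toy_versions[parsed == max(parsed)]
print(latest)
```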

Part 2: Mapping ICD-10 Codes to CCSR Categories

Once you have downloaded a mapping file, you can use ccsr_map() to map ICD-10 codes to CCSR categories. This function supports multiple output formats to accommodate different analytical needs.

Prepare Sample Data

# Create sample patient data with ICD-10 diagnosis codes
patient_data <- tibble::tibble(
  patient_id = 1:10,
  admission_date = as.Date(c("2024-01-15", "2024-02-20", "2024-03-10", 
                              "2024-04-05", "2024-05-12", "2024-06-18",
                              "2024-07-22", "2024-08-30", "2024-09-14",
                              "2024-10-08")),
  icd10_dx = c("E11.9", "I10", "M79.3", "E78.5", "K21.9", 
               "I50.9", "N18.6", "E78.5", "I25.10", "J44.1")
)
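
Real-world code columns often mix formats (lower case, stray spaces, with or without the decimal point). The HCUP CCSR mapping files list ICD-10-CM codes without the decimal point. Whether ccsr_map() normalizes input for you is not covered here, so the following is only a sketch of a pre-cleaning step; normalize_icd10() is a hypothetical helper, not a HCUPtools function.

```r
# Hypothetical helper: trim whitespace, upper-case, and drop decimal points.
normalize_icd10 <- function(codes) {
  codes <- toupper(trimws(as.character(codes)))
  gsub(".", "", codes, fixed = TRUE)
}

raw_codes <- c(" e11.9", "I10", "i25.10 ")
clean_codes <- normalize_icd10(raw_codes)
print(clean_codes)
```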

Long Format (Default)

The long format returns one row per assigned CCSR category, so a record whose ICD-10 code maps to several categories appears once for each of them. This is essential for cross-classification analysis, where every assigned category must be counted.

# Map codes using long format (default)
mapped_long <- ccsr_map(
  data = patient_data,
  code_col = "icd10_dx",
  map_df = dx_map,
  output_format = "long"
)

# View the results
head(mapped_long, 20)

# Count occurrences of each CCSR category
ccsr_counts <- mapped_long |>
  count(ccsr_category, sort = TRUE)
print(ccsr_counts)

Use Case: Long format is ideal when you want to:

- Count how many times each CCSR category appears
- Analyze cross-classifications (one ICD-10 code mapping to multiple CCSR categories)
- Create frequency tables of CCSR categories
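
To make the row duplication concrete, here is a small base R illustration on toy data. It does not call HCUPtools; the codes, category labels, and column names are invented for the example, and a plain merge() stands in for the mapping step.

```r
# Toy records: three patients, two of them with the same code.
toy_patients <- data.frame(patient_id = 1:3, code = c("A01", "B02", "A01"))

# Toy mapping: code A01 is cross-classified into two categories.
toy_map <- data.frame(
  code     = c("A01", "A01", "B02"),
  category = c("XXX001", "YYY002", "YYY002")
)

# One output row per (record, category) pair.
toy_long <- merge(toy_patients, toy_map, by = "code")
toy_long <- toy_long[order(toy_long$patient_id, toy_long$category), ]

print(toy_long)
category_counts <- table(toy_long$category)
print(category_counts)
```

Three input records become five rows because patients 1 and 3 each contribute two categories.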

Wide Format

The wide format creates multiple columns (CCSR_1, CCSR_2, etc.) for multiple categories, keeping one row per ICD-10 code.

# Map codes using wide format
mapped_wide <- ccsr_map(
  data = patient_data,
  code_col = "icd10_dx",
  map_df = dx_map,
  output_format = "wide"
)

# View the results
head(mapped_wide)

Use Case: Wide format is ideal when you want to:

- Keep all CCSR categories for each patient in a single row
- Perform patient-level analysis
- Maintain the original data structure with additional CCSR columns
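
The relationship between the two layouts can be sketched in base R. The example below uses toy data and reshape() to turn a long table into CCSR_1, CCSR_2 columns; it only illustrates the shape of the result and is not how ccsr_map() is implemented. All values and the `slot` column are invented for the example.

```r
# Toy long table: patients 1 and 3 have two categories each.
toy_long <- data.frame(
  patient_id = c(1, 1, 2, 3, 3),
  code       = c("A01", "A01", "B02", "A01", "A01"),
  CCSR       = c("XXX001", "YYY002", "YYY002", "XXX001", "YYY002")
)

# Number the categories within each patient: 1, 2, ...
toy_long$slot <- ave(seq_along(toy_long$patient_id),
                     toy_long$patient_id, FUN = seq_along)

# Spread the numbered categories into columns CCSR_1, CCSR_2.
toy_wide <- reshape(
  toy_long,
  idvar     = c("patient_id", "code"),
  timevar   = "slot",
  v.names   = "CCSR",
  direction = "wide",
  sep       = "_"
)
print(toy_wide)
```

Patient 2 has only one category, so its CCSR_2 entry is NA.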

Default Category Only

For diagnosis codes, CCSR assigns a “default” category that is recommended for principal diagnosis analysis. Use default_only = TRUE to extract only this default category.

# Map codes using default category only
mapped_default <- ccsr_map(
  data = patient_data,
  code_col = "icd10_dx",
  map_df = dx_map,
  default_only = TRUE
)

# View the results
head(mapped_default)

Use Case: Default category is ideal when you want to:

- Analyze principal diagnoses only
- Follow HCUP recommendations for diagnosis analysis
- Maintain one-to-one mapping (one ICD-10 code = one CCSR category)
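
A quick sanity check after a default-only mapping is that the output still has exactly one row per input record. The sketch below uses toy data frames in place of real ccsr_map() output; check_one_to_one() and all column names are hypothetical.

```r
# Hypothetical check: same row count as the input and no repeated IDs.
check_one_to_one <- function(input, mapped, id_col) {
  nrow(mapped) == nrow(input) && anyDuplicated(mapped[[id_col]]) == 0
}

toy_input <- data.frame(patient_id = 1:3, code = c("A01", "B02", "A01"))

# Toy default-only result: one category per record.
toy_default <- data.frame(
  patient_id   = 1:3,
  default_ccsr = c("XXX001", "YYY002", "XXX001")
)

# Toy long result: patient 1 appears twice.
toy_long <- data.frame(
  patient_id = c(1, 1, 2, 3),
  category   = c("XXX001", "YYY002", "YYY002", "XXX001")
)

ok_default <- check_one_to_one(toy_input, toy_default, "patient_id")
ok_long    <- check_one_to_one(toy_input, toy_long, "patient_id")
print(c(default = ok_default, long = ok_long))
```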

Part 3: Getting CCSR Descriptions

To understand what CCSR categories mean, use get_ccsr_description():

# Get descriptions for specific CCSR codes
ccsr_codes <- c("ADM010", "NEP003", "CIR019", "END001", "MBD001")
descriptions <- get_ccsr_description(ccsr_codes, map_df = dx_map)
print(descriptions)

# Get descriptions without pre-downloaded mapping (will download automatically)
descriptions_auto <- get_ccsr_description(
  c("ADM010", "NEP003"), 
  type = "diagnosis"
)

Part 4: Working with Procedure Codes

The package also supports ICD-10-PCS procedure codes:

# Download procedure mapping
pr_map <- download_ccsr("procedure")

# Create sample procedure data
procedure_data <- tibble::tibble(
  case_id = 1:5,
  procedure_date = as.Date(c("2024-01-20", "2024-02-15", "2024-03-22",
                              "2024-04-10", "2024-05-18")),
  icd10_pcs = c("0DB60ZZ", "0DT70ZZ", "0WQ3XZZ", "0FB00ZZ", "0HB00ZX")
)

# Map procedure codes
mapped_procedures <- ccsr_map(
  data = procedure_data,
  code_col = "icd10_pcs",
  map_df = pr_map
)

# View the results
head(mapped_procedures)
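
ICD-10-PCS codes are always seven characters long and use the digits 0-9 and the letters A-Z except I and O. A format check before mapping can catch truncated or mistyped codes early. This is a sketch only: is_well_formed_pcs() is a hypothetical helper, and passing the check means a code is well formed, not that it exists in the code set.

```r
# Hypothetical helper: TRUE when a code has 7 allowed characters.
is_well_formed_pcs <- function(codes) {
  grepl("^[0-9A-HJ-NP-Z]{7}$", toupper(trimws(codes)))
}

# Toy input: a lower-case code, a 6-character code, and a code with letter O.
toy_codes <- c("0DB60ZZ", "0dt70zz", "0DB60Z", "0FB0OZZ")
pcs_check <- data.frame(
  code        = toy_codes,
  well_formed = is_well_formed_pcs(toy_codes)
)
print(pcs_check)
```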

Part 5: Complete Analysis Workflow

Here’s a complete workflow for analyzing CCSR categories in a dataset:

# Step 1: Download mapping file
dx_map <- download_ccsr("diagnosis")

# Step 2: Map diagnosis codes
patient_data_mapped <- ccsr_map(
  data = patient_data,
  code_col = "icd10_dx",
  map_df = dx_map,
  output_format = "long"
)

# Step 3: Count occurrences of each CCSR category
ccsr_counts <- patient_data_mapped |>
  count(ccsr_category, sort = TRUE)

# Step 4: Merge with descriptions for reporting
ccsr_counts_with_desc <- ccsr_counts |>
  left_join(
    get_ccsr_description(
      unique(patient_data_mapped$ccsr_category), 
      map_df = dx_map
    ),
    by = c("ccsr_category" = "ccsr_code")
  )

# Step 5: View the final results
print(ccsr_counts_with_desc)
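
Because the long format repeats a record once per assigned category, row counts measure category assignments rather than patients. When reporting the share of patients in each category, the denominator should be the number of distinct patients. The base R sketch below illustrates this on toy data; the values are invented and the column names simply mirror the ones used above.

```r
# Toy long-format result: patients 1 and 3 are cross-classified.
toy_long <- data.frame(
  patient_id    = c(1, 1, 2, 3, 3),
  ccsr_category = c("XXX001", "YYY002", "YYY002", "XXX001", "YYY002")
)

# Denominator: distinct patients, not rows.
n_patients <- length(unique(toy_long$patient_id))

# Distinct patients per category.
per_cat <- aggregate(
  patient_id ~ ccsr_category,
  data = toy_long,
  FUN  = function(x) length(unique(x))
)
names(per_cat)[2] <- "n_patients"
per_cat$pct_of_patients <- 100 * per_cat$n_patients / n_patients
print(per_cat)
```

The percentages can sum to more than 100 because one patient may fall into several categories.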

Part 6: Downloading HCUP Summary Trend Tables

The package also provides access to HCUP Summary Trend Tables, which contain aggregated information on hospital utilization trends:

# List available tables (interactive menu)
available_tables <- download_trend_tables()
print(available_tables)

# Download a specific table by ID
# Table 2a: All Inpatient Encounter Types - Trends in Number of Discharges
table_path <- download_trend_tables("2a")

# Download all tables as a ZIP file (~81 MB)
all_tables_zip <- download_trend_tables("all")

The trend tables include: - Overview of trends in inpatient and emergency department utilization - All inpatient encounter types (discharges, percent, length of stay, mortality, population rates) - Inpatient encounter types (normal newborns, deliveries, elective/non-elective stays) - Inpatient service lines (maternal/neonatal, mental health, injuries, surgeries, medical conditions) - ED treat-and-release visits

For more information, see: HCUP Summary Trend Tables

Reading Trend Tables

# Read the trend table data
trend_data <- read_trend_table(table_path, sheet = "National")
head(trend_data)

# List available sheets
sheets <- list_trend_table_sheets(table_path)
print(sheets)

# Read specific state data
california_data <- read_trend_table(table_path, sheet = "California")
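
To compare several sheets side by side, they can be stacked into one data frame with a column recording the source sheet. The sketch below is hedged: combine_sheets() is a hypothetical helper, it assumes every sheet has the same columns, and a stand-in reader replaces read_trend_table() so the example runs without a downloaded file.

```r
# Hypothetical helper: read each sheet with `reader` and stack the results.
combine_sheets <- function(path, sheets, reader) {
  pieces <- lapply(sheets, function(s) {
    df <- as.data.frame(reader(path, sheet = s))
    df$sheet <- s  # remember where each row came from
    df
  })
  do.call(rbind, pieces)
}

# Stand-in reader returning toy values; not real HCUP data.
fake_reader <- function(path, sheet) {
  data.frame(year = 2019:2020, discharges = c(10, 12))
}

combined <- combine_sheets(
  "unused.xlsx",
  sheets = c("National", "California"),
  reader = fake_reader
)
print(combined)
```

With a real file, `reader = read_trend_table` and the `table_path` from above could be passed instead, provided the chosen sheets share a layout.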

Part 7: Accessing CCSR Change Logs

View changes between CCSR versions:

# Get change log as data table (default)
changelog <- ccsr_changelog(version = "v2026.1")
print(changelog)

# Get change log URL
changelog_url <- ccsr_changelog(version = "v2026.1", format = "url")

# View change log in default PDF viewer
ccsr_changelog(version = "v2026.1", format = "view")

# Download change log file
changelog_file <- ccsr_changelog(version = "v2026.1", format = "download")

Part 8: Generating Citations

When using HCUP data in publications, always cite the source properly:

# Generate text citation for CCSR
cat(hcup_citation())

# Generate citation for Summary Trend Tables
cat(hcup_citation(resource = "trend_tables"))

# Generate BibTeX citation (for LaTeX documents)
cat(hcup_citation(format = "bibtex"))

# Generate R citation object (for R Markdown)
citation_obj <- hcup_citation(format = "r")
print(citation_obj)

Part 9: Reading Downloaded Files

If you’ve already downloaded files, you can read them directly:

# Read CCSR file from various formats
dx_map <- read_ccsr("path/to/DXCCSR-v2026-1.zip")
dx_map <- read_ccsr("path/to/DXCCSR_v2026-1.csv")
dx_map <- read_ccsr("path/to/DXCCSR_v2026-1.xlsx")
dx_map <- read_ccsr("path/to/extracted_directory/")

# Read trend table Excel file
national_data <- read_trend_table(
  "path/to/HCUP_SummaryTrendTables_T2a.xlsx",
  sheet = "National"
)

Important Notes

Data Download

Cross-Classification

Default Categories

Performance

Additional Resources