In this vignette, we explore how the OmopSketch functions
databaseCharacteristics() and
shinyCharacteristics() can be used to
characterise databases containing electronic health records mapped to
the OMOP Common Data Model.
We begin by loading the necessary packages and creating a mock CDM using the R package omock:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(OmopSketch)
library(omock)
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#>
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
The databaseCharacteristics() function provides a
comprehensive overview of the Common Data Model (CDM). It returns a summarised
result combining several characterisation components:
General database snapshot:
Generated using summariseOmopSnapshot(), this provides
high-level metadata about the CDM, including size of person table, time
span covered, source type, vocabulary version, etc.
Population characterisation:
Describes the demographics of the population under observation, built
using the CohortConstructor and CohortCharacteristics packages.
Person table characterisation:
Produced using summarisePerson(), this component summarises
the content and missingness of the person table.
Observation period characterisation:
Produced using summariseObservationPeriod(), this component
summarises the content and missingness of the observation period
table.
Temporal trends — including changes in the number of records and
subjects, median age, sex distribution, and total person-days — are then
derived using summariseTrend().
Clinical tables characterisation:
Produced using summariseClinicalRecords(), this component
summarises the content and missingness across all clinical tables.
Temporal trends in the number of records and subjects, median age, and
sex distribution are also computed using
summariseTrend().
Concept counts:
Optionally, concept-level summaries can be included, computed with
summariseConceptIdCounts().
Together, these outputs provide a holistic view of the CDM’s structure, data completeness, and temporal behaviour — supporting both data quality assessment and study feasibility evaluation.
result <- databaseCharacteristics(cdm = cdm)
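To check which of the components listed above ended up in the result, one option is to inspect its settings. The sketch below assumes that the omopgenerics package is installed and that, as for other summarised results, settings() returns a table with a result_type column. The restriction to a single clinical table is an illustrative choice to keep the run short.

```r
library(OmopSketch)
library(omock)

# Recreate the mock CDM so that the example runs on its own
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# Restrict to one clinical table (illustrative choice) to keep the run short
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = "condition_occurrence"
)

# One row per set of results; result_type names the function that produced it
resultSettings <- omopgenerics::settings(result)
unique(resultSettings$result_type)

omopgenerics::cdmDisconnect(cdm = cdm)
```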
By default, the following OMOP tables are included in the characterisation: visit_occurrence, visit_detail, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.
You can customise which tables to include in the analysis by
specifying them with the omopTableName argument.
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence")
)
To stratify the characterisation results by sex, set the
sex argument to TRUE:
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  sex = TRUE
)
To stratify the characterisation results by age group, pass a list of age ranges to the ageGroup argument:
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  ageGroup = list(c(0, 50), c(51, 100))
)
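When many age bands are needed, writing the list by hand is error-prone. The sketch below uses a hypothetical helper, makeAgeGroups(), which is not part of OmopSketch, to build contiguous bands from a vector of lower bounds. It assumes the same convention as the example above, where each band is written with inclusive lower and upper ages (0 to 50, then 51 to 100).

```r
# Hypothetical helper (not part of OmopSketch): build contiguous,
# non-overlapping age bands from a vector of lower bounds
makeAgeGroups <- function(breaks, maxAge = 150) {
  stopifnot(
    is.numeric(breaks),
    length(breaks) >= 1,
    !is.unsorted(breaks, strictly = TRUE),
    maxAge >= max(breaks)
  )
  lower <- breaks
  # Each band ends one year before the next band starts
  upper <- c(breaks[-1] - 1, maxAge)
  Map(function(lo, hi) c(lo, hi), lower, upper)
}

# Bands 0 to 17, 18 to 64, and 65 to 150
ageGroup <- makeAgeGroups(breaks = c(0, 18, 65))
```

The resulting list can then be supplied as the ageGroup argument of databaseCharacteristics().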
Use the dateRange argument to limit the analysis to a
specific period. Combine it with the interval argument to
stratify results by time. Valid values for interval include “overall”
(default), “years”, “quarters”, and “months”:
result <- databaseCharacteristics(
  cdm = cdm,
  interval = "years",
  dateRange = as.Date(c("2010-01-01", "2018-12-31"))
)
You can use the sample argument to limit the
characterisation to a subset of the CDM.
This can be useful for quickly exploring large datasets or focusing on a
specific cohort already included in the CDM.
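The sample argument accepts either an integer, giving the number of people to sample from the CDM, or a character string, giving the name of a cohort table in the CDM to which the characterisation is restricted: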
result <- databaseCharacteristics(
  cdm = cdm,
  sample = 1000L
)
result <- databaseCharacteristics(
  cdm = cdm,
  sample = "my_cohort"
)
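The cohort-based call only works if a cohort table with that name already exists in the CDM. As a minimal sketch, the cohort could be created with CohortConstructor first. The use of demographicsCohort() with sex = "Female" is an assumption chosen for illustration, as is the restriction to a single clinical table to keep the run short.

```r
library(OmopSketch)
library(omock)
library(CohortConstructor)

# Recreate the mock CDM so that the example runs on its own
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# Illustrative cohort: everyone recorded as female
# (assumed CohortConstructor usage; any existing cohort table would do)
cdm$my_cohort <- demographicsCohort(
  cdm = cdm,
  name = "my_cohort",
  sex = "Female"
)

# Characterise only the people who belong to the cohort
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  sample = "my_cohort"
)

omopgenerics::cdmDisconnect(cdm = cdm)
```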
To include concept counts in the characterisation, set
conceptIdCounts = TRUE:
result <- databaseCharacteristics(
  cdm = cdm,
  conceptIdCounts = TRUE
)
Arguments of the underlying functions can also be passed to
databaseCharacteristics() to customise the output. For example, to
stratify trends and concept counts by whether records fall inside or
outside an observation period, pass inObservation = TRUE:
result <- databaseCharacteristics(
  cdm = cdm,
  conceptIdCounts = TRUE,
  inObservation = TRUE
)
To explore the characterisation results interactively, you can use
the shinyCharacteristics() function. This function
generates a Shiny application in the specified directory,
allowing you to browse, filter, and visualise the results through an
intuitive user interface.
shinyCharacteristics(result = result, directory = "path/to/your/shiny")
You can customise the title, logo, theme, and background of the Shiny app by setting the corresponding arguments:
title: The title displayed at the top of the app.
logo: Path to a custom logo (must be in SVG format).
theme: One of the available OmopViewer themes.
background: A custom background panel for the Shiny app.
shinyCharacteristics(
  result = result,
  directory = "path/to/my/shiny",
  title = "Characterisation of my data",
  logo = "path/to/my/logo.svg",
  theme = "scarlet",
  background = "path/to/my/background.md"
)
An example of the Shiny application generated by
shinyCharacteristics() can be explored here,
where the characterisation of several synthetic datasets is
available.
Finally, disconnect from the mock CDM.
cdmDisconnect(cdm = cdm)