Type: | Package |
Title: | Descriptive Analysis on 3 EHRs Datasets |
Version: | 0.1.0 |
Maintainer: | Samuele Marelli <samu2003.marelli@gmail.com> |
Description: | Provides functions to load and analyze three open Electronic Health Records (EHRs) datasets of patients diagnosed with glioblastoma, previously released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users can generate basic descriptive statistics, frequency tables and save descriptive summary tables, as well as create and export univariate or bivariate plots. The package is designed to work with the included datasets and to facilitate quick exploratory data analysis and reporting. More information about these three datasets of EHRs of patients with glioblastoma can be found in this article: Gabriel Cerono, Ombretta Melaiu, and Davide Chicco, 'Clinical feature ranking based on ensemble machine learning reveals top survival factors for glioblastoma multiforme', Journal of Healthcare Informatics Research 8, 1-18 (March 2024). <doi:10.1007/s41666-023-00138-1>. |
License: | GPL-3 |
Depends: | R (≥ 3.5) |
Imports: | DataExplorer, flextable, ggplot2, rmarkdown, summarytools, table1, tinytex |
Suggests: | ragg, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
SystemRequirements: | pandoc (>= 1.12.3) - http://pandoc.org, tcltk (optional) |
NeedsCompilation: | no |
Packaged: | 2025-09-15 08:42:36 UTC; samum |
Author: | Samuele Marelli [aut, cre], Davide Chicco [aut] |
Repository: | CRAN |
Date/Publication: | 2025-09-21 13:50:02 UTC |
glioblastomaEHRsData: Descriptive analysis on 3 EHRs datasets
Description
Provides functions to load and analyze three open Electronic Health Records (EHRs) datasets of patients diagnosed with glioblastoma, previously released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users can generate basic descriptive statistics, frequency tables and save descriptive summary tables, as well as create and export univariate or bivariate plots. The package is designed to work with the included datasets and to facilitate quick exploratory data analysis and reporting. More information about these three datasets of EHRs of patients with glioblastoma can be found in this article: Gabriel Cerono, Ombretta Melaiu, and Davide Chicco, "Clinical feature ranking based on ensemble machine learning reveals top survival factors for glioblastoma multiforme", Journal of Healthcare Informatics Research 8, 1–18 (March 2024). <doi:10.1007/s41666-023-00138-1>.
Author(s)
Maintainer: Samuele Marelli samu2003.marelli@gmail.com
Authors:
Davide Chicco davide.chicco@gmail.com
Munich2019datasetDescriptiveStatistics
Description
Provides descriptive statistics for the 'munich2019dataset'. It can display summary statistics for continuous variables, frequency tables for categorical variables, or both.
Usage
Munich2019datasetDescriptiveStatistics(show = "all")
Arguments
show |
Character string for the type of statistic to show. There are three options: "continuous", "categorical" or "all" (the default).
|
Value
A list or an object depending on the value of show:
- If show = "continuous"
A data frame with descriptive statistics.
- If show = "categorical"
A list of frequency tables.
- If show = "all"
A list of 2 items,
the first is a data frame for continuous variables,
the second is a list for categorical variables.
- If show is not one of the above, a warning is returned.
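The return shapes listed above can be illustrated with a minimal base R sketch. The helper `describe_sketch()` and the toy data frame are hypothetical and are not part of the package, which relies on summarytools; the sketch only mimics the documented structure (a data frame of statistics, a list of frequency tables, or a list holding both).

```r
# Hypothetical helper, NOT the package implementation: it reproduces the
# documented return shapes with base R only.
describe_sketch <- function(df, show = "all") {
  is_num <- vapply(df, is.numeric, logical(1))

  # One column per numeric variable, one row per statistic.
  continuous <- function() {
    stats <- vapply(
      df[is_num],
      function(x) c(mean(x, na.rm = TRUE), min(x, na.rm = TRUE),
                    median(x, na.rm = TRUE), max(x, na.rm = TRUE)),
      FUN.VALUE = c(Mean = 0, Min = 0, Median = 0, Max = 0)
    )
    as.data.frame(stats)
  }

  # One frequency table per non-numeric variable.
  categorical <- function() lapply(df[!is_num], table)

  switch(show,
    continuous = continuous(),
    categorical = categorical(),
    all = list(continuous = continuous(), categorical = categorical()),
    {
      warning("show must be \"continuous\", \"categorical\" or \"all\"")
      invisible(NULL)
    }
  )
}

# Made-up values for illustration only; these are not rows of the dataset.
toy <- data.frame(
  age_years = c(50, 60, 70),
  OS_months = c(10, 20, 30),
  sex = factor(c("Male", "Female", "Male"))
)

res <- describe_sketch(toy, "all")
res$continuous["Mean", "OS_months"]
res$categorical[[1]]
```

The row-by-statistic, column-by-variable layout is chosen so that indexing such as `["Mean", "OS_months"]` works the same way as in the examples of this help page.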
See Also
summarytools::descr(), summarytools::freq()
Examples
# Show both continuous and categorical stats
all <- Munich2019datasetDescriptiveStatistics(show = "all")
# Mean value for the OS_months column
all$continuous["Mean", "OS_months"]
# Min, Median and Max value for ALL the numeric columns
all$continuous[c("Min","Median","Max"),]
# Get the frequency table of the third categorical column
all$categorical[[3]]
# Only continuous variables
cont <- Munich2019datasetDescriptiveStatistics("continuous")
# Statistics for age_years column
cont[, "age_years"]
# Only categorical variables
cat <- Munich2019datasetDescriptiveStatistics("categorical")
# Frequency values for all factors of the first categorical column
cat[[1]][,"Freq"]
# Statistics for the Low factor of the first categorical column
cat[[1]]["Low",]
# Frequency of the Low factor, a value of the first categorical column
cat[[1]]["Low", "Freq"]
Tainan2020datasetDescriptiveStatistics
Description
Provides descriptive statistics for the 'tainan2020dataset'. It can display summary statistics for continuous variables, frequency tables for categorical variables, or both.
Usage
Tainan2020datasetDescriptiveStatistics(show = "all")
Arguments
show |
Character string for the type of statistic to show. There are three options: "continuous", "categorical" or "all" (the default).
|
Value
A list or an object depending on the value of show:
- If show = "continuous"
A data frame with descriptive statistics.
- If show = "categorical"
A list of frequency tables.
- If show = "all"
A list of 2 items,
the first is a data frame for continuous variables,
the second is a list for categorical variables.
- If show is not one of the above, a warning is returned.
See Also
summarytools::descr(), summarytools::freq()
Examples
# Show both continuous and categorical stats
all <- Tainan2020datasetDescriptiveStatistics(show = "all")
# Mean value for the OS_months column
all$continuous["Mean", "OS_months"]
# Min, Median and Max value for ALL the numeric columns
all$continuous[c("Min","Median","Max"),]
# Get the frequency table of the third categorical column
all$categorical[[3]]
# Only continuous variables
cont <- Tainan2020datasetDescriptiveStatistics("continuous")
# Statistics for age_years column
cont[, "age_years"]
# Only categorical variables
cat <- Tainan2020datasetDescriptiveStatistics("categorical")
# Frequency values for all factors of the first categorical column
cat[[1]][,"Freq"]
# Statistics for the Yes factor of the first categorical column
cat[[1]]["Yes",]
# Frequency of the No factor, a value of the first categorical column
cat[[1]]["No", "Freq"]
Utrecht2019datasetDescriptiveStatistics
Description
Provides descriptive statistics for the 'utrecht2019dataset'. It can display summary statistics for continuous variables, frequency tables for categorical variables, or both.
Usage
Utrecht2019datasetDescriptiveStatistics(show = "all")
Arguments
show |
Character string for the type of statistic to show. There are three options: "continuous", "categorical" or "all" (the default).
|
Value
A list or an object depending on the value of show:
- If show = "continuous"
A data frame with descriptive statistics.
- If show = "categorical"
A list of frequency tables.
- If show = "all"
A list of 2 items,
the first is a data frame for continuous variables,
the second is a list for categorical variables.
- If show is not one of the above, a warning is returned.
See Also
summarytools::descr(), summarytools::freq()
Examples
# Show both continuous and categorical stats
all <- Utrecht2019datasetDescriptiveStatistics(show = "all")
# Mean value for the OS_months column
all$continuous["Mean", "OS_months"]
# Min, Median and Max value for ALL the numeric columns
all$continuous[c("Min","Median","Max"),]
# Get the frequency table of the third categorical column
all$categorical[[3]]
# Only continuous variables
cont <- Utrecht2019datasetDescriptiveStatistics("continuous")
# Statistics for age_years column
cont[, "age_years"]
# Only categorical variables
cat <- Utrecht2019datasetDescriptiveStatistics("categorical")
# Frequency values for all factors of the second categorical column
cat[[2]][,"Freq"]
# Statistics for the Monotherapy factor of the second categorical column
cat[[2]]["Monotherapy",]
# Frequency of the RT+TMZ factor, a value of the second categorical column
cat[[2]]["RT+TMZ", "Freq"]
Descriptive statistics table for the munich2019dataset
Description
Function to create, display and optionally export a table
containing descriptive statistics for the munich2019dataset.
The table is created using the table1 package and is grouped by survival status.
Usage
descriptiveTableMunich2019dataset(savePath = NULL)
Arguments
savePath |
Character (optional). String specifying the path and filename for exporting the table.
|
Value
A 'table1' object containing descriptive statistics grouped by survival status.
See Also
Examples
# Create and display the table, without saving anything
descriptiveTableMunich2019dataset()
# Create, display and save the table giving a path, filename and extension
descriptiveTableMunich2019dataset(savePath = "tables/munich_web_table.html")
# Create, display and save the table giving a path, filename and extension
descriptiveTableMunich2019dataset(savePath = "tables/munich_table.pdf")
# Create, display and save the table giving only the path, default name will be used
descriptiveTableMunich2019dataset(savePath = "tables/")
# Create, display and save the table giving a path and filename, default extension will be used
descriptiveTableMunich2019dataset(savePath = "tables/munich_table")
Descriptive statistics table for the tainan2020dataset
Description
Function to create, display and optionally export a table
containing descriptive statistics for the tainan2020dataset.
The table is created using the table1 package and is grouped by survival status.
Usage
descriptiveTableTainan2020dataset(savePath = NULL)
Arguments
savePath |
Character (optional). String specifying the path and filename for exporting the table.
|
Value
A 'table1' object containing descriptive statistics grouped by survival status.
See Also
Examples
# Create and display the table, without saving anything
descriptiveTableTainan2020dataset()
# Create, display and save the table giving a path, filename and extension
descriptiveTableTainan2020dataset(savePath = "tables/tainan_web_table.html")
# Create, display and save the table giving a path, filename and extension
descriptiveTableTainan2020dataset(savePath = "tables/tainan_table.pdf")
# Create, display and save the table giving only the path, default name will be used
descriptiveTableTainan2020dataset(savePath = "tables/")
# Create, display and save the table giving a path and filename, default extension will be used
descriptiveTableTainan2020dataset(savePath = "tables/tainan_table")
Descriptive statistics table for the utrecht2019dataset
Description
Function to create, display and optionally export a table
containing descriptive statistics for the utrecht2019dataset.
The table is created using the table1 package and is grouped by survival status.
Usage
descriptiveTableUtrecht2019dataset(savePath = NULL)
Arguments
savePath |
Character (optional). String specifying the path and filename for exporting the table.
|
Value
A 'table1' object containing descriptive statistics grouped by survival status.
See Also
Examples
# Create and display the table, without saving anything
descriptiveTableUtrecht2019dataset()
# Create, display and save the table giving a path, filename and extension
descriptiveTableUtrecht2019dataset(savePath = "tables/utrecht_web_table.html")
# Create, display and save the table giving a path, filename and extension
descriptiveTableUtrecht2019dataset(savePath = "tables/utrecht_table.pdf")
# Create, display and save the table giving only the path, default name will be used
descriptiveTableUtrecht2019dataset(savePath = "tables/")
# Create, display and save the table giving a path and filename, default extension will be used
descriptiveTableUtrecht2019dataset(savePath = "tables/utrecht_table")
Declarations for global variables
Description
This file is used to suppress global variable NOTEs when running 'R CMD check'. These variables are used as internal datasets in the package functions; they are stored in the 'data/' folder and are loaded automatically through lazy loading ('LazyData: true').
Usage
global_variables
Format
An object of class NULL of length 0.
EHR data of patients affected by Glioblastoma (GBM), collected from Munich, Germany.
Description
The dataset contains data from 60 patients, treated at the 2 hospitals
Klinikum rechts der Isar (TUM) and Klinikum Bogenhausen (STKM),
Munich, Germany.
It focuses on the prognostic impact of cytosolic Hsp70 (cHsp70) expression
and MGMT promoter methylation status.
Usage
data(munich2019dataset)
Format
A data frame with 60 rows and 8 variables:
- age_years
Numeric. Patient's age in years.
- cHsp70_low0_high1
Factor. "Low" (0) or "High" (1) cytosolic Hsp70 expression.
- MGMTmethylation_methylated1_unmethylated0
Factor. "unmethylated" (0) or "methylated" (1) MGMT promoter.
- OS_months
Numeric. Overall survival in months.
- PFS_months
Numeric. Progression-free survival in months.
- sex_male0_female1
Factor. "Male" (0) or "Female" (1).
- survived_yes1_no0
Factor. "Dead" (0) or "Alive" (1) at their most recent check-up.
- tumor_progression_yes1_no0
Factor. "No" (0) or "Yes" (1) for tumor progression after treatment.
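Each factor column name spells out its numeric coding (for example, cHsp70_low0_high1 means 0 = "Low" and 1 = "High"). The following sketch shows how such a 0/1 coding corresponds to the documented labels. The values are made up for illustration and are not taken from the dataset, and this is not necessarily how the package authors built the factors.

```r
# Made-up 0/1 codes for illustration only; these are NOT values from the dataset.
codes <- c(0, 1, 1, 0, 1)

# Map the codes onto the labels documented for cHsp70_low0_high1.
cHsp70 <- factor(codes, levels = c(0, 1), labels = c("Low", "High"))

levels(cHsp70)
table(cHsp70)

# Recover the original numeric coding from the factor
# (factor levels are stored internally as 1, 2, ...).
recovered <- as.integer(cHsp70) - 1L
```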
References
Lämmer F, Delbridge C, Würstle S et al. (2019). Cytosolic Hsp70 as a biomarker to predict clinical outcome in patients with glioblastoma. PMID: 31430337. 14(8): e0221502. https://doi.org/10.1371/journal.pone.0221502
Examples
data(munich2019dataset)
head(munich2019dataset)
Plot variables from the munich2019dataset
Description
This function creates a plot of one or two variables from the 'munich2019dataset' dataframe. It automatically chooses the appropriate plot type based on the variable types. It can export and save the plot by specifying the directory, filename and extension. Provided variable names need to be in the dataset, otherwise an error is raised. The function makes it easy to create univariate and bivariate plots while offering a quick way to save and export them.
Usage
plotMunich2019dataset(name1, name2 = NA, savePath = NA)
Arguments
name1 |
Character. The name of the first variable to plot. |
name2 |
Character (optional). The name of the second variable for bivariate plots. Default is NA. |
savePath |
Character (optional). File path where the plot should be saved. Default is NA, which means no plot will be saved. To save a plot using all the default options, pass an empty string. The format must be: 'filepath/filename.extension' where:
|
Details
The function supports the following plotting logic:
If only 'name1' is provided:
Numeric/integer variable → Histogram.
Character/factor variable → Bar plot.
If both 'name1' and 'name2' are provided:
One numeric/integer and one categorical → Boxplot.
Both categorical → Grouped bar plot.
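The plotting logic above can be read as a small decision rule. The sketch below expresses it in base R with a hypothetical helper, `choose_plot_type()`, that only returns the name of the plot type. It does not call DataExplorer or ggplot2, and it is not the package's implementation. The toy data frame uses made-up values.

```r
# Hypothetical helper, NOT part of the package: returns the plot type that the
# documented rules would select for the given variable(s).
choose_plot_type <- function(df, name1, name2 = NA) {
  is_cont <- function(x) is.numeric(x)  # covers numeric and integer
  is_cat <- function(x) is.character(x) || is.factor(x)

  x <- df[[name1]]
  if (is.na(name2)) {
    # Univariate case
    if (is_cont(x)) return("histogram")
    if (is_cat(x)) return("bar plot")
  } else {
    # Bivariate case
    y <- df[[name2]]
    if ((is_cont(x) && is_cat(y)) || (is_cat(x) && is_cont(y))) return("boxplot")
    if (is_cat(x) && is_cat(y)) return("grouped bar plot")
  }
  NA_character_  # combination not covered by the rules listed above
}

# Made-up values for illustration only.
toy <- data.frame(
  age_years = c(50, 61, 72),
  sex = factor(c("Male", "Female", "Male")),
  survived = factor(c("Dead", "Alive", "Dead"))
)

choose_plot_type(toy, "age_years")
choose_plot_type(toy, "sex")
choose_plot_type(toy, "age_years", "sex")
choose_plot_type(toy, "sex", "survived")
```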
Value
A 'ggplot2' object representing the generated plot. If the specified variables are not found in the dataset, returns a warning.
See Also
DataExplorer::plot_bar(), DataExplorer::plot_histogram(), DataExplorer::plot_boxplot(), savePlot()
Examples
# Univariate plot without saving
plotMunich2019dataset("age_years")
# Bivariate plot without saving
plotMunich2019dataset("cHsp70_low0_high1", "sex_male0_female1")
# Bivariate plot; set savePath to 'filepath/filename.extension' to save it with the chosen name and extension
plotMunich2019dataset("age_years", "sex_male0_female1", savePath = NA)
# Univariate plot; set savePath to 'filename.extension' to save it in the working directory
plotMunich2019dataset("sex_male0_female1", savePath = NA)
# Univariate plot; set savePath to a directory path to save it with the default name and extension
plotMunich2019dataset("MGMTmethylation_methylated1_unmethylated0", savePath = NA)
Plot variables from the tainan2020dataset
Description
This function creates a plot of one or two variables from the 'tainan2020dataset' dataframe. It automatically chooses the appropriate plot type based on the variable types. It can export and save the plot by specifying the directory, filename and extension. Provided variable names need to be in the dataset, otherwise an error is raised. The function makes it easy to create univariate and bivariate plots while offering a quick way to save and export them.
Usage
plotTainan2020dataset(name1, name2 = NA, savePath = NA)
Arguments
name1 |
Character. The name of the first variable to plot. |
name2 |
Character (optional). The name of the second variable for bivariate plots. Default is NA. |
savePath |
Character (optional). File path where the plot should be saved. Default is NA, which means no plot will be saved. To save a plot using all the default options, pass an empty string. The format must be: 'filepath/filename.extension' where:
|
Details
The function supports the following plotting logic:
If only 'name1' is provided:
Numeric/integer variable → Histogram.
Character/factor variable → Bar plot.
If both 'name1' and 'name2' are provided:
One numeric/integer and one categorical → Boxplot.
Both categorical → Grouped bar plot.
Value
A 'ggplot2' object representing the generated plot. If the specified variables are not found in the dataset, returns a warning.
See Also
DataExplorer::plot_bar(), DataExplorer::plot_histogram(), DataExplorer::plot_boxplot(), savePlot()
Examples
# Univariate plot without saving
plotTainan2020dataset("TMZ_based_chemo_yes1_no0")
# Bivariate plot without saving
plotTainan2020dataset("PFS_months", "OS_months")
# Bivariate plot; set savePath to 'filepath/filename.extension' to save it with the chosen name and extension
plotTainan2020dataset("age_years", "chemo_yes1_no0", savePath = NA)
# Bivariate plot; set savePath to 'filename.extension' to save it in the working directory
plotTainan2020dataset("PFS_months", "OS_months", savePath = NA)
# Bivariate plot; set savePath to a directory path to save it with the default name and extension
plotTainan2020dataset("PFS_months", "radiation_dose_Gy", savePath = NA)
Plot variables from the utrecht2019dataset
Description
This function creates a plot of one or two variables from the 'utrecht2019dataset' dataframe. It automatically chooses the appropriate plot type based on the variable types. It can export and save the plot by specifying the directory, filename and extension. Provided variable names need to be in the dataset, otherwise an error is raised. The function makes it easy to create univariate and bivariate plots while offering a quick way to save and export them.
Usage
plotUtrecht2019dataset(name1, name2 = NA, savePath = NA)
Arguments
name1 |
Character. The name of the first variable to plot. |
name2 |
Character (optional). The name of the second variable for bivariate plots. Default is NA. |
savePath |
Character (optional). File path where the plot should be saved. Default is NA, which means no plot will be saved. To save a plot using all the default options, pass an empty string. The format must be: 'filepath/filename.extension' where:
|
Details
The function supports the following plotting logic:
If only 'name1' is provided:
Numeric/integer variable → Histogram.
Character/factor variable → Bar plot.
If both 'name1' and 'name2' are provided:
One numeric/integer and one categorical → Boxplot.
Both categorical → Grouped bar plot.
Value
A 'ggplot2' object representing the generated plot. If the specified variables are not found in the dataset, returns a warning.
See Also
DataExplorer::plot_bar(), DataExplorer::plot_histogram(), DataExplorer::plot_boxplot(), savePlot()
Examples
# Univariate plot without saving
plotUtrecht2019dataset("SVZ_status_nocontact0_contact1")
# Bivariate plot without saving
plotUtrecht2019dataset("post_surgery_therapy_none0_monotherapy1_RTandTMZ2", "OS_months")
# Bivariate plot; set savePath to 'filepath/filename.extension' to save it with the chosen name and extension
plotUtrecht2019dataset("age_years", "KPS_less70.0_more70.1", savePath = NA)
# Univariate plot; set savePath to 'filename.extension' to save it in the working directory
plotUtrecht2019dataset("survived_yes1_no0", savePath = NA)
# Univariate plot; set savePath to a directory path to save it with the default name
plotUtrecht2019dataset("survived_yes1_no0", savePath = NA)
Save a plot
Description
Internal function to save a 1D or 2D ggplot2 plot specifying the filepath, filename and extension. Supported extensions are the ones supported by 'ggsave'; the default extension is .png. The default filename has this format: 'plot_Dataset_var1_var2_timestamp.png'.
Usage
savePlot(plot, names, savePath = "")
Arguments
plot |
'ggplot2' object. The plot to be saved. |
names |
List. List of at least 2 items, first the dataset name, second the x variable of the plot, and third the y variable of the plot (only in bivariate plots). |
savePath |
Character (optional). String specifying the path and filename for the exported plot.
|
Details
The function takes a 'ggplot2' object, a list containing the dataset name and the plotted variable names, and the save path used to export the plot. It first checks whether the dataset name is present, then it creates the default file name. If the save path doesn't exist or is an empty string, it saves the plot in the working directory using the default file name. If the save path exists but doesn't contain a file name, it saves the plot in the path directory using the default file name. If the save path exists and has a file name, it saves the plot in the path directory with the user-chosen file name.
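The path handling described above can be sketched in base R as follows. The helper `resolve_save_path()` is hypothetical and only computes the target file path; it does not save anything and is not the package's code. The timestamp format and the fallback to the default extension when the file name has none are assumptions made for this sketch.

```r
# Hypothetical helper, NOT part of the package: resolves where a plot would be
# written, following the three cases described in Details.
resolve_save_path <- function(savePath, dataset, vars, ext = "png") {
  # Assumed timestamp format; the documentation only says "timestamp".
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  default_name <- paste0(
    paste(c("plot", dataset, vars, stamp), collapse = "_"), ".", ext
  )

  # Case 1: empty path -> working directory, default file name.
  if (is.null(savePath) || is.na(savePath) || !nzchar(savePath)) {
    return(file.path(getwd(), default_name))
  }
  # Case 2: existing directory without a file name -> default file name there.
  if (dir.exists(savePath)) {
    return(file.path(savePath, default_name))
  }
  # Case 3: existing directory plus a file name -> keep the user's file name
  # (assumption: append the default extension when none is given).
  if (dir.exists(dirname(savePath))) {
    if (!nzchar(tools::file_ext(savePath))) savePath <- paste0(savePath, ".", ext)
    return(savePath)
  }
  # Path does not exist -> fall back to the working directory.
  file.path(getwd(), default_name)
}

out_dir <- tempdir()
resolve_save_path("", "Munich", "age_years")
resolve_save_path(out_dir, "Munich", c("age_years", "sex_male0_female1"))
resolve_save_path(file.path(out_dir, "my_plot.pdf"), "Munich", "age_years")
```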
Value
A status number, 0 if the save was successful, -1 if there were errors.
See Also
This internal function is used by: plotMunich2019dataset(), plotTainan2020dataset() and plotUtrecht2019dataset()
saveTable
Description
Internal function to save a table1 table object in png, pdf or html format. The default extension is png; the default filename has this format: 'table_dataset_timestamp.png'.
Usage
saveTable(t1, names, savePath = "")
Arguments
t1 |
A 'table1' object to be saved. |
names |
A character vector of at least 1 item: the dataset name. Other items may be used for testing. |
savePath |
A character string specifying the full path and filename for the exported table.
|
Details
The function takes a table1 object, a character vector containing the dataset name and the save path to export the table. It first checks whether the dataset name is present, then it creates the default file name. If the save path doesn't exist or is an empty string, it saves the table in the working directory using the default file name. If the save path exists but doesn't contain a file name, it saves the table in the path directory using the default file name. If the save path exists and has a file name, it saves the table in the path directory with the user-chosen file name.
Value
0 if the save was successful, or a warning message if there were errors.
See Also
This internal function is used by: descriptiveTableMunich2019dataset(), descriptiveTableTainan2020dataset() and descriptiveTableUtrecht2019dataset()
EHR data of patients affected by Glioblastoma (GBM) from Tainan and Taiwan medical centers and branches.
Description
Data from patients affected by GBM in the Tainan and Taiwan medical centers and affiliated branches between 2005 and 2016.
It contains data from 84 patients, ranging from demographic information (age and sex) and
treatment characteristics (chemotherapy, radiation volume and dose) to
surgical details and outcomes (overall and progression-free survival).
The dataset focuses on the impact of radiation volume and dose:
the original study found that patients with an elevated radiation dose (>60.0 Gy) had better median PFS and OS than patients who received a standard radiation dose.
Some data may be missing due to the nature of clinical records.
Usage
data(tainan2020dataset)
Format
A data frame with 84 rows and 12 variables:
- age_years
Numeric. Patient's age.
- chemo_yes1_no0
Factor. "No" (0) or "Yes" (1) for chemotherapy.
- OS_months
Numeric. Overall Survival expressed in months.
- PFS_months
Numeric. Progression-free survival expressed in months.
- radiation_dose_Gy
Numeric. Radiation dose used in radiotherapy expressed in Gy.
- radiation_volume_mL
Numeric. Radiation volume used in radiotherapy expressed in mL.
- sex_male0_female1
Factor. "Male" (0) or "Female" (1).
- surgery_resection1_biopsy0
Factor. "Biopsy" (0) or "Resection" (1) for surgery.
- survived_yes1_no0
Factor. "Dead" (0) or "Alive" (1) at their most recent check-up.
- TMZ_based_chemo_yes1_no0
Factor. "No" (0) or "Yes" (1) for TMZ based chemotherapy.
- tumorProgression_no0_yes1
Factor. "No" (0) or "Yes" (1) for tumor progression after the initial treatment.
- year_of_diagnosis_05to10_0_10to16_1
Factor. "2005-2010" (0) or "2010-2016" (1) for the year of diagnosis.
References
Li-Tsun Shieh, How-Ran Guo, Chung-Han Ho et al. (2020). Survival of glioblastoma treated with a moderately escalated radiation dose—Results of a retrospective analysis. PMID: 32413077. 15(5): e0233188. https://doi.org/10.1371/journal.pone.0233188
Examples
data(tainan2020dataset)
head(tainan2020dataset)
EHR data of patients affected by Glioblastoma (GBM) treated at the University Medical Centre of Utrecht.
Description
Electronic Health Records (EHR) of patients affected by GBM at the University Medical Centre of Utrecht (UMCU) between 2005 and 2013.
This dataset contains clinical information from 647 patients, collected to study prognostic factors
influencing overall survival in GBM patients. The study focuses on the involvement of the
subventricular zone (SVZ), a brain region that may play a role in tumor progression.
The study found that SVZ contact meant a worse prognosis for patients:
SVZ-contacting tumors had worse median OS than non-SVZ-contacting tumors.
Some data may be missing due to the nature of clinical records.
Usage
data(utrecht2019dataset)
Format
A data frame with 647 rows and 7 variables:
- age_years
Numeric. Patient's age.
- KPS_less70.0_more70.1
Factor. "<70" (0) or ">70" (1) KPS level.
- OS_months
Numeric. Overall Survival expressed in months.
- post_surgery_therapy_none0_monotherapy1_RTandTMZ2
Factor. "None" (0), "Monotherapy" (1) or "RT+TMZ" (2) for therapy regime.
- surgery_biopsy0_resection1
Factor. "Biopsy" (0) or "Resection" (1) for surgery type.
- survived_yes1_no0
Factor. "Dead" (0) or "Alive" (1) at their most recent check-up.
- SVZ_status_nocontact0_contact1
Factor. "No" (0) or "Yes" (1) for SVZ contact.
References
Berendsen S, van Bodegraven E, Seute T et al. (2019). Adverse prognosis of glioblastoma contacting the subventricular zone: Biological correlates. PMID: 31603915. 14(10): e0222717. https://doi.org/10.1371/journal.pone.0222717
Examples
data(utrecht2019dataset)
head(utrecht2019dataset)