RealSurvSim is an R package that provides several methods for simulating survival (time-to-event) datasets, intended for survival-analysis research and simulation studies. It includes non-parametric (kernel density estimation), parametric, and bootstrap-based approaches for generating realistic time-to-event data.
Conditional bootstrap (cond): Splits the data into event and censoring times, then resamples each part to preserve the observed event/censoring ratio.
Case resampling (case): Simple random resampling of entire observations with replacement.

If you have downloaded or cloned this repository:
# Install devtools if you don't already have it
install.packages("devtools")
# Then, from the root of the package directory:
devtools::install()

This package uses several R libraries for density estimation, distribution fitting, and survival analysis. They are installed automatically (if not already present) when RealSurvSim is installed.

Below is an overview of the core functions and some example usages. For detailed information on parameters and return values, refer to the function documentation.

data_simul_KDE(orig_vals, n = NULL, kernel = "gaussian")
  orig_vals: Numeric vector of original data values.
  n: Number of observations to simulate (defaults to the length of orig_vals).
  kernel: The kernel to use for KDE (currently supports "gaussian").

data_simul_Estim(orig_vals, n = NULL, distrib = "exp")
  Fits a parametric distribution to orig_vals and draws new samples from the fitted distribution.
  distrib: Distribution to fit; the options include "inverse_gamma", "gompertz", "llogis", "gumbel", "myMix", "exp".

data_simul_bootstr(dat, n = NULL, type = "cond")
  dat: Data frame containing at least V1 (time) and V2 (censoring indicator, 0/1).
  n: Number of observations to sample. Defaults to the number of rows in dat.
  type: "cond" for conditional bootstrap or "case" for case resampling.

RealSurvSim(dat, col_time, col_status, col_group, reps = 10000, random_seed = 123, n = NULL, simul_type, distribs = c("exp", "exp", "exp", "exp"))
  Supported values of simul_type:
  "cond": Conditional bootstrap
  "case": Case resampling
  "distr": Parametric distribution-based simulation
  "KDE": Kernel density estimation-based simulation
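
The four simulation types listed above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy data frame, the helper `resample`, and the choice to fit the exponential by `1 / mean` (treating all values as complete observations) are assumptions made only for illustration.

```r
# Illustrative sketches only; these are NOT the RealSurvSim implementations.
set.seed(1)
dat <- data.frame(
  V1 = rexp(50, rate = 0.1),              # times
  V2 = rbinom(50, size = 1, prob = 0.7)   # 1 = event, 0 = censored
)
n <- nrow(dat)

# Case resampling: draw whole rows with replacement.
case_boot <- dat[sample.int(n, n, replace = TRUE), ]

# Conditional bootstrap (one possible reading): resample within the event
# rows and within the censored rows separately, so the original
# event/censoring counts are preserved.
resample <- function(idx) idx[sample.int(length(idx), length(idx), replace = TRUE)]
idx_event <- which(dat$V2 == 1)
idx_cens  <- which(dat$V2 == 0)
cond_boot <- dat[c(resample(idx_event), resample(idx_cens)), ]

# KDE-based draw (smoothed bootstrap with a Gaussian kernel): pick observed
# values at random and add Gaussian noise with the density() bandwidth.
# Note that this can produce negative values near zero.
bw <- density(dat$V1, kernel = "gaussian")$bw
kde_draw <- sample(dat$V1, n, replace = TRUE) + rnorm(n, mean = 0, sd = bw)

# Parametric draw: exponential fitted by maximum likelihood (rate = 1 / mean),
# treating the values as complete observations.
rate_hat <- 1 / mean(dat$V1)
exp_draw <- rexp(n, rate = rate_hat)
```

In this sketch the conditional bootstrap keeps the number of events exactly equal to that of the original data, whereas case resampling only matches it on average.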
RealSurvSim parameters:
  dat: Original (or reconstructed) dataset with time, status, and group columns.
  col_time: Column name/index for time.
  col_status: Column name/index for censoring indicator (1=event, 0=censored).
  col_group: Column name/index for treatment/group identifier.
  reps: Number of datasets to simulate (default 10,000).
  random_seed: Random seed (default 123) for reproducibility.
  n: Vector specifying sample sizes per group (optional).
  simul_type: Single string specifying the simulation method ("cond", "case", "distr", "KDE").
  distribs: Which distributions to use if simul_type = "distr".

Returns:
  A list containing multiple simulated datasets (one for each repetition). Each dataset is a data.frame with columns V1 (time), V2 (status), and V3 (group).

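
Because the result is a list of data frames with columns V1, V2, and V3, per-replicate summaries can be computed with the usual apply-style functions. The sketch below uses a mock list built by a hypothetical helper, `make_mock`, rather than actual RealSurvSim output; the name `sims` is likewise an assumption standing in for the list of simulated datasets.

```r
# Mock stand-in for a list of simulated datasets (NOT produced by RealSurvSim).
set.seed(42)
make_mock <- function(n = 30) {
  data.frame(
    V1 = rexp(n, rate = 0.1),         # time
    V2 = rbinom(n, 1, 0.6),           # status: 1 = event, 0 = censored
    V3 = rep(0:1, length.out = n)     # group
  )
}
sims <- replicate(5, make_mock(), simplify = FALSE)

# Event proportion per group in each replicate: one row per replicate,
# one column per group.
event_prop <- t(sapply(sims, function(d) tapply(d$V2, d$V3, mean)))

# Median observed time in each replicate.
med_time <- vapply(sims, function(d) median(d$V1), numeric(1))
```

Summaries like these give a quick check that the simulated datasets resemble the original data in event rate and time scale before they are used in a larger simulation study.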
Below are brief examples demonstrating how to simulate data. In
practice, replace the placeholders (example_data,
"time", etc.) with your actual dataset and column
names.
library(RealSurvSim)

# Example dataset construction (for demonstration):
set.seed(123)
example_data <- data.frame(
  time   = rexp(100, rate = 0.1),             # Times
  status = sample(0:1, 100, replace = TRUE),  # 0=censored, 1=event
  group  = sample(0:1, 100, replace = TRUE)   # Two groups, 0 or 1
)

# 1. Kernel Density Estimation Simulation
sim_kde <- RealSurvSim(
  dat = example_data,
  col_time = "time",
  col_status = "status",
  col_group = "group",
  reps = 5,            # Simulate 5 datasets
  simul_type = "KDE"   # Use KDE-based simulation
)
str(sim_kde$datasets)  # Check the structure of generated datasets

# 2. Parametric Distribution Simulation
sim_distr <- RealSurvSim(
  dat = example_data,
  col_time = "time",
  col_status = "status",
  col_group = "group",
  reps = 5,
  simul_type = "distr",
  distribs = c("exp", "exp", "exp", "exp")
)
str(sim_distr$datasets)

# 3. Conditional Bootstrap
sim_cond <- RealSurvSim(
  dat = example_data,
  col_time = "time",
  col_status = "status",
  col_group = "group",
  reps = 5,
  simul_type = "cond"
)
str(sim_cond$datasets)

# 4. Case Resampling
sim_case <- RealSurvSim(
  dat = example_data,
  col_time = "time",
  col_status = "status",
  col_group = "group",
  reps = 5,
  simul_type = "case"
)
str(sim_case$datasets)

data(liang)
data(wu)
# 5. liang_kde <- RealSurvSim(liang, liang$V1, liang$V2, liang$V3, reps = 3, simul_type = "KDE")
# For arbitrary n
# 6. arbliang_distr <- RealSurvSim(liang, liang$V1, liang$V2, liang$V3, reps = 10, n = c(40, 50), simul_type = "distr", distribs = c("exp", "llogis", "llogis", "exp"))
# 7. arbwu_case <- RealSurvSim(wu, wu$V1, wu$V2, wu$V3, reps = 100, n = c(40, 50), simul_type = "case")

Underlying Paper for the Package
  Analysis and Methods for Survival Data (arXiv:2308.07842)

Data Reconstruction Algorithm
  Guyot et al. (2012), describing the algorithm for reconstructing survival data from published Kaplan-Meier curves.

WebPlotDigitizer
  A tool for extracting data points from Kaplan-Meier curves.