RealSurvSim

RealSurvSim is an R package for simulating survival (time-to-event) datasets, intended for survival analysis research and simulation studies. It offers three families of simulation approaches for generating realistic time-to-event data: non-parametric (kernel density estimation), parametric (fitted distributions), and bootstrap-based.

Features


Installation

1. From Source

If you have downloaded or cloned this repository:

# Install devtools if you don't already have it
install.packages("devtools")

# Then, from the root of the package directory:
devtools::install()

Dependencies

This package uses several R libraries for density estimation, distribution fitting, and survival analysis. They will be automatically installed (if not already present) when installing RealSurvSim. Key dependencies include:


Usage

Below is an overview of the core functions and some example usages. For detailed information on parameters and return values, refer to the function documentation.

Core Functions

  1. data_simul_KDE(orig_vals, n = NULL, kernel = "gaussian")
    Simulates data via kernel density estimation from a numeric vector of original values.
  2. data_simul_Estim(orig_vals, n = NULL, distrib = "exp")
    Fits a specified parametric distribution to orig_vals and draws new samples from the fitted distribution.
  3. data_simul_bootstr(dat, n = NULL, type = "cond")
    Bootstrap-based simulation of event and censoring times.
  4. RealSurvSim(dat, col_time, col_status, col_group, reps = 10000, random_seed = 123, n = NULL, simul_type, distribs = c("exp", "exp", "exp", "exp"))
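    The main wrapper function for simulating multiple survival datasets. The simul_type argument selects one of four approaches: "KDE" (kernel density estimation), "distr" (parametric distribution fitting), "cond" (conditional bootstrap), or "case" (case resampling).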
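
To give a feel for what the simulation approaches listed above do, here is a minimal conceptual sketch in base R. It is not the package's internal code: the variable names (obs_times, kde_draws, par_draws, case_draws) are hypothetical, censoring is ignored, and the exponential distribution is used only because it is the default shown in the signatures above.

```r
# Conceptual sketch only; NOT the implementation used by RealSurvSim.
set.seed(1)
obs_times <- rexp(200, rate = 0.1)  # hypothetical observed times
n_new <- 50                         # number of values to simulate

# (a) KDE-style simulation (smoothed bootstrap): resample the observed
#     values and add Gaussian noise with the kernel bandwidth as its
#     standard deviation. This is equivalent to drawing from a Gaussian
#     kernel density estimate of obs_times.
bw <- bw.nrd0(obs_times)            # default bandwidth rule of density()
kde_draws <- sample(obs_times, n_new, replace = TRUE) +
  rnorm(n_new, mean = 0, sd = bw)
# Note: a Gaussian kernel can produce negative values near zero;
# this sketch leaves them untouched.

# (b) Parametric simulation: fit an exponential distribution by maximum
#     likelihood (rate = 1 / mean, valid without censoring) and draw
#     new values from the fitted distribution.
rate_hat <- 1 / mean(obs_times)
par_draws <- rexp(n_new, rate = rate_hat)

# (c) Case resampling: draw whole rows (time and status together)
#     with replacement.
dat <- data.frame(time = obs_times, status = rbinom(200, 1, 0.7))
case_draws <- dat[sample(nrow(dat), n_new, replace = TRUE), ]
```

The package functions additionally have to deal with censoring and group structure, so their results will differ from this simplified illustration.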

Examples

Below are brief examples demonstrating how to simulate data. In practice, replace the placeholders (example_data, "time", etc.) with your actual dataset and column names.

library(RealSurvSim)

# Example dataset construction (for demonstration):
set.seed(123)
example_data <- data.frame(
  time = rexp(100, rate = 0.1),            # Times
  status = sample(0:1, 100, replace = TRUE), # 0=censored, 1=event
  group = sample(0:1, 100, replace = TRUE)   # Two groups, 0 or 1
)

# 1. Kernel Density Estimation Simulation
sim_kde <- RealSurvSim(
  dat = example_data,
  col_time   = "time",
  col_status = "status",
  col_group  = "group",
  reps       = 5,            # Simulate 5 datasets
  simul_type = "KDE"         # Use KDE-based simulation
)
str(sim_kde$datasets)  # Check the structure of generated datasets

# 2. Parametric Distribution Simulation
sim_distr <- RealSurvSim(
  dat = example_data,
  col_time   = "time",
  col_status = "status",
  col_group  = "group",
  reps       = 5,
  simul_type = "distr",
  distribs   = c("exp", "exp", "exp", "exp")
)
str(sim_distr$datasets)

# 3. Conditional Bootstrap
sim_cond <- RealSurvSim(
  dat = example_data,
  col_time   = "time",
  col_status = "status",
  col_group  = "group",
  reps       = 5,
  simul_type = "cond"
)
str(sim_cond$datasets)

# 4. Case Resampling
sim_case <- RealSurvSim(
  dat = example_data,
  col_time   = "time",
  col_status = "status",
  col_group  = "group",
  reps       = 5,
  simul_type = "case"
)
str(sim_case$datasets)

data(liang)
data(wu)
# 5. KDE simulation on the liang dataset
liang_kde <- RealSurvSim(liang, liang$V1, liang$V2, liang$V3, reps = 3, simul_type = "KDE")

# 6. Parametric simulation with arbitrary n
arbliang_distr <- RealSurvSim(liang, liang$V1, liang$V2, liang$V3, reps = 10, n = c(40, 50), simul_type = "distr", distribs = c("exp", "llogis", "llogis", "exp"))

# 7. Case resampling with arbitrary n
arbwu_case <- RealSurvSim(wu, wu$V1, wu$V2, wu$V3, reps = 100, n = c(40, 50), simul_type = "case")
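
One way to judge whether simulated data look realistic is to compare Kaplan-Meier estimates per group between the original and the simulated datasets. The exact structure of the object returned by RealSurvSim is not described here, so the sketch below only shows how to compute the estimate for any data frame with time, status, and group columns. It assumes the survival package is available, and check_data is a hypothetical stand-in for your own data.

```r
library(survival)  # recommended package, shipped with standard R installations

# Hypothetical data with the same column layout as example_data above
set.seed(123)
check_data <- data.frame(
  time   = rexp(100, rate = 0.1),
  status = sample(0:1, 100, replace = TRUE),  # 0 = censored, 1 = event
  group  = sample(0:1, 100, replace = TRUE)
)

# Basic input checks: positive times and a binary status indicator
stopifnot(all(check_data$time > 0), all(check_data$status %in% c(0, 1)))

# Kaplan-Meier estimate for each group
km_fit <- survfit(Surv(time, status) ~ group, data = check_data)
print(km_fit)
```

Running the same survfit call on each simulated dataset and overlaying the curves (for example with plot(km_fit)) gives a quick visual comparison with the original data.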

References and Further Reading

Underlying Paper for the Package
Analysis and Methods for Survival Data (arXiv:2308.07842)

Data Reconstruction Algorithm
Guyot et al. (2012), describing the algorithm for reconstructing survival data from published Kaplan-Meier curves.

WebPlotDigitizer
WebPlotDigitizer for extracting data points from Kaplan-Meier curves.