---
title: "Getting Started with simevent"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{my-vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(simevent)
```

# Introduction

The goal of `simevent` is to provide functions for the generation and analysis of complex continuous-time health care data. The simulation functions are general, and many settings can be specified. Data is simulated for \(n\) individuals, and the simulated data can include variables such as treatment decisions, disease progression, and health factors. Currently the package contains one general function, `simEventData`, and five special-case functions that use the underlying `simEventData` function to simulate data from, e.g., the survival setting or the competing risk setting.

This document is structured as follows. First, the general simulation setting and simulation procedure are explained. This section is quite mathematical and seeks to provide an in-depth explanation of the simulation procedure. Next, the specific simulation functions and their use are treated.

# The General Simulation Setting

The simulation approach is similar to Eimermacher (https://doi.org/10.1186/s12874-015-0005-2) and is based on a counting process framework.

## The Mathematics

We consider a scenario where \(n\) individuals are followed over a time interval $\mathcal{T} = [0, \rho)$, where $\rho \in (0,\infty]$. Each individual can experience events of several types. Let \( \mathcal{X} =\lbrace 0, 1, \ldots , J \rbrace \), with $J \geq 1$, denote the set of event types. We often let $x = 0$ denote censoring. Events are not necessarily mutually exclusive and can affect one another. We let \(N^x\) denote the counting process associated with the event \(x \in \mathcal{X}\), and we collect all the counting processes in a \((J+1)\)-dimensional vector \(N\).

The filtration \((\mathcal{F}_t)_{t \in \mathcal{T}}\) is generated by the variables observed up to time \(t\) and represents the information available at that time point. Correspondingly, \(\mathcal{F}_{t-}\) represents the information available just before time \(t\).

We assume that the intensity of each counting process takes the form

$$\lambda^x(t\, \vert \, \mathcal{F}_{t-}) = R^x(t\, \vert \, \mathcal{F}_{t-}) \eta^x \nu^x t^{\nu^x - 1} \phi^x(t \, \vert \, \mathcal{F}_{t-})$$

where \(\eta^x > 0\) is a scale parameter and \(\nu^x > 0\) is a shape parameter. The factor \(\eta^x \nu^x t^{\nu^x - 1}\) is the hazard of a Weibull distributed random variable. When \(\nu^x >1\), the intensity increases over time, and when \(\nu^x <1\), the intensity decreases. The function \(R^x(t \, \vert \, \mathcal{F}_{t-})\) is an (\(x\)-specific) at-risk indicator that determines when an individual is at risk for event \(x\). The function \(\phi^x(t \, \vert \, \mathcal{F}_{t-})\) models how the intensity depends on past information. We consider functions of the form

$$\phi^x(t \, \vert \, \mathcal{F}_{t-}) =\exp (L^{\top}\beta_1 + N(t-)^{\top} \beta_2)$$

where $\beta_1 \in \mathbb{R}^d$, $\beta_2 \in \mathbb{R}^{J+1}$ and $L \in \mathbb{R}^d$ is a vector of baseline covariates.

The data are simulated by deriving the distribution of the waiting times, that is, the times between consecutive events. These distributions follow directly from the specification of the intensities of the original counting processes.

## The simEventData Function
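
As background for this function, the waiting-time construction from the previous section can be written out explicitly. The following is a sketch under stated assumptions, not a description of the internals of `simEventData`. The symbols \(s\) (time of the most recent event), \(W^x\) (waiting time until the next event of type \(x\)), \(\Lambda^x\) (cumulative intensity) and \(U\) (a uniform random variable) are notation introduced here for illustration. We assume the individual is at risk for \(x\) after time \(s\), so that \(R^x = 1\). Because \(L\) is a baseline vector and \(N(t-)\) only changes at event times, \(\phi^x\) is constant between events, and the cumulative intensity over \((s, s+w]\) is

$$\Lambda^x(w \, \vert \, s) = \int_s^{s+w} \eta^x \nu^x u^{\nu^x - 1} \phi^x \, \mathrm{d}u = \eta^x \phi^x \left( (s+w)^{\nu^x} - s^{\nu^x} \right)$$

The waiting time then has survival function

$$P(W^x > w \, \vert \, \mathcal{F}_{s}) = \exp \left( - \Lambda^x(w \, \vert \, s) \right)$$

Setting the right-hand side equal to \(U \sim \mathrm{Unif}(0,1)\) and solving for \(w\) gives the inverse-transform draw

$$W^x = \left( s^{\nu^x} - \frac{\log U}{\eta^x \phi^x} \right)^{1/\nu^x} - s$$

If the type-specific waiting times are drawn independently given \(\mathcal{F}_s\), one standard construction takes the next event time as \(s + \min_{x} W^x\) and the event type as the minimizing \(x\). The process is then updated and the step is repeated until censoring occurs or the end of follow-up \(\rho\) is reached.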