--- title: "Cox Regression Under Quasi-Independent Double Truncation" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Cox Regression Under Quasi-Independent Double Truncation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 5, fig.height = 3, fig.align = "center" ) ``` ```{r setup} library(survdt) ``` ## Inverse probability weighted estimators The `survdt` package implements several methods for Cox regression with quasi-independent double truncation. We illustrate them through an application to the `aids` dataset, which holds records of AIDS incubation times from a retrospective sample of patients with transfusion-acquired HIV. The inclusion criteria were that the patient was diagnosed with AIDS between the discovery of the disease in 1982 and the end of the study in 1986, with time measured in months since HIV infection. The event time of interest is the AIDS incubation time `incu`, with the truncation times contained in `ltrunc` and `rtrunc` respectively. See `?survdt::aids` for more details. To fit a Cox model with nonparametric inverse probability weighting, use the `coxph_indtrunc()` function in the `survdt` package. The first input argument should be a model formula with a `Survdt` or `Survdt2` response. `Survdt()` is used under time-independent covariates (see `?survdt::Survdt`), while `Survdt2` is used when there are time-varying covariates (see [Time-varying covariates] for an example). ```{r} aids_fit <- coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, data = aids) aids_fit confint(aids_fit) ``` The default setting applies robust inverse probability of non-truncation weights and fits the model by maximizing a weighted partial likelihood. Several choices of time-varying weights are available through the input argument `ipw_type`. They offer improved robustness to outliers and extreme inverse probability weights when compared to standard inverse probability weighting (`ipw_type="W-1"`). ```{r} coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, data = aids, ipw_type = "W-1") ``` In this case, the coefficient estimates change under different weightings but the z-statistics are fairly similar. In [Sensitivity analysis for positivity violations], we find that different weight choices can have a greater impact in the `aids` data during sensitivity analysis. See `?survdt::coxph_indtrunc` for more details on the different weighting options. Lastly, stratified Cox models can be fit by specifying a `strata_formula`. See [Stratified Cox models] for an example. ### Plotting survival and hazard estimates Given a fitted Cox model, the hazard or survival function estimates can be plotted using the `plot_coxsurv()` function in the `survdt` package. The first input argument should be a model object from `coxph_indtrunc()`, while the second input argument should be a data frame with the covariate values to plot the conditional survival or hazard function for. ```{r, fig.height = 3.5} plot_coxsurv(aids_fit, newdata = aids[c(1,40), ], target = "survival") plot_coxsurv(aids_fit, newdata = aids[c(1, 40, 200), ], ci = FALSE, target = "cumhaz") ``` Note that `plot_coxsurv()` prints a message telling us it is computing the baseline hazard and its standard errors, since (by default) this was not done when fitting the model. Use `basehaz=TRUE` when running `coxph_indtrunc()` to remove the need to re-compute the baseline hazard for each plot. ### Assumption checking and model diagnostics The statistical methods implemented in `coxph_indtrunc()` require two key assumptions to ensure unbiased inference in the presence of double truncation. First, the truncation times should be independent of the event time and covariates within the observable data region, which is known as quasi-independence. Second, all possible event times should have a positive probability of being observed, i.e. not being truncated, which is known as the positivity assumption. The `survdt` package includes diagnostics for both of these assumptions. They are described in the following two sections. Residual diagnostics for the Cox model specification are available through the `residuals()` function (see `survdt::coxph_indtrunc`). #### Testing quasi-independent truncation with covariates The `survdt` package includes the function `test_quasiindep_covariates()` to test for violations of the quasi-independence assumption. This is accomplished through a two-stage test based on stratifying by covariate values. The first stage checks for dependence within strata, while the second stage checks across strata. The test can be repeated with different stratifications when there are many covariates involved. See `?survdt::test_quasiindep_covariates` for more details. The first input argument should be a model formula with a `Survdt` response and factor variables defining covariate strata on the right hand side. Non-factor variables will be automatically coerced to factors. Below we run the diagnostic test on the `aids` data, with age group defining the covariate strata. ```{r} qind_test <- test_quasiindep_covariates(Survdt(incu, ltrunc, rtrunc) ~ age_gp, data = aids) qind_test ``` Both stages of the test fail to find significant evidence of dependent truncation in the `aids` data, so the quasi-independence assumption required by `coxph_indtrunc()` is reasonably satisfied. The test results for the second stage can be plotted to visualize how strongly the truncation times could depend on the covariates. This is done by comparing the estimated selection (non-truncation) probabilities across strata. ```{r, fig.height = 3.5} plot(qind_test) ``` The estimated selection probabilities all lie within the 95% uniform confidence band around the unstratified estimates, as expected given the p-value from the stage 2 test printed above. #### Sensitivity analysis for positivity violations The `survdt` package includes the function `positivity_sens_indtrunc()` to conduct sensitivity analysis for the Cox model under potential violations of the positivity assumption. Given a sequence of potential values for the truncated baseline probability mass, it returns a corresponding sequence of adjusted coefficient estimates and associated sensitivity intervals which show how the fitted model could change due to non-positivity. See `?survdt::positivity_sens_indtrunc` for more details. The first input argument should be a model formula with `Survdt` response. The truncated masses are specified through `trunc_mass`, and should not be too close to one or the algorithm may not converge. Many of the other input arguments are shared with `coxph_indtrunc()`. First, we apply the sensitivity analysis with standard inverse probability weighting, which does not have a time-varying component. ```{r error=TRUE, warning=TRUE} positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, data = aids, ipw_type = "W-1", trunc_mass = seq(0, 0.11, by = 0.01)) ``` Note that the estimator failed to converge even at small truncated mass values. This can occur for many reasons (see `?survdt::positivity_sens_indtrunc` for more information). In the `aids` data, it turns out that the estimated selection probability for the largest observed incubation times approaches zero, leading to extreme inverse probability weights. Therefore we apply time-varying weights that reduce the influence of event time outliers and extreme weights by using the option `ipw_type = "W-asurv"`. This is the default weighting used in `positivity_sens_indtrunc()` and `coxph_indtrunc()`. ```{r} sens <- positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, data = aids, trunc_mass = seq(0, 0.11, by = 0.01)) sens ``` These robust weights improve the numerical convergence of the sensitivity analysis estimators by a substantial amount. The estimates can also be plotted to better visualize the results. ```{r} plot(sens) plot(sens, hazratio = TRUE) ``` Clearly the coefficient estimate for adults is highly sensitive to potential positivity violations, while the coefficient estimate for children is more stable. ### Stratified Cox models The `survdt` package supports stratified Cox models, where the baseline hazard is allowed to differ across strata. In the `coxph_indtrunc()` function, stratification factors are specified through the `strata_formula` input argument, which should be a one-sided formula with no response variable. We illustrate this with a simulated sample featuring non-proportional hazards, which is included in the `survdt` package as `nonprop_sample`. The data has a binary covariate `group` with a time-varying log hazard ratio of $0.7\times 1(t>0.6)$ and a continuous covariate `x` with a constant log hazard ratio of $-0.5$. We fit a model stratified by `group`. ```{r} strat_fit <- coxph_indtrunc(Survdt(event_time, ltrunc, rtrunc) ~ x, data = nonprop_sample, strata_formula = ~group) strat_fit confint(strat_fit) ``` Plotting the cumulative hazard estimates for the two groups at the same `x`, we see that they are indeed approximately equal until time 0.6, after which the hazard for group 1 is about $\exp(0.7)\approx 2$ times as large as the hazard for group 0. ```{r, fig.height=3.5} plot_coxsurv(strat_fit, newdata = data.frame(group = c(0, 1), x = c(1, 1)), target = "cumhaz") ``` ### Time-varying covariates The `survdt` package supports time-varying covariates in the `coxph_indtrunc()` function. We illustrate this with a simulated sample featuring non-proportional hazards, which is included in the `survdt` package as `nonprop_sample`. The data has a binary covariate `group` with a time-varying log hazard ratio of $0.7\times 1(t>0.6)$ and a continuous covariate `x` with a constant log hazard ratio of $-0.5$. We fit a model with the correct time-varying effect of `group`. First, we split each row in the data frame into two possible observation windows corresponding to $t\leq 0.6$ and $t>0.6$. The `survdt` package includes the function `time_split()` for this purpose. See `?survdt::time_split` for more details. ```{r} split_dat <- time_split(nonprop_sample, cut = 0.6, stop_name = "event_time", start = "start", event = "event", id = "id") ``` Then we add a column `group_tgr06` containing the time-varying covariate $1(t>0.6)\times$`group`, and fit the Cox model using a `Survdt2` response. ```{r} split_dat$group_tgr06 <- (split_dat$event_time > 0.6) * split_dat$group tvar_fit <- coxph_indtrunc(Survdt2(start, event_time, event, ltrunc, rtrunc, id) ~ group_tgr06 + x, data = split_dat) tvar_fit confint(tvar_fit) ``` The confidence intervals from the estimated Cox model contain the true values of 0.7 and -0.5.