---
title: "survkl: Transfer-Learning Based Integrated Cox Models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{survkl: Transfer-Learning Based Integrated Cox Models}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: sentence
---

```{r, include=FALSE}
knitr::opts_chunk$set(fig.alt = "Plot generated in survkl vignette")
```

## Introduction

Accurate prognosis prediction plays a central role in survival analysis.
The recent growth of large-scale biobank studies—characterized by extensive genetic, transcriptomic, imaging, and electronic health record data—has created new opportunities to enhance prediction accuracy.
However, despite their potential, these datasets often suffer from limited effective sample sizes, high dimensionality, low signal-to-noise ratios, and practical constraints such as patient privacy.

Integrating external information is a promising strategy for improving model performance, yet classical data integration methods typically assume homogeneous data-generating mechanisms across sources.
This assumption rarely holds in modern biomedical applications.
Ignoring heterogeneity can introduce substantial bias, underscoring the need for transfer-learning approaches that explicitly account for discrepancies across populations.

While KL divergence has been used previously for integrating binary-outcome models, extending such ideas to censored time-to-event outcomes is nontrivial due to censoring and the limited nature of available external survival information (e.g., predicted risk scores or sets of coefficients without a baseline hazard).

The `survkl` package addresses these challenges by offering a transfer-learning framework for Cox models that integrates external information with newly collected survival data.
It is designed to be flexible, privacy-preserving, and effective across both low- and high-dimensional settings.
### Key Features

- **Transfer Learning with KL Integration:** Incorporates various types of external information via KL-based penalization to adaptively borrow strength.
- **Privacy-Preserving External Use:** Requires only summary-level external inputs—such as risk scores or coefficient estimates—without access to individual-level external data.
- **Robustness to Heterogeneity:** Explicitly accounts for population differences through adaptive weighting controlled by the tuning parameter `eta`, enabling selective borrowing from compatible sources.
- **High-Dimensional Modeling Support:** Implements regularized KL-integrated Cox models, including
  - Ridge (`coxkl_ridge`)
  - Elastic Net (`coxkl_enet`, including LASSO as a special case)
- **Flexible External Information Formats:** Supports external coefficients, risk scores, and partially overlapping predictor sets.
- **Comprehensive Cross-Validation Framework:** Provides multiple tuning criteria, including
  - *Two based on the C-index*: pooled and fold-averaged
  - *Two based on loss*: predicted deviance and the *V&VH* loss

This vignette introduces the main functionalities of `survkl` and provides examples for both low- and high-dimensional modeling workflows.

## Installation

You can install from CRAN:

```{r, eval=FALSE}
install.packages("survkl")
```

Or install the development version of `survkl` from GitHub:

```{r, eval=FALSE}
require(remotes)
remotes::install_github("UM-KevinHe/survkl", ref = "main")
```

## Quick Start

This section provides a brief overview of the main functions using example datasets included in the package.
First, load the package:

```{r}
library(survkl)
```

### Low-Dimensional Integration

The low-dimensional KL-integrated Cox model, implemented in `coxkl`, is designed for settings where the number of predictors is modest.
The method incorporates external information—either in the form of external Cox coefficients (`beta`) or pre-computed external risk scores (`RS`)—using a KL-based penalization.
The tuning parameter `eta` governs the degree of borrowing:

- `eta = 0` reproduces the standard Cox model;
- larger `eta` values increasingly pull the fitted coefficients toward the external information.

In this section we demonstrate how to fit a KL-integrated Cox model and how to select the optimal `eta` using cross-validation via `cv.coxkl`.

#### Example data structure

We illustrate the workflow using the built-in low-dimensional simulated dataset:

```{r}
data(ExampleData_lowdim)

train <- ExampleData_lowdim$train
test  <- ExampleData_lowdim$test

z     <- train$z
delta <- train$status
time  <- train$time
strat <- train$stratum
```

and externally derived coefficients stored in `beta_external_good`:

```{r}
beta_ext <- ExampleData_lowdim$beta_external_good
```

We generate a sequence of `eta` values through the internal utility `generate_eta()`, and fit the KL-integrated model across this grid:

```{r}
eta_grid <- generate_eta(method = "exponential", n = 100, max_eta = 30)

fit_lowdim <- coxkl(
  z       = z,
  delta   = delta,
  time    = time,
  stratum = strat,
  beta    = beta_ext,
  etas    = eta_grid
)
```

The S3 method `coef()` provides a clean interface to extract estimated coefficients.
If the requested `eta` lies between fitted values, linear interpolation is performed:

```{r}
coef(fit_lowdim, eta = 1)
```

Users may instead supply an external risk score vector:

```{r}
RS_ext <- as.matrix(z) %*% as.matrix(beta_ext)

fit_lowdim_RS <- coxkl(
  z       = z,
  delta   = delta,
  time    = time,
  stratum = strat,
  RS      = RS_ext,
  etas    = eta_grid
)

coef(fit_lowdim_RS)[1:5]
```

Objects of class `coxkl` can be visualized using the S3 plotting method `plot()`.
This function displays how model performance changes across the `eta` sequence used during fitting.
Two types of performance criteria are supported:

- `"loss"` (default; −2 × partial log-likelihood, normalized by sample size)
- `"CIndex"` (stratified concordance index)

If no test data are supplied, performance is computed using the training data stored in `object$data`:

```{r}
plot(
  fit_lowdim,
  test_z       = test$z,
  test_time    = test$time,
  test_delta   = test$status,
  test_stratum = test$stratum,
  criteria     = "loss"
)
```

The function `cv.coxkl` performs K-fold (default 5) cross-validation to choose the integration parameter.
It supports four criteria:

- `"V&VH"` — V&VH loss
- `"LinPred"` — predicted partial deviance
- `"CIndex_pooled"` — pooled comparable pairs
- `"CIndex_foldaverage"` — per-fold stratified C-index

Below is an example using the default `"V&VH"` criterion:

```{r}
cv_lowdim <- cv.coxkl(
  z        = z,
  delta    = delta,
  time     = time,
  stratum  = strat,
  beta     = beta_ext,
  etas     = eta_grid,
  nfolds   = 5,
  criteria = "V&VH",
  seed     = 1
)
```

The cross-validated performance curve can be visualized using `cv.plot()`:

```{r}
cv.plot(cv_lowdim)
```

The solid purple curve displays the cross-validated loss across different values of `eta`.
The green dotted horizontal line marks the internal baseline at `eta = 0`, representing the model that does not incorporate external information.
The vertical dashed orange line indicates the optimal `eta` value, where the cross-validated loss is minimized.

A comparison between the purple curve and the green baseline shows whether borrowing external information improves prediction performance.
Whenever the purple curve falls below the green line, using external information (`eta > 0`) yields better predictive accuracy than relying solely on the internal model.

### High-Dimensional Integration (Ridge, Elastic Net, and LASSO)

In high-dimensional settings (for example, when the number of predictors is comparable to or exceeds the sample size), the `survkl` package extends KL-integrated Cox modeling with regularization.
Two families of penalties are supported:

- Ridge penalty (L2), implemented in `coxkl_ridge`, which shrinks all coefficients toward zero while retaining dense solutions.
- Elastic net penalty (a mixture of L1 and L2), implemented in `coxkl_enet`, which includes the LASSO (pure L1) as a special case when the mixing parameter `alpha` is set to 1.

Both models combine:

- a KL term, controlled by the integration parameter `eta`, to borrow information from external sources (risk scores `RS` or coefficients `beta`), and
- a regularization term (ridge or elastic net) to stabilize estimation in high-dimensional designs.

In this section we first introduce the shared high-dimensional example dataset, and then illustrate the usage of the ridge-penalized model `coxkl_ridge`.
(The elastic net workflow is analogous and is discussed in a separate subsection.)

#### High-dimensional example data

We use the built-in high-dimensional simulated dataset:

```{r}
data(ExampleData_highdim)

train_hd <- ExampleData_highdim$train
test_hd  <- ExampleData_highdim$test

z_hd     <- train_hd$z
delta_hd <- train_hd$status
time_hd  <- train_hd$time
strat_hd <- train_hd$stratum
```

This dataset contains 50 predictors (`Z1`–`Z50`), with 6 signal variables and 44 AR(1)-correlated noise variables.
Externally derived coefficients are provided in `beta_external`:

```{r}
beta_external_hd <- ExampleData_highdim$beta_external
```

These external coefficients are estimated from a separate dataset using only `Z1`–`Z6` and then expanded to a length-50 vector, with zeros for `Z7`–`Z50`.

#### Ridge-Penalized KL-Integrated Cox Model (`coxkl_ridge`)

The function `coxkl_ridge` fits a KL-integrated Cox model with a ridge (L2) penalty on all predictors.
External information is incorporated through a KL term weighted by `eta` (a user-specified scalar), while the ridge penalty is controlled by a sequence of tuning parameters `lambda`.
If `lambda` is not provided, a decreasing lambda path is generated automatically.
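As an aside, the zero-padding of the external coefficients described above can be mimicked in base R.
The coefficient values below are purely illustrative (the packaged `beta_external` was produced analogously from a real external fit on `Z1`–`Z6`):

```{r}
# Hypothetical external coefficients for Z1–Z6 (illustrative values only)
beta_ext6 <- c(0.8, -0.5, 0.6, 0.4, -0.7, 0.3)

# Expand to the full predictor length, with zeros for Z7–Z50
beta_full <- c(beta_ext6, rep(0, 44))
length(beta_full)  # 50
```

Padding with zeros lets an external model that covers only a subset of predictors be supplied through the same length-`p` `beta` argument.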
We first fit a KL–ridge model for a fixed integration weight `eta` and an automatically generated lambda path:

```{r}
model_ridge <- coxkl_ridge(
  z       = z_hd,
  delta   = delta_hd,
  time    = time_hd,
  stratum = strat_hd,
  beta    = beta_external_hd,  # external coefficients (length 50)
  eta     = 1                  # KL integration weight
)
```

The fitted object stores, for each lambda value:

- `model_ridge$lambda` — the lambda sequence (in decreasing order),
- `model_ridge$beta` — estimated coefficients (one column per lambda),
- `model_ridge$linear.predictors` — linear predictors for all observations and all lambda values,
- `model_ridge$likelihood` — partial log-likelihood along the lambda path,
- `model_ridge$data` — the data used for fitting.

The S3 method `coef()` extracts the estimated coefficients:

```{r}
# All lambdas (columns ordered in decreasing lambda)
coef(model_ridge)[1:5, 1:5]  # first 5 predictors, first 5 lambdas
```

To focus on a specific value of `lambda`:

```{r}
lambda_target <- model_ridge$lambda[5]
coef(model_ridge, lambda = lambda_target)[1:5]
```

If the requested lambda is not exactly one of the fitted values, `coef()` performs linear interpolation along the lambda path.

Objects of class `coxkl_ridge` can be visualized using the S3 plotting method `plot()`.
By default (at the given `eta`), this plots:

- loss (`-2 * partial log-likelihood`) versus the penalty parameter `lambda`,
- with the x-axis on a reversed log10 scale (larger penalties on the left, smaller penalties on the right),
- the y-axis labeled "Loss",
- and a vertical dashed orange line marking the optimal `lambda`, where the loss reaches its minimum on the evaluated grid.

Here we instead evaluate the C-index on the held-out test set:

```{r}
plot(
  model_ridge,
  test_z       = test_hd$z,
  test_time    = test_hd$time,
  test_delta   = test_hd$status,
  test_stratum = test_hd$stratum,
  criteria     = "CIndex"
)
```

The function `cv.coxkl_ridge` performs K-fold cross-validation to tune the integration parameter `eta` while internally scanning over a `lambda` path for each candidate `eta`.
For each `eta`, it:

- evaluates a sequence of ridge penalties `lambda`,
- computes the chosen cross-validation criterion on held-out folds,
- selects the best `lambda` for that `eta`, and
- aggregates the results into summary tables.

Supported criteria are:

- `"V&VH"` — V&VH loss (reported as `Loss = -2 * score`),
- `"LinPred"` — predicted partial deviance,
- `"CIndex_pooled"` — pooled C-index over all folds,
- `"CIndex_foldaverage"` — average C-index across folds.

Example: tuning `eta` using 5-fold cross-validation and the `"V&VH"` criterion:

```{r}
eta_grid_hd <- generate_eta(method = "exponential", n = 50, max_eta = 100)

cv_ridge_hd <- cv.coxkl_ridge(
  z           = z_hd,
  delta       = delta_hd,
  time        = time_hd,
  stratum     = strat_hd,
  beta        = beta_external_hd,
  etas        = eta_grid_hd,
  nfolds      = 5,
  cv.criteria = "V&VH",
  seed        = 1
)
```

The best `lambda` for each `eta` (according to the chosen criterion) is provided by:

```{r}
cv_ridge_hd$integrated_stat.best_per_eta
```

As with the low-dimensional models, the helper function `cv.plot()` can be used to visualize performance versus `eta`:

```{r}
cv.plot(cv_ridge_hd)
```

The plot shows:

- a purple curve for the cross-validated performance across the `eta` sequence (loss or C-index),
- a green dotted horizontal line indicating the internal baseline at `eta = 0`,
- a green point marking the baseline value,
- and a vertical dashed orange line indicating the optimal `eta`, where the cross-validated loss reaches its minimum.

#### Elastic-Net / LASSO KL-Integrated Cox Model (`coxkl_enet`)

The function `coxkl_enet` fits a KL-integrated Cox model with an elastic-net penalty, controlled by the mixing parameter `alpha`.
When `alpha = 1`, the penalty reduces to the *LASSO*, enabling coefficient sparsity in addition to KL-based integration of external information.
External knowledge may be incorporated either through external coefficients (`beta`) or an externally computed risk score (`RS`).
The integration weight `eta` determines how strongly the model borrows from this external signal, while the penalty parameter `lambda` controls the sparsity level.
If `lambda` is not supplied, the function automatically generates a decreasing lambda sequence.

We illustrate the workflow using the *LASSO* (`alpha = 1`) with an automatically generated lambda path:

```{r}
model_enet <- coxkl_enet(
  z       = z_hd,
  delta   = delta_hd,
  time    = time_hd,
  stratum = strat_hd,
  beta    = beta_external_hd,
  eta     = 1,
  alpha   = 1  # LASSO penalty
)
```

The fitted object stores, for each lambda value:

- `model_enet$lambda` — the lambda sequence (in decreasing order),
- `model_enet$beta` — estimated coefficients (one column per lambda),
- `model_enet$likelihood` — partial log-likelihood along the lambda path,
- `model_enet$data` — the data used for fitting.

The S3 method `coef()` extracts the estimated coefficients:

```{r}
coef(model_enet)[1:5, 1:5]
```

To extract coefficients corresponding to a specific `lambda`:

```{r}
lambda_target <- model_enet$lambda[5]
coef(model_enet, lambda = lambda_target)[1:5]
```

Objects of class `coxkl_enet` can be visualized using the S3 method `plot()`; the default is to plot loss versus `lambda`:

```{r}
plot(
  model_enet,
  test_z       = test_hd$z,
  test_time    = test_hd$time,
  test_delta   = test_hd$status,
  test_stratum = test_hd$stratum,
  criteria     = "loss"
)
```

Similarly, the function `cv.coxkl_enet` extends the above fitting procedure by performing K-fold cross-validation over a supplied grid of `eta` values:

```{r}
eta_grid_hd <- generate_eta(method = "exponential", n = 50, max_eta = 100)

cv_enet_hd <- cv.coxkl_enet(
  z           = z_hd,
  delta       = delta_hd,
  time        = time_hd,
  stratum     = strat_hd,
  beta        = beta_external_hd,
  etas        = eta_grid_hd,
  alpha       = 1,  # LASSO
  nfolds      = 5,
  cv.criteria = "V&VH",
  seed        = 1
)
```

CV results can be visualized using `cv.plot()`:

```{r}
cv.plot(cv_enet_hd)
```
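Finally, for readers who prefer to inspect a fitted path directly rather than through `plot()`, the stored fields listed above are enough to reproduce a basic loss-versus-lambda curve.
The base-R sketch below assumes `likelihood` holds the partial log-likelihood at each `lambda` (as documented above); the packaged `plot()` method remains the recommended interface:

```{r}
# Sketch: loss (-2 * partial log-likelihood) along the stored lambda path
loss_path <- -2 * model_enet$likelihood

plot(log10(model_enet$lambda), loss_path, type = "l",
     xlim = rev(range(log10(model_enet$lambda))),  # larger penalties on the left
     xlab = "log10(lambda)", ylab = "Loss")

# Mark the lambda minimizing the training loss on the evaluated grid
abline(v = log10(model_enet$lambda[which.min(loss_path)]),
       lty = 2, col = "orange")
```

Note that this uses the training-data likelihood; for out-of-sample evaluation, use `plot()` with test data or the cross-validation functions shown above.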