--- title: "Binary Data MCP-Mod" output: rmarkdown::html_vignette bibliography: refs.bib link-citations: yes csl: american-statistical-association.csl vignette: > %\VignetteIndexEntry{Design and analysis template MCP-Mod for binary data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, child = "children/settings.txt"} ``` In this vignette we illustrate how to use the DoseFinding package with binary observations by fitting a first-stage GLM and applying the generalized MCP-Mod methodology to the resulting estimates. We also show how to deal with covariates. For continuously distributed data see [the corresponding vignette][v2]. [v2]: analysis_normal.html ## Background and data set Assume a dose-finding study is planned for an hypothetical investigational treatment in atopic dermatitis, for the binary endpoint Investigator's Global Assessment (IGA). The treatment is tested with doses 0, 0.5, 1.5, 2.5, 4. It is assumed the response rate for placebo will be around 10%, while the response rate for the top dose may be 35%. This is an example where the generalized MCP-Mod approach can be applied, i.e. dose-response testing and estimation will be performed on the logit scale. We generate some example data in the setting just described. The 10% placebo effect translates to -2.2 on the logit scale, and the asymptotic effect of 25 percentage points above placebo becomes `logit(0.35) - logit(0.1)`, approximately 1.6. ```{r, example_data} library(DoseFinding) library(ggplot2) logit <- function(p) log(p / (1 - p)) inv_logit <- function(y) 1 / (1 + exp(-y)) doses <- c(0, 0.5, 1.5, 2.5, 4) ## set seed and ensure reproducibility across R versions set.seed(1, kind = "Mersenne-Twister", sample.kind = "Rejection", normal.kind = "Inversion") group_size <- 100 dose_vector <- rep(doses, each = group_size) N <- length(dose_vector) ## generate covariates x1 <- rnorm(N, 0, 1) x2 <- factor(sample(c("A", "B"), N, replace = TRUE, prob = c(0.6, 0.4))) ## assume approximately logit(10%) placebo and logit(35%) asymptotic response with ED50=0.5 prob <- inv_logit(emax(dose_vector, -2.2, 1.6, 0.5) + 0.3 * x1 + 0.3 * (x2 == "B")) dat <- data.frame(y = rbinom(N, 1, prob), dose = dose_vector, x1 = x1, x2 = x2) ``` ## Candidate models We will use the following candidate set of models for the mean response on the logit scale: ```{r, setup, fig.width = 8, out.width = '100%'} mods <- Mods(emax = c(0.25, 1), sigEmax = rbind(c(1, 3), c(2.5, 4)), betaMod = c(1.1, 1.1), placEff = logit(0.1), maxEff = logit(0.35)-logit(0.1), doses = doses) plotMods(mods) ## plot candidate models on probability scale plotMods(mods, trafo = inv_logit) ``` ## Analysis without covariates First assume covariates had not been used in the analysis (not recommended in practice). Let $\mu_k$ denote the logit response probability at dose $k$, so that for patient $j$ in group $k$ we have \[ \begin{aligned} Y_{kj} &\sim \mathrm{Bernoulli}(p_{kj}) \\ \mathrm{logit}(p_{kj}) &= \mu_{k} \end{aligned} \] We perform the MCP-Mod test on the logit scale estimates $\hat\mu=(\hat\mu_1,\dots,\hat\mu_K)$ and their estimated covariance matrix $\hat S$. We can extract both from the object returned by the `glm()` call. ```{r, test_no_covariates} fit_nocov <- glm(y~factor(dose) + 0, data = dat, family = binomial) mu_hat <- coef(fit_nocov) S_hat <- vcov(fit_nocov) MCTtest(doses, mu_hat, S = S_hat, models = mods, type = "general") ``` Dose-response modeling then can proceed with a combination of bootstrapping and model averaging. For detailed explanations refer to the [vignette for analysis of continuous data][v2]. ```{r, estimate_no_covariates} one_bootstrap_prediction <- function(mu_hat, S_hat, doses, bounds, dose_seq) { sim <- drop(mvtnorm::rmvnorm(1, mu_hat, S_hat)) fit <- lapply(c("emax", "sigEmax", "betaMod"), function(mod) fitMod(doses, sim, model = mod, S = S_hat, type = "general", bnds = bounds[[mod]])) index <- which.min(sapply(fit, gAIC)) pred <- predict(fit[[index]], doseSeq = dose_seq, predType = "ls-means") return(pred) } ## bs_predictions is a doses x replications matrix, ## probs is a 4-element vector of increasing probabilities for the quantiles summarize_predictions <- function(bs_predictions, probs) { stopifnot(length(probs) == 4) med <- apply(bs_predictions, 1, median) quants <- apply(bs_predictions, 1, quantile, probs = probs) bs_df <- as.data.frame(cbind(med, t(quants))) names(bs_df) <- c("median", "low_out", "low_in", "high_in", "high_out") return(bs_df) } predict_and_plot <- function(mu_hat, S_hat, doses, dose_seq, n_rep) { bs_rep <- replicate( n_rep, one_bootstrap_prediction(mu_hat, S_hat, doses, defBnds(max(doses)), dose_seq)) bs_summary <- summarize_predictions(bs_rep, probs = c(0.025, 0.25, 0.75, 0.975)) bs_summary <- as.data.frame(inv_logit(bs_summary)) # back to probability scale ci_half_width <- qnorm(0.975) * sqrt(diag(S_hat)) glm_summary <- data.frame(dose = doses, mu_hat = inv_logit(mu_hat), low = inv_logit(mu_hat - ci_half_width), high = inv_logit(mu_hat + ci_half_width)) gg <- ggplot(cbind(bs_summary, dose_seq = dose_seq)) + geom_line(aes(dose_seq, median)) + geom_ribbon(aes(x = dose_seq, ymin = low_in, ymax = high_in), alpha = 0.2) + geom_ribbon(aes(x = dose_seq, ymin = low_out, ymax = high_out), alpha = 0.2) + geom_point(aes(dose, mu_hat), glm_summary) + geom_errorbar(aes(dose, ymin = low, ymax = high), glm_summary, width = 0, alpha = 0.5) + scale_y_continuous(breaks = seq(0, 1, 0.05)) + xlab("Dose") + ylab("Response Probability") + labs(title = "Bootstrap estimates for population response probability", subtitle = "confidence levels 50% and 95%") return(gg) } dose_seq <- seq(0, 4, length.out = 51) predict_and_plot(mu_hat, S_hat, doses, dose_seq, 1000) ``` ## Analysis with covariates In many situations there are important prognostic covariates (main effects) to adjust for in the analysis. Denote the vector of these additional covariates for patient $j$ with $x_{kj}$. \[ \begin{aligned} Y_{kj} &\sim \mathrm{Bernoulli}(p_{kj}) \\ \mathrm{logit}(p_{kj}) &= \mu_k^d + x_{kj}^T\beta \end{aligned} \] Fitting this model gives us estimated coefficients $\hat\mu=(\hat\mu^d, \hat\beta)$ and an estimate $\hat S$ of the covariance matrix of the estimator $\hat\mu$. In principle we could perform testing and estimation based on $\hat\mu^d$ and the corresponding sub-matrix of $\hat S$, but this would produce estimates for a patient with covariate vector $\beta=0$, and not reflect the overall population. To produce adjusted estimates per dose and to accommodate potential systematic differences in the covariates we predict the mean response probability at dose k for all observed values of the covariates and transform back to logit scale: \[ \mu^*_k := \mathrm{logit}\biggl(\frac{1}{n} \sum_{i=1}^n \mathrm{logit}^{-1}(\hat\mu^d_k + x_{i}^T\hat\beta)\biggr) \] Note here we index $x$ with $i$ that runs from 1 to $n$ (all patients randomized in the study). To obtain a variance estimate for $\mu^*$ we repeat this with draws from $\mathrm{MultivariateNormal}(\hat\mu, \hat S)$ and calculate the empirical covariance matrix $S^*$ of theses draws. Then we use $\mu^*$ and $S^*$ in `MCTtest()`. ```{r, test_covariates} fit_cov <- glm(y~factor(dose) + 0 + x1 + x2, data = dat, family = binomial) covariate_adjusted_estimates <- function(mu_hat, S_hat, formula_rhs, doses, other_covariates, n_sim) { ## predict every patient under *every* dose oc_rep <- as.data.frame(lapply(other_covariates, function(col) rep(col, times = length(doses)))) d_rep <- rep(doses, each = nrow(other_covariates)) pdat <- cbind(oc_rep, dose = d_rep) X <- model.matrix(formula_rhs, pdat) ## average on probability scale then backtransform to logit scale mu_star <- logit(tapply(inv_logit(X %*% mu_hat), pdat$dose, mean)) ## estimate covariance matrix of mu_star pred <- replicate(n_sim, logit(tapply(inv_logit(X %*% drop(mvtnorm::rmvnorm(1, mu_hat, S_hat))), pdat$dose, mean))) return(list(mu_star = as.numeric(mu_star), S_star = cov(t(pred)))) } ca <- covariate_adjusted_estimates(coef(fit_cov), vcov(fit_cov), ~factor(dose)+0+x1+x2, doses, dat[, c("x1", "x2")], 1000) MCTtest(doses, ca$mu_star, S = ca$S_star, type = "general", models = mods) ``` In the case at hand the results here are not dramatically different. Adjusting for covariates gives slightly lower variance estimates. ```{r, compare} ggplot(data.frame(dose = rep(doses, 4), est = c(inv_logit(mu_hat), diag(S_hat), inv_logit(ca$mu_star), diag(ca$S_star)), name = rep(rep(c("mean", "var"), each = length(doses)), times = 2), a = rep(c(FALSE, TRUE), each = 2*length(doses)))) + geom_point(aes(dose, est, color = a)) + scale_color_discrete(name = "adjusted") + facet_wrap(vars(name), scales = "free_y") + ylab("") ``` Dose-response modelling proceeds in the same way as before, but now on the adjusted estimates. ```{r, estimate_covariates} predict_and_plot(ca$mu_star, ca$S_star, doses, dose_seq, 1000) + labs(title = "Covariate adjusted bootstrap estimates for population response probability") ``` ## Avoiding problems with complete seperation and 0 responders In a number of situations it makes sense to replace ML estimation for logistic regression via `glm(..., family=binomial)`, with the Firth logistic regression [see @heinze2002], implemented as the `logistf` function from the `logistf` package. This is particularly important for small sample size per dose and if small number of responses are expected on some treatment arms. The estimator of Firth regression corresponds to the posterior mode in a Bayesian logistic regression model with Jeffrey's prior on the parameter vector. This estimator is well defined even in situations where the ML estimate for logistic regression does not exist (e.g. for complete separation). ## Considerations around optimal contrasts at design stage and analysis stage The formula for the optimal contrasts is given by \[ c^{\textrm{opt}} \propto S^{-1}\biggl(\mu^0_m - \frac{(\mu^0_m)^T S^{-1}1_K}{1_K^T S^{-1} 1_K}\biggr) \] where $\mu^0_m$ is the standardized mean response, $K$ is the number doses, and $1_K$ is an all-ones vector of length $K$ and $S$ is the covariance matrix of the estimates at the doses [see @pinheiro2014]. For calculating the optimal contrast for the generalized MCP step the covariance matrix $S$ of the estimator $\hat\mu$ can be re-estimated once the trial data are available. With normally distributed data this is possible with decent accuracy even at rather low sample sizes. In the case of binary data, $\hat\mu$ is on the logit scale and the diagonal elements of $S$ are approximately $(np(1-p))^{-1}$, where $n$ is the sample size of the dose group. This can be derived using the delta method. An estimate of this variance depends on the observed response rate and can thus be quite variable in particular for small sample sizes per group (e.g. smaller than 20). A crude alternative in these situations is to not use the estimated $S$ but a diagonal matrix with the inverse of the sample size per dose on the diagonal in the formula for calculation of the optimal contrast. The contrast calculated this way will asymptotically not be equal to the "optimal" contrast for the underlying model, but simulations show that they can be closer to the "true" optimal contrast (calculated based on the true variance per dose group) for small sample size, compared to the contrast calculated based on the estimated variance. To re-run the adjusted analysis above for the contrasts, calculated as outlined above, we need to calculate and hand-over the contrast matrix manually via `contMat` in the `MCTtest()` function. In our case (with 100 patients per group) we obtain a result that is only slightly different. ```{r} ## here we have balanced sample sizes across groups, so we select w = 1 ## otherwise would select w proportional to group sample sizes optCont <- optContr(mods, doses, w = 1) MCTtest(doses, ca$mu_star, S = ca$S_star, type = "general", contMat = optCont) ``` ## Power and sample size considerations We can calculate the power under each of the candidate models from the top of this vignette. For example, we assume a `Mods(emax = 0.25)` and calculate the vector of mean responses `lo` on the logit scale. When we transform it back to probability scale `p`, we can calculate the approximate variance of the (logit-scale) estimator `mu_hat` with the formula \[ \mathrm{Var}(\hat\mu) = \frac{1}{np(1-p)} \] (see the section above). Next we calculate the minimum power across the candidate set using `powMCT()` and plot it for increasing `n`. See also the [vignette on sample size calculation](sample_size.html). ```{r, sample_size} ## for simplicity: contrasts as discussed in the previous section contMat <- optContr(mods, w=1) ## we need each alternative model as a separate object alt_model_par <- list(emax = 0.25, emax = 1, sigEmax = c(1, 3), sigEmax = c(2.5, 4), betaMod = c(1.1, 1.1)) alt_common_par <- list(placEff = logit(0.1), maxEff = logit(0.35)-logit(0.1), doses = doses) ## this is a bit hackish because we need to pass named arguments to Mods() alt_mods <- lapply(seq_along(alt_model_par), function(i) { do.call(Mods, append(alt_model_par[i], alt_common_par)) }) prop_true_var_mu_hat <- lapply(seq_along(alt_model_par), function(i) { ## mean responses on logit scale lo <- getResp(do.call(Mods, append(alt_model_par[i], alt_common_par))) p <- inv_logit(lo) # mean responses on probability scale v <- 1 / (p * (1-p)) # element-wise variance of mu_hat up to a factor of 1/n return(as.numeric(v)) # drop unnecessary attributes }) min_power_at_group_size <- function(n) { pwr <- mapply(function(m, v) powMCT(contMat, alpha=0.025, altModels=m, S=diag(v/n), df=Inf), alt_mods, prop_true_var_mu_hat) return(min(pwr)) } n <- seq(5, 80, by=5) pwrs <- sapply(n, min_power_at_group_size) qplot(n, pwrs, geom="line", ylab="Min. Power over candidate set")+ scale_y_continuous(breaks = seq(0,1,by=0.1), limits = c(0,1)) ``` ## References