Type: Package
Title: Fitting Bayesian Marginal Structural Models for Longitudinal Observational Data
Version: 1.0.0
Maintainer: Kuan Liu <kuan.liu@utoronto.ca>
Description: Implements Bayesian marginal structural models for causal effect estimation with time-varying treatment and confounding. It includes an extension to handle informative right censoring. The Bayesian importance sampling weights are estimated using JAGS. See Saarela (2015) <doi:10.1111/biom.12269> for methodological details.
License: MIT + file LICENSE
URL: https://github.com/Kuan-Liu-Lab/bayesmsm
BugReports: https://github.com/Kuan-Liu-Lab/bayesmsm/issues
Depends: R (≥ 4.2.0)
Imports: coda (≥ 0.19-4), doParallel, foreach, ggplot2, graphics, grDevices, MCMCpack, parallel, R2jags, stats
Suggests: devtools, knitr, rmarkdown, testthat (≥ 3.0.0)
VignetteBuilder: knitr
Config/testthat/edition: 3
Encoding: UTF-8
NeedsCompilation: no
RoxygenNote: 7.3.2
Packaged: 2025-06-13 22:12:39 UTC; kuan9
Author: Kuan Liu ORCID iD [aut, cre, cph], Xiao Yan ORCID iD [aut], Martin Urner ORCID iD [aut]
Repository: CRAN
Date/Publication: 2025-06-17 06:00:02 UTC

Bayesian Marginal Structural Model Bootstrap Estimation

Description

This function performs Bayesian non-parametric bootstrap to estimate causal effects in Bayesian marginal structural models. It supports both continuous (Gaussian) and binary (binomial) outcome variables

Usage

bayesmsm(
  ymodel,
  nvisit,
  reference = c(rep(0, nvisit)),
  comparator = c(rep(1, nvisit)),
  treatment_effect_type = "sq",
  family = "gaussian",
  data,
  wmean = rep(1, nrow(data)),
  nboot = 1000,
  optim_method = "BFGS",
  seed = NULL,
  parallel = TRUE,
  ncore = 4
)

Arguments

ymodel

Model statement for the outcome variable.

nvisit

Number of visits or time points to simulate.

reference

Vector denoting the intervention to be used as the reference across all visits for calculating the risk ratio and risk difference. The default is a vector of all 0's with length nvisit (i.e. never treated).

comparator

Vector denoting the intervention to be used as the comparator across all visits for calculating the risk ratio and risk difference. The default is a vector of all 1's with length nvisit (i.e. always treated).

treatment_effect_type

Character string specifying the type of treatment effect to estimate. Options are "sq" for sequential treatment effects, which estimates effects for specific treatment sequences across visits, and "cum" for cumulative treatment effects, which assumes a single cumulative treatment variable representing the total exposure. The default is "sq".

family

Character string specifying the outcome distribution family. The possible distributions are: "Gaussian" (default) for continuous outcomes, and "binomial" for binary outcomes.

data

Data table containing the variable names in the outcome model.

wmean

Vector of treatment assignment weights. The default is rep(1, nrow(data)).

nboot

Integer specifying the number of bootstrap iterations. The default is 1000.

optim_method

Character string specifying the optimization method to be used. The default is "BFGS".

seed

Starting seed for simulations and bootstrapping. The default is NULL.

parallel

Logical scalar indicating whether to parallelize bootstrapping to multiple cores. The default is TRUE.

ncore

Integer specifying the number of CPU cores to use in parallel simulation. This argument is required when parallel is set to TRUE, and the default is 4.

Value

It returns an object of class "bayesmsm" that contains the information about the data, model, etc. An object of class "bayesmsm" is a list containing at least the following components: "mean", the mean of the bootstrap estimates; "sd", the standard deviation of the bootstrap estimates; "quantile", the 95% quantiles of the bootstrap estimates; "bootdata", a data frame of bootstrapped estimates; "reference", the reference intervention level and "comparator", the comparison intervention level

Examples

# 1) Specify simple treatment‐assignment models
amodel <- list(
  c("(Intercept)" =  0, "L1_1" =  0.5, "L2_1" = -0.5),
  c("(Intercept)" =  0, "L1_2" =  0.5, "L2_2" = -0.5, "A_prev" = 0.3)
)
# 2) Specify a continuous‐outcome model
ymodel <- c("(Intercept)" = 0,
            "A1"         = 0.2,
            "A2"         = 0.3,
            "L1_2"       = 0.1,
            "L2_2"       = -0.1)
# 3) Simulate without right‐censoring
testdata <- simData(
  n                = 200,
  n_visits         = 2,
  covariate_counts = c(2, 2),
  amodel           = amodel,
  ymodel           = ymodel,
  y_type           = "continuous",
  right_censor     = FALSE,
  seed             = 123)
model <- bayesmsm(ymodel = Y ~ A1 + A2,
                  nvisit = 2,
                  reference = c(rep(0,2)),
                  comparator = c(rep(1,2)),
                  treatment_effect_type = "sq",
                  family = "binomial",
                  data = testdata,
                  wmean = rep(1,200),
                  nboot = 10,
                  optim_method = "BFGS",
                  seed = 890123,
                  parallel = FALSE)

Bayesian Treatment Effect Weight Estimation Using JAGS

Description

This function estimates Bayesian importance sampling weights for time-varying treatment effects using specified models for each treatment time point via JAGS

Usage

bayesweight(
  trtmodel.list,
  data,
  n.chains = 2,
  n.iter = 25000,
  n.burnin = 15000,
  n.thin = 5,
  seed = NULL,
  parallel = TRUE
)

Arguments

trtmodel.list

A list of formulas corresponding to each time point with the time-specific treatment variable on the left hand side and pre-treatment covariates to be balanced on the right hand side. The formulas must be in temporal order, and must contain all covariates to be balanced at that time point. Interactions and functions of covariates are allowed.

data

A data set in the form of a data frame containing the variables in "trtmodel.list". This must be a wide data set with exactly one row per unit.

n.chains

Integer specifying the number of MCMC chains to run. Set to 1 for non-parallel computation. For parallel computation, it is required to use at least 2 chains. The default is 2.

n.iter

Integer specifying the total number of iterations for each chain (including burn-in). The default is 25000.

n.burnin

Integer specifying the number of burn-in iterations for each chain. The default is 15000.

n.thin

Integer specifying the thinning rate for the MCMC sampler. The default is 5.

seed

Starting seed for the JAGS model. The default is NULL.

parallel

Logical scalar indicating whether to run the MCMC chains in parallel. The default is TRUE.

Value

A list of the calculated weights and the JAGS model, where 'weights' is a vector of posterior mean weights, computed by taking the average of the weights across all MCMC iterations and 'model_string' is a character of the JAGS model based on the input of 'trtmodel.list'.

Examples

# 1) Specify simple treatment‐assignment models
amodel <- list(
  c("(Intercept)" =  0, "L1_1" =  0.5, "L2_1" = -0.5),
  c("(Intercept)" =  0, "L1_2" =  0.5, "L2_2" = -0.5, "A_prev" = 0.3)
)
# 2) Specify a continuous‐outcome model
ymodel <- c("(Intercept)" = 0,
            "A1"         = 0.2,
            "A2"         = 0.3,
            "L1_2"       = 0.1,
            "L2_2"       = -0.1)
# 3) Simulate without right‐censoring
testdata <- simData(
  n                = 200,
  n_visits         = 2,
  covariate_counts = c(2, 2),
  amodel           = amodel,
  ymodel           = ymodel,
  y_type           = "continuous",
  right_censor     = FALSE,
  seed             = 123)
weights <- bayesweight(trtmodel.list = list(
                       A1 ~ L1_1 + L2_1,
                       A2 ~ L2_2 + L2_2 + A1),
                       data = testdata,
                       n.chains = 1,
                       n.iter = 20,
                       n.burnin = 10,
                       n.thin = 1,
                       seed = 890123,
                       parallel = FALSE)
summary(weights)

Bayesian Treatment Effect Weight Estimation for Censored Data

Description

This function estimates Bayesian importance sampling weights for treatment models and censoring models across multiple time points via JAGS

Usage

bayesweight_cen(
  trtmodel.list,
  cenmodel.list,
  data,
  n.chains = 2,
  n.iter = 25000,
  n.burnin = 15000,
  n.thin = 5,
  seed = NULL,
  parallel = TRUE
)

Arguments

trtmodel.list

A list of formulas corresponding to each time point with the time-specific treatment variable on the left-hand side and pre-treatment covariates to be balanced on the right-hand side. The formulas must be in temporal order, and must contain all covariates to be balanced at that time point. Interactions and functions of covariates are allowed.

cenmodel.list

A list of formulas for the censored data at each time point, with censoring indicators on the left-hand side and covariates on the right-hand side. The formulas must be in temporal order, and must contain all covariates to be balanced at that time point.

data

A data set in the form of a data frame containing the variables in "trtmodel.list" and "cenmodel.list". This must be a wide data set with exactly one row per unit.

n.chains

Integer specifying the number of MCMC chains to run. Set to 1 for non-parallel computation. For parallel computation, it is required to use at least 2 chains. The default is 2.

n.iter

Integer specifying the total number of iterations for each chain (including burn-in). The default is 25000.

n.burnin

Integer specifying the number of burn-in iterations for each chain. The default is 15000.

n.thin

Integer specifying the thinning rate for the MCMC sampler. The default is 5.

seed

Starting seed for the JAGS model. The default is NULL.

parallel

Logical scalar indicating whether to run the MCMC chains in parallel. The default is TRUE.

Value

A list of the calculated weights and the JAGS model where "weights" is a vector of posterior mean weights, computed by taking the average of the weights across all MCMC iterations and 'model_string' is a character of the JAGS model based on the input of "trtmodel.list".

Examples

amodel <- list(
 c("(Intercept)" = -0.3, "L1_1" = 0.4, "L2_1" = -0.2),
 c("(Intercept)" = -0.1, "L1_2" = 0.3, "L2_2" = -0.1, "A_prev" = 0.5))
ymodel <- c(
"(Intercept)" = -0.8,
"A1"          = 0.2,
"A2"          = 0.4,
"L1_2"        = 0.3,
"L2_2"        = -0.3)
cmodel <- list(
 c("(Intercept)" = -1.5, "L1_1" = 0.2, "L2_1" = -0.2, "A" = 0.2),
 c("(Intercept)" = -1.5, "L1_2" = 0.1, "L2_2" = -0.1, "A" = 0.3))
testdata <- simData(
 n                = 50,
 n_visits         = 2,
 covariate_counts = c(2, 2),
 amodel           = amodel,
 ymodel           = ymodel,
 y_type           = "binary",
 right_censor     = TRUE,
 cmodel           = cmodel,
 seed             = 123
)
weights_cen <- bayesweight_cen(
                trtmodel.list = list(
                       A1 ~ L1_1 + L2_1,
                       A2 ~ L2_2 + L2_2 + A1),
                cenmodel.list = list(
                C1 ~ L1_1 + L2_1 + A1,
                C2 ~ L1_2 + L2_2 + A2),
                data = testdata,
                n.chains = 1,
                n.iter = 20,
                n.burnin = 10,
                n.thin = 1,
                seed = 890123,
                parallel = FALSE)
summary(weights_cen)

This function to calculate the causal effect of an intervention given the parameter estimates and intervention levels

Description

This function to calculate the causal effect of an intervention given the parameter estimates and intervention levels

Usage

calculate_effect(
  intervention_levels,
  variables,
  param_estimates,
  treatment_effect_type
)

Arguments

intervention_levels

A numeric vector indicating the levels of intervention for each predictor variable.

variables

A list of the names of the response variable and predictor variables extracted from the model.

param_estimates

A vector of parameter estimates from the model.

treatment_effect_type

Character string specifying the type of treatment effect to estimate. Options are "sq" for sequential treatment effects, which estimates effects for specific treatment sequences across visits, and "cum" for cumulative treatment effects, which assumes a single cumulative treatment variable representing the total exposure. The default is "sq".

Value

A numeric value representing the calculated effect of the specified intervention.


Plot Average Potential Outcomes (APO)

Description

This function plots the density of APO for a specified effect type from bayesmsm output.

Usage

plot_APO(input, effect_type, ...)

Arguments

input

A data frame or model object containing bootstrap results.

effect_type

A character string specifying which effect to plot (e.g., comparator or reference treatment sequences).

...

Additional arguments passed to the plotting function.

Value

A ggplot object representing density plot showing the distribution of the specified average potential outcome (reference or comparison).

Examples

# 1) Specify simple treatment‐assignment models
amodel <- list(
  c("(Intercept)" =  0, "L1_1" =  0.5, "L2_1" = -0.5),
  c("(Intercept)" =  0, "L1_2" =  0.5, "L2_2" = -0.5, "A_prev" = 0.3)
)
# 2) Specify a continuous‐outcome model
ymodel <- c("(Intercept)" = 0,
            "A1"         = 0.2,
            "A2"         = 0.3,
            "L1_2"       = 0.1,
            "L2_2"       = -0.1)
# 3) Simulate without right‐censoring
testdata <- simData(
  n                = 200,
  n_visits         = 2,
  covariate_counts = c(2, 2),
  amodel           = amodel,
  ymodel           = ymodel,
  y_type           = "continuous",
  right_censor     = FALSE,
  seed             = 123)
model <- bayesmsm(ymodel = Y ~ A1 + A2,
                  nvisit = 2,
                  reference = c(rep(0,2)),
                  comparator = c(rep(1,2)),
                  treatment_effect_type = "sq",
                  family = "binomial",
                  data = testdata,
                  wmean = rep(1,200),
                  nboot = 10,
                  optim_method = "BFGS",
                  seed = 890123,
                  parallel = FALSE)
plot_APO(model$bootdata, effect_type = "effect_comparator")
plot_APO(model, effect_type = "effect_reference")

Plot Average Treatment Effect Density from bayesmsm output

Description

This function plots the density of ATE from bayesmsm output.

Usage

plot_ATE(
  input,
  ATE = "RD",
  col_density = "blue",
  fill_density = "lightblue",
  main = "Posterior Predictive Distribution of Average Treatment Effect",
  xlab = "ATE",
  ylab = "Posterior Predictive Distribution",
  xlim = NULL,
  ylim = NULL,
  ...
)

Arguments

input

A model object, data frame or vector containing the bootstrap estimates of ATE.

ATE

define causal estimand of interest from RD, OR, RR.

col_density

Color for the density plot (default is "blue").

fill_density

Fill color for the density plot (default is "lightblue").

main

Title of the plot (default is "Density of ATE Estimates").

xlab

X-axis label (default is "ATE").

ylab

Y-axis label (default is "Density").

xlim

Limits for the x-axis (default is NULL).

ylim

Limits for the y-axis (default is NULL).

...

Additional graphical parameters passed to the plot function.

Value

A ggplot object representing the density plot for the posterior predictive distribution of the Average Treatment Effect (ATE).

Examples

# 1) Specify simple treatment‐assignment models
amodel <- list(
  c("(Intercept)" =  0, "L1_1" =  0.5, "L2_1" = -0.5),
  c("(Intercept)" =  0, "L1_2" =  0.5, "L2_2" = -0.5, "A_prev" = 0.3)
)
# 2) Specify a continuous‐outcome model
ymodel <- c("(Intercept)" = 0,
            "A1"         = 0.2,
            "A2"         = 0.3,
            "L1_2"       = 0.1,
            "L2_2"       = -0.1)
# 3) Simulate without right‐censoring
testdata <- simData(
  n                = 200,
  n_visits         = 2,
  covariate_counts = c(2, 2),
  amodel           = amodel,
  ymodel           = ymodel,
  y_type           = "continuous",
  right_censor     = FALSE,
  seed             = 123)
model <- bayesmsm(ymodel = Y ~ A1 + A2,
                  nvisit = 2,
                  reference = c(rep(0,2)),
                  comparator = c(rep(1,2)),
                  treatment_effect_type = "sq",
                  family = "binomial",
                  data = testdata,
                  wmean = rep(1,200),
                  nboot = 10,
                  optim_method = "BFGS",
                  seed = 890123,
                  parallel = FALSE)
plot_ATE(model)

Error bar plots for causal treatment effects

Description

This function plots the point estimates and 95% credible intervals of ATE and APO from bayesmsm output.

Usage

plot_est_box(input, ...)

Arguments

input

A data frame or model object containing bootstrap results.

...

Additional arguments passed to the plotting function.

Value

A ggplot object presenting error bar plot of the mean effects and their 95% credible intervals for comparator level, reference level, and ATE.

Examples

# 1) Specify simple treatment‐assignment models
amodel <- list(
  c("(Intercept)" =  0, "L1_1" =  0.5, "L2_1" = -0.5),
  c("(Intercept)" =  0, "L1_2" =  0.5, "L2_2" = -0.5, "A_prev" = 0.3)
)
# 2) Specify a continuous‐outcome model
ymodel <- c("(Intercept)" = 0,
            "A1"         = 0.2,
            "A2"         = 0.3,
            "L1_2"       = 0.1,
            "L2_2"       = -0.1)
# 3) Simulate without right‐censoring
testdata <- simData(
  n                = 200,
  n_visits         = 2,
  covariate_counts = c(2, 2),
  amodel           = amodel,
  ymodel           = ymodel,
  y_type           = "continuous",
  right_censor     = FALSE,
  seed             = 123)
model <- bayesmsm(ymodel = Y ~ A1 + A2,
                  nvisit = 2,
                  reference = c(rep(0,2)),
                  comparator = c(rep(1,2)),
                  treatment_effect_type = "sq",
                  family = "binomial",
                  data = testdata,
                  wmean = rep(1,200),
                  nboot = 10,
                  optim_method = "BFGS",
                  seed = 890123,
                  parallel = FALSE)
plot_est_box(model)

Generate synthetic longitudinal data with optional right-censoring

Description

This function simulates repeated measurements of normally-distributed covariates, binary treatments, and an end-of-study outcome for longitudinal causal analyses. When 'right_censor = TRUE', a right-censoring indicator 'Cj' is generated at each visit: if 'Cj = 1', all subsequent 'L', 'A', and 'Y' values are set to 'NA'.

Usage

simData(
  n,
  n_visits,
  covariate_counts = rep(2, n_visits),
  amodel,
  ymodel,
  y_type = c("binary", "continuous"),
  right_censor = FALSE,
  cmodel = NULL,
  seed = NULL
)

Arguments

n

Integer. Sample size.

n_visits

Integer. Number of visits (including baseline as visit 1).

covariate_counts

Integer vector of length 'n_visits'. Number of covariates per visit (default: rep(2, n_visits)).

amodel

List of length 'n_visits'. Each element is a named numeric vector of coefficients for the logistic model of treatment 'Aj' on covariates (and 'A_prev' for j > 1).

ymodel

Named numeric vector. Coefficients for the end-of-study outcome model. If 'y_type = "binary"', a logistic model is used; if '"continuous"', a linear model with Gaussian noise.

y_type

Character. One of "binary" or "continuous".

right_censor

Logical. If TRUE, generates 'Cj' using 'cmodel' at each visit.

cmodel

List of length 'n_visits'. Named numeric vectors for logistic censoring models at each visit, regressing 'Cj' on covariates and current 'Aj'.

seed

Integer. Optional random seed.

Value

A 'data.frame' with columns 'Lk_j', 'Aj', optional 'Cj', and 'Y'.


Summary function to generate result table from bayesmsm

Description

This function generates a ready to use result table that contents the estimated APO and ATE and their 95% credible intervals

Usage

summary_bayesmsm(model)

Arguments

model

A model object from bayesmsm

Value

A summary table of the results from bayesmsm.

Examples

# 1) Specify simple treatment‐assignment models
amodel <- list(
  c("(Intercept)" =  0, "L1_1" =  0.5, "L2_1" = -0.5),
  c("(Intercept)" =  0, "L1_2" =  0.5, "L2_2" = -0.5, "A_prev" = 0.3)
)
# 2) Specify a continuous‐outcome model
ymodel <- c("(Intercept)" = 0,
            "A1"         = 0.2,
            "A2"         = 0.3,
            "L1_2"       = 0.1,
            "L2_2"       = -0.1)
# 3) Simulate without right‐censoring
testdata <- simData(
  n                = 200,
  n_visits         = 2,
  covariate_counts = c(2, 2),
  amodel           = amodel,
  ymodel           = ymodel,
  y_type           = "continuous",
  right_censor     = FALSE,
  seed             = 123)
model <- bayesmsm(ymodel = Y ~ A1 + A2,
                  nvisit = 2,
                  reference = c(rep(0,2)),
                  comparator = c(rep(1,2)),
                  treatment_effect_type = "sq",
                  family = "binomial",
                  data = testdata,
                  wmean = rep(1,200),
                  nboot = 10,
                  optim_method = "BFGS",
                  seed = 890123,
                  parallel = FALSE)
summary_bayesmsm(model)