In this vignette, we explain how one can compute marginal
likelihoods, Bayes factors, and posterior model probabilities using a
simple hierarchical normal model implemented in JAGS. This
vignette uses the same models and data as the Stan
vignette.
The model that we will use assumes that each of the \(n\) observations \(y_i\) (where \(i\) indexes the observation, \(i = 1,2,...,n\)) is normally distributed
with corresponding mean \(\theta_i\)
and a common known variance \(\sigma^2\): \(y_i
\sim \mathcal{N}(\theta_i, \sigma^2)\). Each \(\theta_i\) is drawn from a normal
group-level distribution with mean \(\mu\) and variance \(\tau^2\): \(\theta_i \sim \mathcal{N}(\mu, \tau^2)\).
For the group-level mean \(\mu\), we
use a normal prior distribution of the form \(\mathcal{N}(\mu_0, \tau^2_0)\). For the
group-level variance \(\tau^2\), we use
an inverse-gamma prior of the form \(\text{Inv-Gamma}(\alpha, \beta)\). We will
use JAGS to fit the model which parametrizes the normal
distribution in terms of the precision (i.e., one over the variance).
Consequently, we implement this inverse-gamma prior on \(\tau^2\) by placing a gamma prior of the
form \(\text{Gamma}(\alpha, \beta)\) on
the precision; we call this precision parameter invTau2 in
the code.
In this example, we are interested in comparing the null model \(\mathcal{H}_0\), which posits that the group-level mean \(\mu = 0\), to the alternative model \(\mathcal{H}_1\), which allows \(\mu\) to be different from zero. First, we generate some data from the null model:
library(bridgesampling)
### generate data ###
set.seed(12345)
mu <- 0
tau2 <- 0.5
sigma2 <- 1
n <- 20
theta <- rnorm(n, mu, sqrt(tau2))
y <- rnorm(n, theta, sqrt(sigma2))
Next, we specify the prior parameters \(\mu_0\), \(\tau^2_0\), \(\alpha\), and \(\beta\):
### set prior parameters ###
mu0 <- 0
tau20 <- 1
alpha <- 1
beta <- 1
Now we can fit the null and the alternative model in
JAGS (note that it is necessary to install
JAGS for this). One usually requires a larger number of
posterior sample for estimating the marginal likelihood than for simply
estimating the model parameters. This is the reason for using a
comparatively large number of samples (i.e., 50,000 post burn-in samples
per chain) for this comparatively simple model.
library(R2jags)
### functions to get posterior samples ###
# H0: mu = 0
getSamplesModelH0 <- function(data, niter = 52000, nburnin = 2000, nchains = 3) {
model <- "
model {
for (i in 1:n) {
theta[i] ~ dnorm(0, invTau2)
y[i] ~ dnorm(theta[i], 1/sigma2)
}
invTau2 ~ dgamma(alpha, beta)
tau2 <- 1/invTau2
}"
s <- jags(data, parameters.to.save = c("theta", "invTau2"),
model.file = textConnection(model),
n.chains = nchains, n.iter = niter,
n.burnin = nburnin, n.thin = 1)
return(s)
}
# H1: mu != 0
getSamplesModelH1 <- function(data, niter = 52000, nburnin = 2000,
nchains = 3) {
model <- "
model {
for (i in 1:n) {
theta[i] ~ dnorm(mu, invTau2)
y[i] ~ dnorm(theta[i], 1/sigma2)
}
mu ~ dnorm(mu0, 1/tau20)
invTau2 ~ dgamma(alpha, beta)
tau2 <- 1/invTau2
}"
s <- jags(data, parameters.to.save = c("theta", "mu", "invTau2"),
model.file = textConnection(model),
n.chains = nchains, n.iter = niter,
n.burnin = nburnin, n.thin = 1)
return(s)
}
### get posterior samples ###
# create data lists for JAGS
data_H0 <- list(y = y, n = length(y), alpha = alpha, beta = beta, sigma2 = sigma2)
data_H1 <- list(y = y, n = length(y), mu0 = mu0, tau20 = tau20, alpha = alpha,
beta = beta, sigma2 = sigma2)
# fit models
samples_H0 <- getSamplesModelH0(data_H0)
samples_H1 <- getSamplesModelH1(data_H1)
The next step is to write the corresponding
log_posterior (i.e., unnormalized posterior) function for
both models. This function takes one draw from the joint posterior and
the data object as input and returns the log of the unnormalized joint
posterior density. When using MCMC software such as JAGS or
Stan, specifying this function is relatively simple. As a
rule of thumb, one only needs to look for all places where a
“~” sign appears in the model code. The log of the
densities on the right-hand side of these “~” symbols needs
to be evaluated for the relevant quantities and then these log densities
values are summed.
For example, in the null model, there are three “~”
signs. Starting at the data-level, we need to evaluate the log of the
normal density with mean \(\theta_i\)
and variance \(\sigma^2\) for all \(y_i\) and then sum the resulting log
density values. Next, we move one step up in the model and evaluate the
log of the group-level density for all \(\theta_i\). Hence, we evaluate the log of
the normal density for \(\theta_i\)
with mean \(\mu = 0\) and variance
\(\tau^2\) (remember that
JAGS parametrizes the normal distribution in terms of the
precision invTau2 = \(1/\tau^2\); in contrast, R
parametrizes it in terms of the standard deviation) and sum the
resulting log density values. The result of this summation is added to
the result of the previous summation for the data-level normal
distribution. Finally, we need to evaluate the log of the prior density
for invTau2. This means that we compute the log density of
the gamma distribution with parameters \(\alpha\) and \(\beta\) for the sampled
invTau2 value and add the resulting log density value to
the result of summing the data-level and group-level log densities. The
unnormalized log posterior for the alternative model can be obtained in
a similar fashion. The resulting functions look as follows:
### functions for evaluating the unnormalized posteriors on log scale ###
log_posterior_H0 <- function(samples.row, data) {
mu <- 0
invTau2 <- samples.row[[ "invTau2" ]]
theta <- samples.row[ paste0("theta[", seq_along(data$y), "]") ]
sum(dnorm(data$y, theta, data$sigma2, log = TRUE)) +
sum(dnorm(theta, mu, 1/sqrt(invTau2), log = TRUE)) +
dgamma(invTau2, data$alpha, data$beta, log = TRUE)
}
log_posterior_H1 <- function(samples.row, data) {
mu <- samples.row[[ "mu" ]]
invTau2 <- samples.row[[ "invTau2" ]]
theta <- samples.row[ paste0("theta[", seq_along(data$y), "]") ]
sum(dnorm(data$y, theta, data$sigma2, log = TRUE)) +
sum(dnorm(theta, mu, 1/sqrt(invTau2), log = TRUE)) +
dnorm(mu, data$mu0, sqrt(data$tau20), log = TRUE) +
dgamma(invTau2, data$alpha, data$beta, log = TRUE)
}
The final step before computing the log marginal likelihoods is to
specify the parameter bounds. In this example, for both models, all
parameters can range from \(-\infty\)
to \(\infty\) except the precision
invTau2 which has a lower bound of zero. These boundary
vectors need to be named and the names need to match the order of the
parameters.
# specify parameter bounds H0
cn <- colnames(samples_H0$BUGSoutput$sims.matrix)
cn <- cn[cn != "deviance"]
lb_H0 <- rep(-Inf, length(cn))
ub_H0 <- rep(Inf, length(cn))
names(lb_H0) <- names(ub_H0) <- cn
lb_H0[[ "invTau2" ]] <- 0
# specify parameter bounds H1
cn <- colnames(samples_H1$BUGSoutput$sims.matrix)
cn <- cn[cn != "deviance"]
lb_H1 <- rep(-Inf, length(cn))
ub_H1 <- rep(Inf, length(cn))
names(lb_H1) <- names(ub_H1) <- cn
lb_H1[[ "invTau2" ]] <- 0
Note that currently, the lower and upper bound of a parameter cannot be a function of the bounds of another parameter. Furthermore, constraints that depend on multiple parameters of the model are not supported. This excludes, for example, parameters that constitute a covariance matrix or sets of parameters that need to sum to one.
Now we are ready to compute the log marginal likelihoods using the
bridge_sampler function. We use silent = TRUE
to suppress printing the number of iterations to the console:
# compute log marginal likelihood via bridge sampling for H0
H0.bridge <- bridge_sampler(samples = samples_H0, data = data_H0,
log_posterior = log_posterior_H0, lb = lb_H0,
ub = ub_H0, silent = TRUE)
# compute log marginal likelihood via bridge sampling for H1
H1.bridge <- bridge_sampler(samples = samples_H1, data = data_H1,
log_posterior = log_posterior_H1, lb = lb_H1,
ub = ub_H1, silent = TRUE)
We obtain:
print(H0.bridge)
## Bridge sampling estimate of the log marginal likelihood: -37.53235
## Estimate obtained in 4 iteration(s) via method "normal".
print(H1.bridge)
## Bridge sampling estimate of the log marginal likelihood: -37.79776
## Estimate obtained in 5 iteration(s) via method "normal".
We can use the error_measures function to compute an
approximate percentage error of the estimates:
# compute percentage errors
H0.error <- error_measures(H0.bridge)$percentage
H1.error <- error_measures(H1.bridge)$percentage
We obtain:
print(H0.error)
## [1] "0.144%"
print(H1.error)
## [1] "0.162%"
To compare the null model and the alternative model, we can compute
the Bayes factor by using the bf function. In our case, we
compute \(\text{BF}_{01}\), that is,
the Bayes factor which quantifies how much more likely the data are
under the null versus the alternative model:
# compute Bayes factor
BF01 <- bf(H0.bridge, H1.bridge)
print(BF01)
## Estimated Bayes factor in favor of H0.bridge over H1.bridge: 1.30396
In this case, the Bayes factor is close to one, indicating that there
is not much evidence for either model. We can also compute posterior
model probabilities by using the post_prob function:
# compute posterior model probabilities (assuming equal prior model probabilities)
post1 <- post_prob(H0.bridge, H1.bridge)
print(post1)
## H0.bridge H1.bridge
## 0.5659652 0.4340348
When the argument prior_prob is not specified, as is the
case here, the prior model probabilities of all models under
consideration are set equal (i.e., in this case with two models to 0.5).
However, if we had prior knowledge about how likely both models are, we
could use the prior_prob argument to specify different
prior model probabilities:
# compute posterior model probabilities (using user-specified prior model probabilities)
post2 <- post_prob(H0.bridge, H1.bridge, prior_prob = c(.6, .4))
print(post2)
## H0.bridge H1.bridge
## 0.6616986 0.3383014