--- title: "Frequently Asked Questions" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Frequently Asked Questions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## 1. What's the difference between a multiple-membership model and a conventional multilevel model? In a conventional hierarchical multilevel model, each lower-level unit belongs to exactly one higher-level unit (e.g., students nested in schools). In a *multiple-membership model (MMMM)*, lower-level units can belong to multiple higher-level units simultaneously, or alternatively, higher-level outcomes can be influenced by multiple lower-level units. For example: - **Conventional MLM:** Students (level 1) nested in schools (level 2) — each student attends exactly one school. - **MMMM:** Coalition governments (level 2) composed of multiple political parties (level 1) — each party can participate in multiple governments over time, and each government comprises multiple parties. The MMMM accounts for this complex membership structure by weighting the contributions of each lower-level unit, allowing researchers to model how multiple units jointly shape higher-level outcomes. ## 2. What's the difference between a conventional MMMM and the extended MMMM implemented in `bml`? The *conventional MMMM* (as implemented in MLwiN, brms, or other software) uses *fixed, pre-specified weights* to aggregate lower-level effects. For instance, you might use equal weights (1/n) or weights based on time spent in each context. The *extended MMMM* in `bml` allows you to: 1. **Parameterize the weight function:** Instead of fixing weights, you can specify a functional form for weights (e.g., `w ~ 1/n^exp(b*X)`) and estimate the parameters that determine how lower-level units are aggregated. 2. **Test alternative aggregation mechanisms:** Compare different weighting schemes (equal weights, proportional weights, functions of covariates) to determine which best fits the data. 3. **Endogenize weight matrices:** Rather than imposing spatial or network weights externally, let the data determine connection strengths as functions of covariates. This flexibility enables researchers to explicitly model the *micro-to-macro link* --- how lower-level characteristics aggregate to produce higher-level outcomes. ## 3. When should I use `bml` instead of other multilevel modeling packages? Use `bml` when: - **You have multiple-membership structures:** Higher-level outcomes depend on multiple lower-level units (e.g., coalitions composed of parties, teams composed of individuals, neighborhoods influenced by surrounding areas). - **You need flexible weight functions:** Your theory suggests weights should depend on covariates, group size, or other features, not be fixed in advance. - **You're studying aggregation processes / micro-to-macro relationships:** Your research question focuses on how lower-level units jointly shape higher-level outcomes, rather than how higher-level contexts shape lower-level outcomes. Rather than assuming a fixed aggregation (e.g., simple average), you want to test and estimate how lower-level effects combine. For standard hierarchical models without multiple membership, packages like `lme4`, `brms`, or `MCMCglmm` will be more efficient. For conventional MMMMs with fixed weights, `brms` or MLwiN are excellent alternatives. ## 4. What outcome types and distributions does `bml` support? 
`bml` supports a variety of outcome distributions commonly used in social science research:

- **Continuous outcomes:** Gaussian (normal) regression
- **Binary outcomes:** Logit (logistic) regression
- **Survival/duration outcomes:**
  - Cox proportional hazards model
  - Weibull accelerated failure time (AFT) model

You can also specify hierarchical random effects (`hm()` blocks) in addition to multiple-membership structures, allowing for hierarchically nested and cross-classified designs.

## 5. How do I specify the weight function, and what are the `c` and `ar` parameters?

The weight function is specified in the `fn()` container within an `mm()` block:

```r
mm(
  id = id(member_id, group_id),
  vars = vars(X),
  fn = fn(w ~ 1/n, c = TRUE),
  RE = TRUE,
  ar = FALSE
)
```

**Weight function components:**

- **`w ~ ...`**: Specifies the functional form for the weights. You can use:
  - Group size: `n` (the number of members in each group)
  - Covariates: any variable in your data
  - Mathematical functions: `exp()`, `log()`, `sqrt()`, etc.
  - Examples: `w ~ 1/n` (equal weights), `w ~ tenure` (proportional to tenure), `w ~ 1/n^exp(b*similarity)` (similarity-weighted)
- **`c` parameter:** Controls weight normalization.
  - `c = TRUE` (default): Weights are normalized to sum to 1 within each group (row-standardization)
  - `c = FALSE`: Weights are not normalized (useful when aggregating sums rather than averages)
- **`ar` parameter** (set in `mm()`, alongside `RE`): Controls autoregressive random effects.
  - `ar = FALSE` (default): Standard independent random effects for each member
  - `ar = TRUE`: Member-level random effects evolve as a random walk across repeated group participations, capturing dynamics where a member's unobserved heterogeneity changes over time

**Important:** Only one `mm()` block can have `RE = TRUE` in a given model.

## 6. How do I fix parameters to known values?

There are two ways to fix parameters in `bml`, depending on where in the model they appear.

### Main equation, `hm()`, and `mm()` blocks: Using `fix()`

Use the `fix()` helper to hold a coefficient at a specified value rather than estimating it. It can appear directly in the main equation or inside `vars()` in `hm()` and `mm()` blocks.

**Main equation:**

```r
# Fix the coefficient of 'exposure' to 1 (i.e., use it as an offset)
m <- bml(
  y ~ 1 + fix(exposure, 1) + x1 + x2,
  family = "Gaussian",
  data = dat
)
```

This is equivalent to adding `exposure` as a predictor but constraining its coefficient to 1 instead of estimating it, which is useful for offsets or when prior theory dictates a specific coefficient value.

**In `hm()` blocks:**

```r
m <- bml(
  y ~ 1 + x1 +
    mm(id = id(pid, gid), fn = fn(w ~ 1/n), RE = TRUE) +
    hm(id = id(cid), vars = vars(fix(investiture, 0.5) + gdp), type = "FE"),
  family = "Gaussian",
  data = coalgov
)
```

**In `mm()` blocks:**

```r
m <- bml(
  Surv(dur_wkb, event_wkb) ~ 1 + majority +
    mm(
      id = id(pid, gid),
      vars = vars(fix(cohesion, 1) + finance),
      fn = fn(w ~ 1/n, c = TRUE),
      RE = TRUE
    ),
  family = "Weibull",
  data = coalgov
)
```

Here, the coefficient of `cohesion` is fixed at 1 while `finance` is freely estimated.
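Beyond offsets, fixing a coefficient also enables a quick sensitivity check: refit the model with the coefficient pinned at a few theoretically motivated values and see how the remaining estimates change. A hypothetical sketch, reusing the Weibull model above with the `cohesion` coefficient pinned at 2 instead of 1:

```r
# Hypothetical sensitivity check: same model as above, but with the
# cohesion coefficient fixed at 2; comparing the two fits shows how
# sensitive the other estimates are to this choice
m_alt <- bml(
  Surv(dur_wkb, event_wkb) ~ 1 + majority +
    mm(
      id = id(pid, gid),
      vars = vars(fix(cohesion, 2) + finance),
      fn = fn(w ~ 1/n, c = TRUE),
      RE = TRUE
    ),
  family = "Weibull",
  data = coalgov
)
```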
### Weight function `fn()`: Omitting parameters

In the weight function specified via `fn()`, you fix the weights by simply not including any free parameters (`b0`, `b1`, ...) in the formula. When no parameters appear, the weight function is treated as a known, fixed transformation:

```r
# Fixed equal weights (no parameters to estimate)
fn(w ~ 1/n, c = TRUE)

# Fixed weights proportional to seat share (no parameters to estimate)
fn(w ~ pseat, c = TRUE)
```

Compare this with weight functions that include free parameters:

```r
# Estimable: b is estimated from the data
fn(w ~ 1/n^exp(b * x), c = TRUE)

# Estimable: b0 is estimated from the data
fn(w ~ 1 / (1 + (n - 1) * exp(-b0)), c = FALSE)
```

The key distinction: if the `fn()` formula contains `b`, `b0`, `b1`, etc., these parameters are estimated from the data. If the formula contains only data variables and constants (like `n`, `pseat`, `1`), the weights are fixed.

## 7. I get "Error in node w.1[...]: Invalid parent values" — what does this mean?

This error occurs when JAGS cannot evaluate the weight function for a particular observation because the computed weight is numerically invalid (e.g., `NaN`, `Inf`, or a value outside the domain of a downstream function). It most commonly arises with **parameterized weight functions** during the initialization phase.

**Why it happens:** Weight function parameters (`b0`, `b1`, ...) are given vague priors by default (`dnorm(0, 0.0001)` in JAGS precision, which corresponds to SD = 100). When JAGS initializes the MCMC chains, it may draw extreme starting values (e.g., `b0 = 80`). For weight functions involving nonlinear transformations like `ilogit()` or `exp()`, extreme parameter values can cause numerical issues downstream — even if the weight function itself is mathematically well-defined, the resulting weights may produce overflow in the likelihood (e.g., `exp(-mu * shape)` in the Weibull model).

**Built-in safeguard:** `bml` initializes all weight parameters at 0 by default. This ensures numerically stable starting values (e.g., `ilogit(0) = 0.5`). However, if the error persists or occurs during sampling (not just initialization), consider the following steps.

**How to fix it:**

1. **Use more informative priors.** Narrow the prior spread for weight parameters so that the sampler stays in a numerically stable region:

   ```r
   m <- bml(
     ...,
     priors = list(
       "b.w.1[1] ~ dnorm(0, 1)",  # SD = 1 instead of 100
       "b.w.1[2] ~ dnorm(0, 1)"
     )
   )
   ```

   This is especially important for parameters inside `ilogit()`, `exp()`, or other functions that saturate or explode at extreme inputs.

2. **Ensure the weight function is bounded.** Unbounded weight functions can produce extreme values that destabilize the likelihood. Common strategies:
   - Use `ilogit()` to bound weights between 0 and 1: `fn(w ~ ilogit(b0 + b1 * x), c = TRUE)`
   - Use `exp()` carefully — it grows rapidly, so pair it with constraints like `c = TRUE` (normalization) or keep its argument small: `fn(w ~ exp(b1 * x), c = TRUE)` where `x` is standardized

3. **Standardize covariates in the weight function.** If a weight variable has a large range (e.g., income in thousands), the product `b1 * x` can easily overflow. Standardize such variables before including them in `fn()`.

4. **Supply explicit initial values.** If the default initialization at 0 doesn't work for your model, provide custom starting values:

   ```r
   m <- bml(
     ...,
     inits = list("b.w.1" = c(0.5, -0.1))
   )
   ```

5. **Re-run the model.** Since the error can be seed-dependent (different chains draw different initial values), simply re-running may resolve it. However, a seed-dependent failure indicates a fragile parameterization — consider steps 1--3 for a robust solution.
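Putting several of these remedies together, a defensive specification might look like the sketch below. This is hypothetical: `similarity` stands in for whatever covariate drives your weights, and the prior and initial values are placeholders to adapt to your own model.

```r
# Step 3: standardize the weight covariate before it enters fn()
dat$similarity_z <- as.numeric(scale(dat$similarity))

m <- bml(
  y ~ 1 + x1 +
    mm(
      id   = id(member_id, group_id),
      vars = vars(X),
      # Step 2: a bounded weight function on a standardized covariate
      fn   = fn(w ~ ilogit(b0 + b1 * similarity_z), c = TRUE),
      RE   = TRUE
    ),
  family = "Gaussian",
  data   = dat,
  # Step 1: tighter priors keep the sampler in a stable region
  priors = list(
    "b.w.1[1] ~ dnorm(0, 1)",
    "b.w.1[2] ~ dnorm(0, 1)"
  ),
  # Step 4: explicit, conservative starting values
  inits = list("b.w.1" = c(0, 0))
)
```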