--- title: "How to evaluate theory-based hypotheses in a lavaan model using the GORICA" author: "Rebecca M. Kuiper and Yves Rosseel" date: "`r format(Sys.Date(), '%Y-%m-%d')`" output: rmarkdown::html_vignette: toc: true toc_depth: 2 number_sections: true vignette: > %\VignetteIndexEntry{How to evaluate theory-based hypotheses in a lavaan model using the GORICA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( warning = FALSE, message = FALSE, tidy = FALSE, # figuren: leesbaar + responsive fig.width = 8, fig.height = 5, dpi = 150, fig.retina = 2, fig.align = "center", out.width = "100%", out.extra = 'style="height:auto;"' ) options(width = 200) ``` This is a tutorial for applying the AIC-type criterion GORICA, using the `goric` function, to a `lavaan` object. The GORICA is an information criterion that can be used to evaluate theory-driven, informative, order-restricted hypotheses. The GORICA selects the best hypothesis out of a given set. Below, you will find examples for the use of the `goric` function in the `restriktor` R package when you have a lavaan object. More information (tutorials and example R scripts) is available from [this github page of Rebecca](https://github.com/rebeccakuiper/Tutorials), including a tutorial regarding interpreting the GORIC(A) output (called 'Guidelines_output_GORIC'). # Packages First, install and call the `lavaan` library and the `restriktor` library (for the `goric` function). If needed, it is possible to view the description of the function with the `?` operator or the `help` command. ```{r, results='hide', message=FALSE, warning=FALSE} # To install restriktor in R: #if (!require("restriktor")) install.packages("restriktor") library(restriktor) # print docs in the help-tab to view arguments and explanations for the function #?goric # To install lavaan in R: # if (!require("lavaan")) install.packages("lavaan") library(lavaan) ``` # Example 1: Confirmatory Factor Analysis In this example, we will look at the [Confirmatory Factor Analysis (CFA) lavaan example](https://www.lavaan.ugent.be/tutorial/cfa.html). To specify your hypotheses in terms of model parameters, you should give your own labels to estimates by including them in the `lavaan` model syntax. ```{r} # specify the model, using own labeling HS.model <- ' visual =~ lambda_v1*x1 + lambda_v2*x2 + lambda_v3*x3 textual =~ lambda_t1*x4 + lambda_t2*x5 + lambda_t3*x6 speed =~ lambda_s1*x7 + lambda_s2*x8 + lambda_s3*x9 ' ``` Next, we need to formulate the hypothesis of interest. Let us say that the hypothesis is that all standardized factor loadings are at least (in absolute sense) .6. Note that we compare the factor loadings to a number, which is often the most meaningful when inspecting standardized factor loadings; then, use `standardized = TRUE` in the `goric` function. As another note, it is also possible to compare the size of (absolute values of) various standardized factor loadings. Since we are interested in one theory-based hypothesis, we compare it to its complement (reflecting all the other possibilities); which is done by default in the `goric` function (i.e., by default: `comparison = "complement"`). ```{r} # Hypotheses H1_cfa <- ' abs(lambda_v1) > .6; abs(lambda_v2) > .6; abs(lambda_v3) > .6; abs(lambda_t1) > .6; abs(lambda_t2) > .6; abs(lambda_t3) > .6; abs(lambda_s1) > .6; abs(lambda_s2) > .6; abs(lambda_s3) > .6; ' # vs its complement (default) ``` Then, we fit the confirmatory factor analysis (CFA) model using the `cfa` function from `lavaan`: ```{r} # fit the model # fixing the variances of all the latent variables in a CFA model to unity (std.lv = TRUE; https://www.lavaan.ugent.be/tutorial/syntax2.html) fit_cfa <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE) ``` Now, we can call the `goric` function. Because of entering a lavaan object, we use GORICA (i.e.,`type = "gorica"`) by default. ```{r} # Calculate GORICA values and weights for H1_cfa and its complement. set.seed(100) # Needed for reproducibility & sensitivity check results_cfa <- goric(fit_cfa, hypotheses = list(H1_cfa = H1_cfa), standardized = TRUE) #summary(results_cfa) results_cfa ``` Conclusion: The order-restricted hypothesis ‘H1_cfa’ has $< 1$ times more, so, less, support than its complement. Thus, we did not find support for our hypothesis, that is, there is at least one standardized factor loading less than 0.6 (in absolute sense). # Example 2: Multiple Group CFA In this example, we will extend the [CFA with a multi-group structure](https://www.lavaan.ugent.be/tutorial/groups.html). There are two groups: 1. Pasteur and 2. Grant-White. Each will have there own parameter estimates and thus also need there own labeling. In this example, we will evaluate hypothesis regarding the latent means of the two schools. Before we do, we need to establish measurement invariance, such that we can fairly compare the means. ## Measurement invariance We could use the GORICA to check for measurement invariance. When you can specify ranges of values for which the parameters, say, loadings, are the same for each group, then there is an added value of using the GORICA. Here, we will inspect the case where we set the the parameters, say, loadings, to be equal across groups. Then, one can use the AIC as well, where we will also determine the AIC weights for a more in-depth comparison. Note that we compare parameters across groups, so they are already on a comparable scale, since they denote the same relationship in each group. Therefore, we do not need the standardized values. ```{r} # specify the model HS.model_mgcfa <- ' visual =~ x1 + x2 + x3 textual =~ x4 + x5 + x6 speed =~ x7 + x8 + x9 ' # configural invariance fit_ci <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school") # weak invariance fit_wi <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school", group.equal = "loadings") # strong invariance fit_si <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school", group.equal = c("intercepts", "loadings")) # strict invariance fit_strict <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school", group.equal = c("intercepts", "loadings", "residuals")) # model comparison with AIC AIC_meas.invar <- lavTestLRT(fit_ci, fit_wi, fit_si, fit_strict)$AIC hypo.names <- c("configural", "weak", "strong", "strict") AIC_weights <- calc_ICweights(AIC_meas.invar, hypo.names) #print(AIC_weights, use_scientific = FALSE, digits = 3) AIC_weights$ratio_IC_weights AIC_weights$ratio_IC_weights["weak",] ``` The weak invariance (equal factor loadings) is the best of the set, since it has the lowest value. From the AIC weights, we can see that it is about 20 times more supported than configural invariance, and many times more supported than strong and strict invariance. Therefore, it is unwise to directly compare the values of the latent means across the two groups. To still show how one can do that using the GORICA, we do proceed here. ## Hypothesis evaluation using GORICA From here on, we assume that strong (or even strict) measurement invariance was established, such that we can fairly compare the latent means. Let us say that our expectation is the latent means of our three factors are in absolute sense higher in the Pasteur group than in the Grant-White group. First, we will create the model such that we obtain these latent means. Second, we specify our hypothesis. Third, we run the model. Fourth, we evaluate our hypothesis using the GORICA. ```{r} # specify the model, using own labeling (per group!) # Specify model, such that we obtain the latent means (and not differences w.r.t. a reference category). HS.model_mgcfa <- ' visual =~ x1 + x2 + x3 textual =~ x4 + x5 + x6 speed =~ x7 + x8 + x9 # latent variable intercepts; here: # the latent means visual ~ c(mean_v_P, mean_v_GW) * 1 textual ~ c(mean_t_P, mean_t_GW) * 1 speed ~ c(mean_s_P, mean_s_GW) * 1 ' # where below we ensure: # intercepts equal across groups $ # loadings equal across groups: # group.equal = c("intercepts", "loadings") # and # first intercept equal to 0: # marker.int.zero = TRUE # Hypotheses H1_mgcfa <- " abs(mean_v_P) > abs(mean_v_GW); abs(mean_t_P) > abs(mean_t_GW); abs(mean_s_P) > abs(mean_s_GW) " # vs its complement (default) # fit the model fit_mgcfa <- cfa(HS.model_mgcfa, data = HolzingerSwineford1939, group = "school", group.equal = c("intercepts", "loadings"), marker.int.zero = TRUE) # Calculate GORICA values and weights set.seed(100) # Needed for reproducibility & sensitivity check results_mgcfa <- goric(fit_mgcfa, hypotheses = list(H1_mgcfa = H1_mgcfa)) results_mgcfa # # # Alternative: # # In the case you want to extract the estimates and their covariance matrix yourself and use that as input for the goric function: # # # # Extract estimates and their covariance matrix # names_param <- c("mean_v_P", "mean_v_GW", "mean_t_P", "mean_t_GW", "mean_s_P", "mean_s_GW") # est <- coef(fit_mgcfa)[names_param] # VCOV <- vcov(fit_mgcfa)[names_param, names_param] # # # # Calculate GORICA values and weights # set.seed(100) # Needed for reproducibility & sensitivity check # results_mgcfa <- goric(est, VCOV = VCOV, # hypotheses = list(H1_mgcfa = H1_mgcfa)) # # # GORICA values and weights # #summary(results_mgcfa) # results_mgcfa ``` Conclusion: The order-restricted hypothesis ‘H1_mgcfa’ has less support (namely, $< 1$ times more support) than its complement. That is, there is support for the latent means being in absolute sense higher in the Pasteur group than in the Grant-White group. # Example 3: Structural equation modeling (SEM) In this example, we will look at the [structural equation modeling (SEM) lavaan example](https://www.lavaan.ugent.be/tutorial/sem.html). Here, we assume that we are interested in (some of) the regression parameters; so, we only need to label these. We want to compare the strengths of the predictive relationships; hence, we need standardized estimates for a meaningful comparison; that is, add `standardized = TRUE` in the `goric` function. One can compare strengths of relationships of various independent variables for a specific dependent variable, but one over different dependent variables. Here, we compare the strengths of relationships for one dependent variable, namely dem65. More specially, we expect that the predictive relationship of dem60 for dem65 (beta_dem65_dem60) is (in absolute sense) stronger than that of ind60 for dem65 (beta_dem65_ind60). ```{r} # specify the model, using own labeling model_sem <- ' # measurement model ind60 =~ x1 + x2 + x3 dem60 =~ y1 + y2 + y3 + y4 dem65 =~ y5 + y6 + y7 + y8 # regressions dem60 ~ ind60 dem65 ~ beta_dem65_ind60*ind60 + beta_dem65_dem60*dem60 # residual correlations y1 ~~ y5 y2 ~~ y4 + y6 y3 ~~ y7 y4 ~~ y8 y6 ~~ y8 ' # Hypotheses H1_sem <- " abs(beta_dem65_dem60) > abs(beta_dem65_ind60) " # vs its complement (default) # fit the model fit_sem <- sem(model_sem, data = PoliticalDemocracy) # Calculate GORICA values and weights set.seed(100) # Needed for reproducibility & sensitivity check results_sem <- goric(fit_sem, hypotheses = list(H1_sem = H1_sem), standardized = TRUE) #summary(results_sem) results_sem # Unrestricted & Order-restricted parameter estimates of interest # Note: Hypotheses should be specified before seeing the data. # Looking afterwards at the standardized estimates can give you more insight. # # Unrestricted parameter estimates of interest subset(standardizedSolution(fit_sem), op == "~" & lhs == "dem65") # # Order-restricted parameter estimates of interest; # that is, the estimates of interest under the restrictions in the hypotheses: results_sem$ormle ``` Conclusion: The order-restricted hypothesis ‘H1_sem’ has ($> 1$ times) more support than its complement. Thus, we found support that the predictive relationship of ind60 for dem60 is (in absolute sense) higher that that for dem65. # Example 4: Multilevel SEM In this example, we will look at the [multilevel/two-level SEM lavaan example](https://www.lavaan.ugent.be/tutorial/multilevel.html). Here, we assume that we are interested in (some of) the regression parameters; so, we only need to label these. We want to compare the strengths of the predictive relationships. Hence, we need standardized estimates for a meaningful comparison; that is, add `standardized = TRUE` in the `goric` function. We expect that the order of the (absolute) strength/size in predictive relationships for fw (beta) is highest for x1, then x2, and then x3, that is, abs(beta_x1) > abs(beta_x2) > abs(beta_x3). ```{r} # specify the model, using own labeling model_msem <- ' level: 1 fw =~ y1 + y2 + y3 fw ~ beta_x1*x1 + beta_x2*x2 + beta_x3*x3 level: 2 fb =~ y1 + y2 + y3 fb ~ w1 + w2 ' # Hypotheses H1_msem <- " abs(beta_x1) > abs(beta_x2) > abs(beta_x3) " # vs its complement (default) # fit the model fit_msem <- sem(model = model_msem, data = Demo.twolevel, cluster = "cluster") # Calculate GORICA values and weights set.seed(100) # Needed for reproducibility & sensitivity check results_msem <- goric(fit_msem, hypotheses = list(H1_msem = H1_msem), standardized = TRUE) #summary(results_msem) results_msem ``` Conclusion: The order-restricted hypothesis ‘H1_msem’ has ($> 1$ times) more support than its complement. Thus, we found support that the order of the (absolute) strength in predictive relationships for fw (beta) is highest for x1, then x2, and then x3, that is, abs(beta_x1) > abs(beta_x2) > abs(beta_x3). # Example 5: linear growth model with a time-varying covariate In this example, we will look at the [linear growth model with a time-varying covariate lavaan example](https://www.lavaan.ugent.be/tutorial/growth.html). Here, we assume that we are interested in the predictive strength of the two predictors (x1 and x2) on the slope (s); so, we only need to label these. We expect that the strength of x2 on the slope (s_2) is (in absolute sense) higher than that of x1 (s_1). ```{r} # specify the model, using own labeling # a linear growth model with a time-varying covariate model_growth <- ' # intercept and slope with fixed coefficients i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4 s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4 # regressions i ~ x1 + x2 s ~ s_1*x1 + s_2*x2 # time-varying covariates t1 ~ c1 t2 ~ c2 t3 ~ c3 t4 ~ c4 ## Extra: Label intercepts #i ~ intercept_i * 1 #s ~ intercept_s * 1 ' # Hypotheses H1_growth <- " abs(s_2) > abs(s_1) " # vs its complement (default) # fit the model fit_growth <- growth(model_growth, data = Demo.growth) # Calculate GORICA values and weights set.seed(100) # Needed for reproducibility & sensitivity check results_growth <- goric(fit_growth, hypotheses = list(H1_growth = H1_growth), standardized = TRUE) #summary(results_growth) results_growth ``` Conclusion: The order-restricted hypothesis ‘H1_growth’ has many more ($>> 1$ times more) support than its complement. Thus, we found support that the intercept of the slope s is (in absolute sense) higher than the intercept of the intercept i. # Example 6: Mediation In this example, we will look at the [mediation lavaan example](https://www.lavaan.ugent.be/tutorial/mediation.html). Let us say we want to evaluate whether there is partial mediation or full mediation. It can be helpful to specify what you believe is the minimum effect. Here, we assume that an (in)direct effect between -0.1 and 0.1 can be seen as no effect. More examples can found on [this github page of Rebecca](https://github.com/rebeccakuiper/Tutorials/tree/main/GORICA%20for%20mediation). ```{r} # Create data set.seed(1234) X <- rnorm(100) M <- 0.5*X + rnorm(100) Y <- 0.7*M + rnorm(100) Data <- data.frame(X = X, Y = Y, M = M) # specify the model, using own labeling model_med <- ' # direct effect Y ~ c*X # mediator M ~ a*X Y ~ b*M # indirect effect (a*b) indirect := a*b # direct effect (c) # Optional direct := c # Could leave this out and use c ' # Hypotheses H_part <- "abs(indirect) > 0.1; abs(direct) > 0.1" H_full <- "abs(indirect) > 0.1; -0.1 < direct < 0.1" # and unconstrained as failsafe (default) # fit the model fit_med <- sem(model_med, data = Data) #summary(fit_med, standardized = TRUE) # Calculate GORICA values and weights: # # 1. based on the object set.seed(100) # Needed for reproducibility & sensitivity check results_med <- goric(fit_med, hypotheses = list(H_part = H_part, H_full = H_full), standardized = TRUE) ```