library(lavaan)
source("satisfactionI.r")
head(df)
source("ANOVA_data.r")
head(df)
One-Way ANOVA
Thompson, M., Lie, Y. & Green, S. (2023). Flexible structural equation modeling approaches for analyzing means. In R. Hoyle (Ed.), Handbook of structural equation modeling (2nd ed., pp. 385-408). New York, NY: Guilford Press.
This example shows the SEM approach to a one-way ANOVA. Results are reported in Table 21.1 (p. 389).
The data are described at the top of page 388. Thompson, Lie & Green (TLG) claim that the data are available in supplementary materials but I’m unable to locate it. However, the data are available in Mplus data files in supplementary materials for the 1st edition. I’ve collected the data into a more convenient format (see the next section for getting the data), but it is in a “long” format. It needs to be rearranged from “long” to “wide”.
Load package and get the data
Load the lavaan package, and run satisfactionI.r
and ANOVA_data.r
to get and rearrange the data (satisfactionI.r
and ANOVA_data.r
are available at the end of this post). The data will be in the df
data frame.
The variables used in this example are:
- x - Coping Strategy (“a” - no strategy; “b” - discussion; “c” - exercise)
- y - dependent variable (“after” Life-Satisfaction scores)
The models
The SEM model for one-way ANOVA is shown in Fig 21.1 (p. 391), and is reproduced below. The diagram shows the “Less Constrained” model - the three means, represented by the labels on the arrows connecting the “1” to the dependent variable, differ. To be consistent with the ANOVA assumption of homogeneity of variances, the residual variances are constrained to be equal.
Two models are fitted. The model statements are shown below. The “More Constrained” model constrains the means (each with the same label “a”) to equality. The “Less Constrained” model allows the means (represented by the labels “a1”, “a2”, and “a3”) to differ across the groups. (Alternatively, this line could have been written as y ~ 1
; that is, with no label, the means are freely estimated in each group. I leave the labels in to be consistent with the diagram of the model.) In both models the residual variances (each with the same label “e”) are constrained to equality.
In what follows, I use lists. The model statements are placed into a list, then I use the lapply()
or sapply()
function to perform operations on each element in the list (such as sem()
to run the analyses, or summary()
to return summaries of the analyses, or [[
to extract elements); and I use the Reduce()
function when I need to perform operations across the two models (such as anova()
to constrast the fit of the two models).
<- list(
models "More Constrained" =
"y ~ c(a, a, a)*1 # Means
y ~~ c(e, e, e)*y # Variances",
"Less Constrained" =
"y ~ c(a1, a2, a3)*1
y ~~ c(e, e, e)*y"
)
Fit the models and get the results
The lapply()
function applies the sem()
function to the two elements of the models
list (with data
set to df
, and group
set to the "x"
variable).
<- lapply(models, sem, data = df, group = "x")
fit
lapply(fit, summary)
The “SEM” sections of Table 21.1 show the means, pooled error variances, and the \(\upchi\)2 test.
The summaries show “Intercepts” (that is, the estimated means) and “Variances” (that is, the error variances) for each “Coping Strategy” group for both models. Compare with means and pooled error variances in the SEM section in Table 21.1.
Rather than, or perhaps, as well as, searching through the model summaries for the means and variances, they can be extracted from a list of estimates of model parameters.
## Get list of estimates
<- lapply(fit, lavInspect, "est"); estimates
estimates
## Extract means - in element "nu"
<- list()
means for (i in names(models)) {
<- estimates[[i]] |>
means[[i]] sapply("[[", "nu")
}<- do.call(cbind, means); means
means
## Extract error variances - in element "theta"
<- list()
ErrorVar for (i in names(models)) {
<- estimates[[i]] |>
ErrorVar[[i]] sapply("[[", "theta")
}<- do.call(cbind, ErrorVar); ErrorVar ErrorVar
To perform the \(\upchi\)2 test (to compare the fit of the two models), apply the anova()
function to the two models.
Reduce(anova, fit)
Compare with the \(\upchi\)2 statistic and p value in Table 21.1.
On page 390, TLG give model fit statistics for both models. These are available in the anova output above, or they can be extracted separately from the list of fit measures. First, a function to extract the \(\upchi\)2 statistic, degrees of freedom, and the p value, then that function is applied to both models.
<- function(fit) {
GetFit <- fitMeasures(fit, c("chisq", "df", "pvalue"))
tab <- round(tab, 3)
tab return(tab)
}
sapply(fit, GetFit)
Note: Neither model fits well.
TLG mention the calculation for R2 (p. 390). The relevant SSEs can be obtained from the error variances (see ErrorVar
) by multiplying error variance by sample size. However, note that multiplication is not needed because sample size will cancel out; that is, substitute the error variances into Equation 21.4.
<- ErrorVar["a", ] |>
Rsquare Reduce(function(mc, lc) (mc - lc)/mc, x = _) # Substitute into Eq 21.4
Rsquare
Relaxing assumption of homogeneity of variances
TLG do not run these models though they make reference to them. The model statements when the homogeneity of variances assumption is relaxed are shown below.
<- list(
models "More Constrained" =
"y ~ c(a, a, a)*1 # Means
y ~~ c(e1, e2, e3)*y # Variances",
"Less Constrained" =
"y ~ c(a1, a2, a3)*1
y ~~ c(e1, e2, e3)*y"
)
Run the models and get the summaries. In this analysis I use the “mlm” estimator, a robust ML estimator; that is, the normality assumption is relaxed also.
<- lapply(models, sem, data = df, group = "x", estimator = "mlm")
fit lapply(fit, summary)
This time, the “Less Constrained” model is just identified - a perfect fit.
R code with minimal commenting
## One-way ANOVA
##
## Thompson, M., Lie, Y. & Green, S. (2023). Flexible structural equation modeling
## approaches for analyzing means. In R. Hoyle (Ed.), Handbook of structural
## equation modeling (2nd ed., pp. 385-408). New York, NY: Guilford Press.
## Load package
library(lavaan)
## Get the data
source("satisfactionI.r")
head(df)
## Rearrange the data file
source("ANOVA_data.r")
head(df)
## The models
<- list(
models "More Constrained" =
"y ~ c(a, a, a)*1 # Means
y ~~ c(e, e, e)*y # Variances",
"Less Constrained" =
"y ~ c(a1, a2, a3)*1
y ~~ c(e, e, e)*y"
)
## Fit the models
<- lapply(models, sem, data = df, group = "x")
fit
## Get model summaries
## Check results with "SEM" section of Table 21.1
lapply(fit, summary)
## Extract means and variances from list of estimates
## Get list of estimates
<- lapply(fit, lavInspect, "est"); estimates
estimates
## Extract means - in element "nu"
<- list()
means for (i in names(models)) {
<- estimates[[i]] |>
means[[i]] sapply("[[", "nu")
}<- do.call(cbind, means); means
means
## Extract error variances - in element "theta"
<- list()
ErrorVar for (i in names(models)) {
<- estimates[[i]] |>
ErrorVar[[i]] sapply("[[", "theta")
}<- do.call(cbind, ErrorVar); ErrorVar
ErrorVar
## Contrast model fits
## Check with chi sq statistic and p value in Table 21.1
Reduce(anova, fit)
## Fit for each model - Chi squares
## Check with values on page 390
## First, a function to extract chi squares
<- function(fit) {
GetFit <- fitMeasures(fit, c("chisq", "df", "pvalue"))
tab <- round(tab, 3)
tab return(tab)
}
sapply(fit, GetFit)
## R square
## Check with Equation 21.4
<- ErrorVar["a", ] |>
Rsquare Reduce(function(mc, lc) (mc - lc)/mc, x = _) # Substitute into Eq 21.4
Rsquare
## Relax homogeneity of variances assumption
<- list(
models "More Constrained" =
"y ~ c(a, a, a)*1 # Means
y ~~ c(e1, e2, e3)*y # Variances",
"Less Constrained" =
"y ~ c(a1, a2, a3)*1
y ~~ c(e1, e2, e3)*y"
)
## Run the model and get the summary
<- lapply(models, sem, data = df, group = "x", estimator = "mlm")
fit lapply(fit, summary)
R code to get data file - satisfactionI.r
### Data for Tables 21.1, 21.2, 21.3, 21.4 ###
<- structure(list(x = c("a", "a", "a", "a", "a", "a", "a", "a",
df "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b",
"b", "b", "b", "b", "b", "b", "b", "b", "b", "c", "c", "c", "c",
"c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c",
"c"), g = c("m", "m", "m", "m", "m", "m", "f", "f", "f", "m",
"m", "m", "m", "m", "m", "f", "f", "f", "m", "m", "m", "f", "f",
"f", "m", "m", "m", "f", "f", "f", "m", "m", "m", "f", "f", "f",
"f", "f", "f", "m", "m", "m", "f", "f", "f", "f", "f", "f"),
c = c("before", "before", "before", "before", "before", "before",
"before", "before", "before", "after", "after", "after",
"after", "after", "after", "after", "after", "after", "before",
"before", "before", "before", "before", "before", "after",
"after", "after", "after", "after", "after", "before", "before",
"before", "before", "before", "before", "before", "before",
"before", "after", "after", "after", "after", "after", "after",
"after", "after", "after"), y = c(21, 19, 22, 21, 24, 23,
21, 24, 23, 22, 22, 24, 25, 27, 30, 22, 23, 24, 23, 23, 21,
19, 22, 21, 30, 26, 22, 25, 26, 27, 27, 25, 24, 25, 23, 22,
23, 28, 26, 34, 30, 26, 26, 27, 28, 29, 40, 42)), class = "data.frame", row.names = c(NA,
-48L))
head(df)
## x - Coping Strategy (a - No strategy; b - Discussion; c - Exercise)
## g - Gender
## c - before/after
## y - dependent variable (Life Satisfaction)
R code to rearrange data file - ANOVA_data.r
### Data for Tables 21.1, 21.2, 21.3, 21.4 ###
## Reshape data - long to wide
<- 0.5 * table(df$x) # in each condition
tab $id <- c(rep(1:tab[1], 2), rep(1:tab[2], 2), rep(1:tab[3], 2)) # id variable
df
<- reshape(df, timevar = "c", idvar = c("id", "x", "g"), varying = c("pre", "y"),
df direction = "wide")
<- within(df, {
df ## Grand mean centered "pre" - the before scores
<- scale(pre, scale = FALSE)
preC
## Drop the id variable
<- NULL
id
## Gender X Coping Strategy interaction
<- interaction(x, g, sep = "")
sg })