New paper: Adapting Structural Equation Modeling Fit indices into Bayesian SEM, Psychological Methods

BibTex (if applicable)
title = {Adapting {Fit} {Indices} for {Bayesian} {Structural} {Equation} {Modeling}: {Comparison} to {Maximum} {Likelihood}},
doi = {dx.doi.org/10.1037/met0000224},
language = {en},
journal = {Psychological Methods},
author = {Garnier-Villarreal, Mauricio and Jorgensen, Terrence D},
year = {2019}
}

Hi

Wanted to share this new paper that uses Stan through the blavaan package, also including the OSF

Garnier-Villarreal, M., & Jorgensen, T. D. (2019). Adapting Fit Indices for Bayesian Structural Equation Modeling: Comparison to Maximum Likelihood. Psychological Methods . https://doi.org/dx.doi.org/10.1037/met0000224

7 Likes

Thanks much for sharing. We’re working with Sophia Rabe-Hesketh, who developed the SEM package for Stata (as well as co-authoring the hierarchical modeling book). I don’t think she’s on our Discourse feed, though.

P.S. The use of braces in the title will be problematic for general BibTeX use as it’ll override user-specified bibliography styles for capitalization.

In your paper, the posterior predictive p value (PPP) of the chi square goodness of fit are used.
And I just now try to implement it for my models.

In frequentist chi square is the following:

T(y,\theta) := \sum _{c=1,\cdots,C} \frac{\bigl\{y_c- \mathbb{E}[Y_c|\theta ] \}^2}{\mathbb{E}[Y_c|\theta ]}\\

where, y_c is a observed positive integer of the random variavle Y_c.

In frequentist method, to calculate the p value of chi square statistics, it requires the notion of degree of freedom.
On the other hand, In Bayesian PPP, the degree of freedom is not required, since its definition is the following:

\iint \mathbb{I}[T(y,\theta) >T(y_{obs},\theta) ] p(y|\theta)p(\theta|y_{obs}) \, dy \, d\theta\\

in which there is no terms for the degree of freedom.

Is my interpretation correct? That is, in the Bayesian PPP, the notion of degree of freedom is no longer required.

Note that the above PPP is calculated by

\iint \mathbb{I}[T(y,\theta) >T(y_{obs},\theta) ] p(y|\theta)p(\theta|y_{obs}) \, dy \, d\theta\\ \approx \sum_i \int \mathbb{I}[T(y,\theta_i) >T(y_{obs},\theta_i) ] p(y|\theta_i)\\ \approx \sum_i \sum_j \mathbb{I}[T(y_i^j,\theta_i) >T(y_{obs},\theta_i) ]\\

where \theta_i is MCMC samples from posterior and y_i^j, j=1,2,3,... is a samples drawn from likelihood at parameter \theta_i.

Moreover I think if C is larger, then the above T(y|\theta) is also larger,
and to avoid this, I think I should use the normalizing multiple \frac{1}{C} as follows:

T(y,\theta) := \frac{1}{C} \sum _{c=1,\cdots,C} \frac{\bigl\{y_c- \mathbb{E}[Y_c|\theta ] \}^2}{\mathbb{E}[Y_c|\theta ]}.

I am glad that these type of methodas are being extended to other platforms. I work with Ed Merkle to add features to blavaan. Have several on going projects, and will share here the papers and methods that we test

Jean

The ppp based on the chi-square is the commonly use metric of overall model fit that is presented in Bayesian SEM software. We compared our results to it since it is the standard, but is not what we propose.

The chi-saure ppp compares the observed likelihood and compares it to the model implied likelihood at each iteration of the posterior distribution. This is where this ppp comes from, which ends up behaving like the p-value from the frequentist chi-square.

What we propose is to build posterior distributions for chi-square approximate fit indices. The use of these fit indices is common practice in SEM. Our method shows that under diffuse priors the mean of the fit indices distributions is equivalent to the frequentist one. Adding the benefit of having a full distribution to make inferences from, instead of just the point estimate. While in frequentist SEM only RMSEA presents a confidence interval, ours present credible intervals for every approximate fit index

No one index is perfect, so we present how much each simulation condition affected each to recommend the use of the indices that are less sensitive to model or data characteristic. And use the ones that are mostly sensitive to misfit. Based on our simulation would be Gamma-hat and CFI. The wide of the Credible Interval is still sensitive to sample size.

More research is needed on the use of these indices in other areas, and methods of local misfit evaluation. Which we are looking at for our next projects.

Hope this helps clarify

1 Like