New paper: Adapting Structural Equation Modeling Fit indices into Bayesian SEM, Psychological Methods

BibTex (if applicable)
title = {Adapting {Fit} {Indices} for {Bayesian} {Structural} {Equation} {Modeling}: {Comparison} to {Maximum} {Likelihood}},
doi = {},
language = {en},
journal = {Psychological Methods},
author = {Garnier-Villarreal, Mauricio and Jorgensen, Terrence D},
year = {2019}


Wanted to share this new paper that uses Stan through the blavaan package, also including the OSF

Garnier-Villarreal, M., & Jorgensen, T. D. (2019). Adapting Fit Indices for Bayesian Structural Equation Modeling: Comparison to Maximum Likelihood. Psychological Methods .


Thanks much for sharing. We’re working with Sophia Rabe-Hesketh, who developed the SEM package for Stata (as well as co-authoring the hierarchical modeling book). I don’t think she’s on our Discourse feed, though.

P.S. The use of braces in the title will be problematic for general BibTeX use as it’ll override user-specified bibliography styles for capitalization.

I read abstract only.

In your paper, the posterior predictive p value (PPP) of the chi square goodness of fit are used.
And I just now try to implement it for my models.

In frequentist chi square is the following:

T(y,\theta) := \sum _{c=1,\cdots,C} \frac{\bigl\{y_c- \mathbb{E}[Y_c|\theta ] \}^2}{\mathbb{E}[Y_c|\theta ]}\\

where, y_c is a observed positive integer of the random variavle Y_c.

In frequentist method, to calculate the p value of chi square statistics, it requires the notion of degree of freedom.
On the other hand, In Bayesian PPP, the degree of freedom is not required, since its definition is the following:

\iint \mathbb{I}[T(y,\theta) >T(y_{obs},\theta) ] p(y|\theta)p(\theta|y_{obs}) \, dy \, d\theta\\

in which there is no terms for the degree of freedom.

Is my interpretation correct? That is, in the Bayesian PPP, the notion of degree of freedom is no longer required.

Note that the above PPP is calculated by

\iint \mathbb{I}[T(y,\theta) >T(y_{obs},\theta) ] p(y|\theta)p(\theta|y_{obs}) \, dy \, d\theta\\ \approx \sum_i \int \mathbb{I}[T(y,\theta_i) >T(y_{obs},\theta_i) ] p(y|\theta_i)\\ \approx \sum_i \sum_j \mathbb{I}[T(y_i^j,\theta_i) >T(y_{obs},\theta_i) ]\\

where \theta_i is MCMC samples from posterior and y_i^j, j=1,2,3,... is a samples drawn from likelihood at parameter \theta_i.

Moreover I think if C is larger, then the above T(y|\theta) is also larger,
and to avoid this, I think I should use the normalizing multiple \frac{1}{C} as follows:

T(y,\theta) := \frac{1}{C} \sum _{c=1,\cdots,C} \frac{\bigl\{y_c- \mathbb{E}[Y_c|\theta ] \}^2}{\mathbb{E}[Y_c|\theta ]}.

If I misunderstand about ppp, then please let me know.

I am glad that these type of methodas are being extended to other platforms. I work with Ed Merkle to add features to blavaan. Have several on going projects, and will share here the papers and methods that we test


The ppp based on the chi-square is the commonly use metric of overall model fit that is presented in Bayesian SEM software. We compared our results to it since it is the standard, but is not what we propose.

The chi-saure ppp compares the observed likelihood and compares it to the model implied likelihood at each iteration of the posterior distribution. This is where this ppp comes from, which ends up behaving like the p-value from the frequentist chi-square.

What we propose is to build posterior distributions for chi-square approximate fit indices. The use of these fit indices is common practice in SEM. Our method shows that under diffuse priors the mean of the fit indices distributions is equivalent to the frequentist one. Adding the benefit of having a full distribution to make inferences from, instead of just the point estimate. While in frequentist SEM only RMSEA presents a confidence interval, ours present credible intervals for every approximate fit index

No one index is perfect, so we present how much each simulation condition affected each to recommend the use of the indices that are less sensitive to model or data characteristic. And use the ones that are mostly sensitive to misfit. Based on our simulation would be Gamma-hat and CFI. The wide of the Credible Interval is still sensitive to sample size.

More research is needed on the use of these indices in other areas, and methods of local misfit evaluation. Which we are looking at for our next projects.

Hope this helps clarify

1 Like

Hi Mauricio,

Thanks for this paper, it’s very helpful!

I’m trying to implement some of these statistics for a model with ordered categorical variables, but I’m a bit stuck with the discrepancy function, and was hoping you could help.

For continuous variables, the discrepancy function is defined as:

F_{ML}=\log|\hat{\Sigma}| - \log|S| + trace(S\hat{\Sigma}^{-1})-p+(\bar{y}-\hat{\mu})^T\hat{\Sigma}^{-1}(\bar{y}-\hat{\mu})

Where \hat{\mu} = \alpha + \Lambda\gamma and \hat{\Sigma} = \Lambda\Psi\Lambda^T + \Theta

In the case of a categorical y, the discrepancy function is defined in terms of the underlying latent Y^*. But in this case, how would I parameterise \hat{\mu}, since the intercepts \alpha are fixed to zero to identify the thresholds?

Thanks again!


The access of how this is worked depends on which program you are looking at. For example, historically as the development of categorical variables models in lavaan, has been always problematic to match the results to Mplus.

My recommendation would be to look at the hidden functions in lavaan, Yves has made this to follow the respective equations and you can match this. The ones I think you are looking is lavaan:::computeMuHat.LISREL

A major issue with the estimation of these discrepancy functions, they need to be the marginal likelihood functions. Which can be tricky to code.

We are working now to include ordered/categorical models into blavaan, having these features. Hard to say when this will be available

Hope this helps


Thanks Mauricio, much appreciated!