Computing Posterior Predictive P-value in CFA

I’m trying to compute Posterior Predictive P-value (PPP) for a simple Factor Analysis ( While I agree that graphical posterior predictive checks would be a lot better option, PPP is widely used in many commercial SEM tools, like MPlus, and I feel a slight pressure to report it. I have actually used an approach proposed by MPlus authors, described here (p. 28): . Instead of their discrepancy function, I’m using sum of the log likelihoods generated by Stan, lpd function. As far as I understand, it should be equivalent, except that higher LL values indicate a better fit, while higher discrepancy values indicate a worse fit.

The problem is that I am getting substantially different results than those from MPlus. PPP is around 0.5 with SD around 0.5, while in MPlus I am getting 0.000 value. The former indicates perfect fit, the latter very bad fit.

Could you confirm that my approach is correct (with respect to the procedure described by MPlus authors)?

Below you will find code, and a short example.

Thank you.

data {
  int n_subj;
  int n_y;
  matrix[n_subj,n_y] y;
  int n_fac;
  int<lower=1,upper=n_fac> y_fac[n_y];
transformed data {
  matrix[n_subj,n_y] scaled_y;
  for (i in 1:n_y) {
    scaled_y[,i] = (y[,i]-mean(y[,i])) / sd(y[,i]);
parameters {
  matrix[n_fac,n_subj] normal01;
  cholesky_factor_corr[n_fac] fac_cor_helper;
  vector[n_y] scaled_y_means;
  vector<lower=0>[n_y] scaled_y_noise;
  vector<lower=0>[n_y] betas;
transformed parameters{
  matrix[n_subj,n_fac] subj_facs ;
  subj_facs = transpose(fac_cor_helper * normal01) ;
  to_vector(normal01) ~ normal(0,1) ;

  fac_cor_helper ~ lkj_corr_cholesky(1) ;

  scaled_y_means ~ normal(0,1) ;

  scaled_y_noise ~ weibull(2,1) ;

  betas ~ normal(0,1) ;
  for(i in 1:n_y){
    scaled_y[,i] ~ normal(
      scaled_y_means[i] + subj_facs[,y_fac[i]] * betas[i]
      , scaled_y_noise[i]
    ) ;
generated quantities {
  vector[n_y] Mu;
  cov_matrix[n_y] Q = diag_matrix(scaled_y_noise .* scaled_y_noise);
  vector[n_y] yrep[n_subj];
  vector[n_subj] log_lik;
  vector[n_subj] log_lik_rep;
  real sum_ll;
  real sum_ll_rep;
  real ppp;
  for (n in 1:n_subj) {
    for (i in 1:n_y) {
      Mu[i] = scaled_y_means[i] + subj_facs[n,y_fac[i]] * betas[i];
   // compute likelihood of the current parameter values (given the original data)
    log_lik[n] = multi_normal_lpdf(scaled_y[n,] | Mu, Q);
    // generate new data based on current parameter values
    yrep[n] = multi_normal_rng(Mu, Q);
    // compute likelihood of the current parameter values (given the new data)
    log_lik_rep[n] = multi_normal_lpdf(yrep[n] | Mu, Q);
  // sum up the likelihoods for all observations
  sum_ll = sum(log_lik);
  sum_ll_rep = sum(log_lik_rep);
  // check which is higher
  ppp = sum_ll > sum_ll_rep ? 1 : 0;
options(mc.cores = parallel::detectCores())

data("HolzingerSwineford1939", package="lavaan")

HolzingerSwineford1939 %>% 
  select(x1:x9) -> dat

data_for_stan = list(
  n_y = ncol(dat),
  n_subj = nrow(dat),
  y = as.matrix(dat),
  n_fac = 3,
  y_fac = c(1,1,1,2,2,2,3,3,3)

model = stan_model(
  file = "cfa.stan"

post = sampling(model,
                data = data_for_stan,
                seed = 1,
                chains = 4,
                iter = 2e3,
                pars = c('normal01','fac_cor_helper','Q','Mu','log_lik_rep','yrep'),
                include = FALSE)

      pars = 'ppp',

Did you ever work this out?

Nope, for the time being I decided to stay with LOO and AIC. However, If you could comment on that issue, that would be great.

Your Stan code looks good to me, so something must be different about the implementation of either the model or the metric over in MPlus.

It’s my understanding that Mplus rescales the PPP to be from 0 -1 with 0 being excellent model fit. I read it from Gelman (I believe) that the usual PPP for a good model fit would be .5, indicating that any discrepancy between the fit statistic based on the model and that which is based on the posterior replications from the model is just chance.

I would love to know if I am correct about this, and also whether this guideline holds for any discrepancy that I might be interested in. That is, if I am interesting in whether my model fits extreme data points and I form the correct discrepancy, then here too if my model fits those extreme points, then the PPP value should be around 0.5.