I wrote a script (below) to do a basic comparison of the two models using rstan 2.21.7. The two models are also linked below; they are minor modifications of the original models (without @spinkney's suggestion). I am seeing speed and effective sample size advantages for the marginal model at smaller N. The speeds become similar at larger N, but the marginal model's effective sample size remains much larger than that of data augmentation.
The trick from @simonbrauer is worthwhile; blavaan has also done that for the past year or so, and it really helps for large N. If you need the constants of the MVN likelihood for that approach, see the Stan code in this post.
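To make the idea concrete, here is a minimal R sketch (my own illustration, not the blavaan or Stan code; the function name suffstat_loglik is mine) of the sufficient-statistic trick: the iid MVN log likelihood depends on the data only through the sample mean and the ML covariance matrix, so it can be evaluated once per iteration instead of once per row, with constant term -NP/2 * log(2*pi).

library(mvtnorm)
suffstat_loglik <- function(Y, Mu, Sigma) {
  N <- nrow(Y); P <- ncol(Y)
  ybar <- colMeans(Y)
  S <- cov(Y) * (N - 1) / N   ## ML (divide-by-N) covariance
  Sinv <- solve(Sigma)
  ldet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  quad <- c(t(ybar - Mu) %*% Sinv %*% (ybar - Mu))
  -0.5 * N * (P * log(2 * pi) + ldet + sum(diag(Sinv %*% S)) + quad)
}
## check against the row-wise evaluation:
Y <- rmvnorm(50, sigma = diag(2))
all.equal(suffstat_loglik(Y, c(0, 0), diag(2)),
          sum(dmvnorm(Y, sigma = diag(2), log = TRUE)))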
The comparison R script is below. You can play with it by changing the values of N, M, and pper. I'd be interested to hear if your results are different from mine.
tr_cond.stan (1.8 KB)
tr_marg.stan (1.9 KB)
library(rstan)
library(mvtnorm)
N <- c(100, 2000)  ## sample sizes
M <- c(3, 9)       ## numbers of factors
pper <- 3          ## indicators per factor
conds <- expand.grid(N = N, M = M, pper = pper)
## compile both models once, outside the loop
mmarg <- stan_model('tr_marg.stan')
mcond <- stan_model('tr_cond.stan')
## exchangeable correlation matrix for generating each factor's indicators
tmpR <- matrix(.4, max(pper), max(pper))
diag(tmpR) <- 1
res <- vector("list", nrow(conds))
savepars <- c("Dvec", "psi", "delta", "L_Omega") ## note: "d" in the marginal model is renamed "delta"
for(i in 1:nrow(conds)) {
  N <- conds[i, 'N']
  M <- conds[i, 'M']
  pper <- conds[i, 'pper']

  ## generate null data: each factor's indicators are exchangeably correlated
  datmat <- matrix(NA, N, M * pper)
  Dskel <- diag(M) %x% matrix(1, pper, 1)  ## loading skeleton, one block per factor
  for(j in 1:M) {
    datmat[, (pper * (j-1) + 1):(pper * j)] <- rmvnorm(N, sigma = tmpR[1:pper, 1:pper])
  }

  ## fit the marginal model (r1) and the conditional/data augmentation model (r2)
  t1 <- system.time(r1 <- sampling(mmarg, data = list(N = N, M = M, P = pper * M, Ind = datmat,
                                                      D_skeleton = Dskel, Dvec_length = sum(Dskel)),
                                   warmup = 500, iter = 1000, pars = savepars))
  t2 <- system.time(r2 <- sampling(mcond, data = list(N = N, M = M, P = pper * M, Ind = datmat,
                                                      D_skeleton = Dskel, Dvec_length = sum(Dskel)),
                                   warmup = 500, iter = 1000, pars = savepars))

  res[[i]] <- list(r1 = summary(r1)[[1]], r2 = summary(r2)[[1]], t1 = t1, t2 = t2)
}
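Once the loop finishes, something like the following sketch tabulates the minimum effective sample size per second for each condition (it assumes the summary matrices saved in res have rstan's standard "n_eff" column, which summary() provides):

## minimum n_eff per second: marginal (r1/t1) vs. conditional (r2/t2)
effsum <- t(sapply(res, function(x) {
  c(marg = min(x$r1[, "n_eff"], na.rm = TRUE) / x$t1[["elapsed"]],
    cond = min(x$r2[, "n_eff"], na.rm = TRUE) / x$t2[["elapsed"]])
}))
cbind(conds, effsum)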