Errors in bridge sampling: invoking 'abort' restart when a large Stan fit object is fed

I am using the bridge_sampler() function to evaluate the log marginal likelihood of an SEM Stan fit object. The Stan code I used is below:

data{
  int N;            // sample size
  int P;            // number of variables
  matrix[N, P] Y;   // data matrix Y with N rows and P columns
  int D;            // number of total latent variables
  int K;            // number of explanatory latent variables (ksi)
  int E;            // number of outcome latent variables (eta)
  matrix[K, K] R;   // diagonal matrix
}

parameters{
  vector[P] nu;             // intercepts of observed variables
  vector[P-D] lam;          // factor loadings
  vector[K] beta;           // structural regressions
  matrix[N, K] FS_K;        // factor scores of explanatory latent variables
  vector[N] FS_E;           // factor scores of outcome latent variables
  vector<lower=0>[P] var_P; // variance for observed variables
  cov_matrix[K] cov_K;      // covariance for explanatory latent variables
  vector<lower=0>[E] var_E; // variance for outcome latent variables
}

transformed parameters{
  matrix[N, P] mu_pred; // predicted values of observed variables
  vector[N] nu_pred;    // predicted values of eta
  matrix[K, K] L;
  L = cholesky_decompose(cov_K);

  for (i in 1:N) {
    mu_pred[i, 1]  = nu[1]  + FS_K[i, 1];
    mu_pred[i, 2]  = nu[2]  + lam[1]*FS_K[i, 1];
    mu_pred[i, 3]  = nu[3]  + lam[2]*FS_K[i, 1];
    mu_pred[i, 4]  = nu[4]  + FS_K[i, 2];
    mu_pred[i, 5]  = nu[5]  + lam[3]*FS_K[i, 2];
    mu_pred[i, 6]  = nu[6]  + lam[4]*FS_K[i, 2];
    mu_pred[i, 7]  = nu[7]  + FS_K[i, 3];
    mu_pred[i, 8]  = nu[8]  + lam[5]*FS_K[i, 3];
    mu_pred[i, 9]  = nu[9]  + lam[6]*FS_K[i, 3];
    mu_pred[i, 10] = nu[10] + FS_E[i];
    mu_pred[i, 11] = nu[11] + lam[7]*FS_E[i];
    mu_pred[i, 12] = nu[12] + lam[8]*FS_E[i];
  }

  for (i in 1:N) {
    nu_pred[i] = beta[1]*FS_K[i, 1] + beta[2]*FS_K[i, 2] + beta[3]*FS_K[i, 3];
  }
}

model{
  // hyperpriors
  vector[K] u;
  u = rep_vector(0, K);

  // priors on intercepts
  for (i in 1:P) { nu[i] ~ normal(0, 10); }

  // priors on factor loadings
  for (i in 1:(P-D)) { lam[i] ~ normal(0, 10); }

  // priors on regression coefficients
  for (i in 1:K) { beta[i] ~ normal(0, 10); }

  // priors on variances
  for (i in 1:P) { var_P[i] ~ gamma(1, 0.5); }
  var_E ~ gamma(1, 0.5);
  cov_K ~ wishart(12, R);

  // likelihood
  for (i in 1:N) {
    FS_E[i] ~ normal(nu_pred[i], var_E);
    FS_K[i] ~ multi_normal_cholesky(u, L);
    for (j in 1:P) {
      Y[i, j] ~ normal(mu_pred[i, j], var_P[j]);
    }
  }
}

The file name of the above code is stan_sem2, and I fitted a Stan model with the code below.

sem.fit.sc <- stan(file = stan_sem2, data = stan.dat, seed = 322, warmup = 3000, iter = 6000,
                   chains = 2, save_warmup = FALSE)

Next, I fed the Stan fit object to the bridge_sampler() function.

set.seed(322)
bs.sem.fit.sc <- bridge_sampler(sem.fit.sc)

After processing, I got the following error messages:

bs.sem.fit.sc <- bridge_sampler(sem.fit.sc)
Error: cannot allocate vector of size 1.1 Gb
Error during wrapup: cannot allocate vector of size 918.6 Mb
Error: no more error handlers available (recursive errors?); invoking ‘abort’ restart

The first and second messages presumably have something to do with a memory issue. What about the third message, then? I raised this issue with one of the authors of the bridgesampling package (Quentin), and he said the error message does not come from the bridgesampling package. Therefore, I am wondering whether this error is related to rstan. Does anyone have ideas about how to solve it?

Thanks!

I think it’s more useful to fix the memory errors first before looking at that third error, since the memory errors were thrown before it. It may be that once you fix the memory issue, it no longer causes the problems that result in the additional errors.

If you run the model with warmup=1000, iter=2000, do you still get the errors?
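
For concreteness, the refit would look something like this (a sketch assuming the data list stan.dat and the file variable stan_sem2 from your original post):

library(rstan)

# Same model and data, but only 1000 post-warmup draws per chain
sem.fit <- stan(file = stan_sem2, data = stan.dat, seed = 322,
                warmup = 1000, iter = 2000, chains = 2, save_warmup = FALSE)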

Thanks, @andrjohns!

  1. I am now running a model with warmup=1000, iter=2000. I will let you know the result later.
  2. All right, it seems the memory issues are the obstacle to my Stan usage. But what do you mean by memory in this case: RAM or SSD storage?

RAM, as R needs to store everything in RAM while it works. If you have a model with a large number of parameters and many samples, you can run into memory issues when post-processing with things like loo and bridgesampling.
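
If it helps, you can gauge the footprint directly (a quick sketch; sem.fit.sc is the stanfit object from your first post, and bridge sampling will need several times this much RAM on top while it works):

# Size of the fitted object held in RAM
print(object.size(sem.fit.sc), units = "Gb")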

It took overnight to finish the simulation. When I set warmup=1000, iter=2000, the Stan object was fitted with the following warning messages (I know that the first one can be ignored, though):

1: In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
‘C:/rtools40/usr/mingw_/bin/g++’ not found
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
Runtime warnings and convergence problems
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
Runtime warnings and convergence problems

When I ran the bridge_sampler function, I got the following error messages:

bs.sem.fit <- bridge_sampler(sem.fit)
Iteration: 1
Iteration: 2
Iteration: 1
Error in jj[2, ] : subscript out of bounds
In addition: Warning messages:
1: Infinite value in iterative scheme, returning NA.
Try rerunning with more samples.
2: logml could not be estimated within maxiter, rerunning with adjusted starting value.
Estimate might be more variable than usual.

So, does this mean that the RAM is not enough?

The bridgesampling warning (“Infinite value in iterative scheme, returning NA.”) indicates what you need to do: you need to use more samples.

The bridgesampling documentation states:

Also note that for testing, the number of posterior samples usually needs to be substantially larger than for estimation.

Usually you want at least an order of magnitude more samples for model comparison than for estimation. The Stan warning indicates that 1000 post-warmup samples per chain are not enough for fitting. That means you will need at least 10,000 samples for model comparison (likely quite a bit more). Your initial error indicates you do not have enough RAM for 3000 post-warmup samples per chain. That means you will likely need a PC with more RAM to do what you want to do.

bridgesampling can require A LOT of RAM!
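
For concreteness, here is the back-of-envelope arithmetic as a sketch (the object name sem.fit.big is illustrative): with 2 chains, 10,000 post-warmup draws in total means iter - warmup = 5000 per chain.

# 2 chains * (8000 - 3000) = 10,000 post-warmup draws in total
sem.fit.big <- stan(file = stan_sem2, data = stan.dat, seed = 322,
                    warmup = 3000, iter = 8000, chains = 2,
                    save_warmup = FALSE)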

Thanks for your answer, @Henrik_Singmann!
Now it’s clear that (1) I do indeed need more posterior samples and (2) I should have more than enough RAM. I now have a new machine with 40 GB of RAM. Should that be enough?

Whether 40 GB of RAM is enough depends on the size of the data and the model. Our recommendation is to run both sampling and bridge sampling at least twice (i.e., obtain bridge sampling estimates from independent sets of posterior samples) to see whether the estimates are stable enough, and to increase the number of posterior samples until the estimates are stable enough given the differences between the models.
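
As a sketch of that check (the fit and object names here are illustrative):

library(rstan)
library(bridgesampling)

# Two independent sets of posterior samples, differing only in the seed
fit1 <- stan(file = stan_sem2, data = stan.dat, seed = 1,
             warmup = 3000, iter = 8000, chains = 2, save_warmup = FALSE)
fit2 <- stan(file = stan_sem2, data = stan.dat, seed = 2,
             warmup = 3000, iter = 8000, chains = 2, save_warmup = FALSE)

bs1 <- bridge_sampler(fit1)
bs2 <- bridge_sampler(fit2)

# If the two estimates differ by more than the differences between the
# models you want to compare, increase the number of posterior samples.
print(logml(bs1))
print(logml(bs2))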

Thanks for your reply, @Henrik_Singmann. I have two additional questions about the bridge_sampler() function.

  1. What is the meaning of this error: Error in jj[2, ] : subscript out of bounds? Can it be ignored?
  2. Increasing the posterior samples and rerunning the bridge_sampler() function is what I should do. However, does this mean that all the parameters specified in the parameters{} block should show signs of convergence, or is simply having enough posterior samples sufficient? I am asking because, in my model specification, the factor scores are also treated as parameters, and they should not need to converge in the same way as primary parameters such as factor loadings and regression coefficients.
  1. This can be ignored. It is a bug, but it is only thrown when there is a problem in bridge sampling (i.e., you have too few samples). It is nevertheless fixed in the GitHub version of the package.
  2. Please reread the following from the documentation:

Also note that for testing, the number of posterior samples usually needs to be substantially larger than for estimation.

This means that not only do all parameters need to have converged; we also suggest having at least an order of magnitude (i.e., ten times) more samples than needed to reach convergence for all parameters. So you will just need a lot of samples.
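
A sketch of both points in code (the GitHub repository path and the fit object name here are assumptions):

# Install the development version that includes the jj[2, ] fix
# remotes::install_github("quentingronau/bridgesampling")

# Before bridge sampling, check that *all* parameters, including the
# factor scores, have converged: Rhat should be close to 1 and n_eff
# large for every row of the summary.
fit_summary <- summary(sem.fit.big)$summary
max(fit_summary[, "Rhat"], na.rm = TRUE)   # ideally close to 1
min(fit_summary[, "n_eff"], na.rm = TRUE)  # want a large effective sample size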

Clearly understood. Thanks a lot, @Henrik_Singmann!!!