Fitting a Bayesian Factor Analysis Model in Stan

I learned about Stan a couple days ago. I’m using R version 4.0.2 and RStudio 1.3.1056 on macOS Catalina version 10.15.6. I have installed the rstan package version 2.21.2.

Issue I want to reproduce the results from a GitHub page dated from 2015 and titled " Fitting a Bayesian Factor Analysis Model in Stan" (found here).
The authors have supplied code, but some of the code does not work on my machine.

data {
  int<lower=1> N;                // number of 
  int<lower=1> P;                // number of 
  matrix[N,P] Y;                 // data matrix of order [N,P]
  int<lower=1> D;              // number of latent dimensions 
}
transformed data {
  int<lower=1> M;
  vector[P] mu;
  M  <- D*(P-D)+ D*(D-1)/2;  // number of non-zero loadings
  mu <- rep_vector(0.0,P);
}
parameters {    
  vector[M] L_t;   // lower diagonal elements of L
  vector<lower=0>[D] L_d;   // lower diagonal elements of L
  vector<lower=0>[P] psi;         // vector of variances
  real<lower=0>   mu_psi;
  real<lower=0>  sigma_psi;
  real   mu_lt;
  real<lower=0>  sigma_lt;
}
transformed parameters{
  cholesky_factor_cov[P,D] L;  //lower triangular factor loadings Matrix 
  cov_matrix[P] Q;   //Covariance mat
{
  int idx1;
  int idx2;
  real zero; 
  zero <- 0;
  for(i in 1:P){
    for(j in (i+1):D){
      idx1 <- idx1 + 1;
      L[i,j] <- zero; //constrain the upper triangular elements to zero 
    }
  }
  for (j in 1:D) {
      L[j,j] <- L_d[j];
    for (i in (j+1):P) {
      idx2 <- idx2 + 1;
      L[i,j] <- L_t[idx2];
    } 
  }
} 
Q<-L*L'+diag_matrix(psi); 
}
model {
// the hyperpriors 
   mu_psi ~ cauchy(0, 1);
   sigma_psi ~ cauchy(0,1);
   mu_lt ~ cauchy(0, 1);
   sigma_lt ~ cauchy(0,1);
// the priors 
  L_d ~ cauchy(0,3);
  L_t ~ cauchy(mu_lt,sigma_lt);
  psi ~ cauchy(mu_psi,sigma_psi);
//The likelihood
for( j in 1:N)
    Y[j] ~ multi_normal(mu,Q); 
}

Here is the R code up to the problematic part.

library("rstan")
library("parallel")

fa.data <-list(P=P,N=N,Y=Y,D=D)

# a function to generate intial values that are slightly jittered for each chain.
init_fun = function() {
  init.values<-list(L_t=rep(0,24)+runif(1,-.1,.1),
                    L_d=rep(.5,D)+runif(1,-.1,.1),
                    psi=rep(.2,P)+runif(1,-.1,.1),
                    sigma_psi=0.15+runif(1,-.1,.1),
                    mu_psi=0.2++runif(1,-.1,.1),
                    sigma_lt=0.5+runif(1,-.1,.1),
                    mu_lt=0.0+runif(1,-.1,.1))
  return(init.values); 
} 

#compile the model
fa.model<- stan("fa.stan", 
                  data = fa.data,
                  chains =0, 
                  pars=c("L","psi","sigma_psi","mu_psi","sigma_lt","mu_lt"))

I receive the following messages after running the code:

DIAGNOSTIC(S) FROM PARSER:
Info: assignment operator <- deprecated in the Stan language; use = instead.
Info: integer division implicitly rounds to integer. Found int division: D * D - 1 / 2
 Positive values rounded down, negative values rounded up or down in platform-dependent way.
.
.
.
the number of chains is less than 1; sampling not done
Warning message:
In readLines(file, warn = TRUE) :
  incomplete final line found on '...fa.stan'

So I changed updated every case of <- to =. I added a blank line to the fa.stan file at the very end. I also changed chains=0 to chains=4. Running the code after making these changes gave me the following messages:

SAMPLING FOR MODEL 'fa' NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: []: accessing element out of range. index -2147483647 out of range; expecting index to be between 1 and 24; index position = 1L_t  (in 'model8d30668ceee6_fa' at line 40)

[1] "Error in sampler$call_sampler(args_list[[i]]) : "                                                                                                                                      
[2] "  Exception: []: accessing element out of range. index -2147483647 out of range; expecting index to be between 1 and 24; index position = 1L_t  (in 'model8d30668ceee6_fa' at line 40)"
error occurred during calling the sampler; sampling not done
.
.
.
[[1]]
Stan model 'fa' does not contain samples.

[[2]]
Stan model 'fa' does not contain samples.

[[3]]
Stan model 'fa' does not contain samples.

[[4]]
Stan model 'fa' does not contain samples.

Warning message:
In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug

So I changed chains=4 to chains=1 , re-ran the code and received this message:

.
.
.
SAMPLING FOR MODEL 'fa' NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: []: accessing element out of range. index -2147483647 out of range; expecting index to be between 1 and 24; index position = 1L_t  (in 'model8d30668ceee6_fa' at line 40)

[1] "Error in sampler$call_sampler(args_list[[i]]) : "                                                                                                                                      
[2] "  Exception: []: accessing element out of range. index -2147483647 out of range; expecting index to be between 1 and 24; index position = 1L_t  (in 'model8d30668ceee6_fa' at line 40)"
error occurred during calling the sampler; sampling not done

The author of the page was able to get the code to run somehow. I don’t understand how to properly address any of the messages I was given. I am hoping that someone can help me get this code working.

Hi Anthony,

The error:

Exception: []: accessing element out of range. index -2147483647

Indicates that the index being used to access an element of a container is larger than the dimensions of a given container. The size of the index: index -2147483647, indicates that the index variable is likely not properly initialised. We can see that occurring with this stan code because of the variables: idx1 and idx2:

{
  int idx1;
  int idx2;
  real zero; 
  zero <- 0;
  for(i in 1:P){
    for(j in (i+1):D){
      idx1 <- idx1 + 1;
      L[i,j] <- zero; //constrain the upper triangular elements to zero 
    }
  }
  for (j in 1:D) {
      L[j,j] <- L_d[j];
    for (i in (j+1):P) {
      idx2 <- idx2 + 1;
      L[i,j] <- L_t[idx2];
    } 
  }
} 

When declaring a new variable in Stan, it automatically gets initialised to nan, which for integers looks a lot like: -2147483647. In the Stan code, both idx1 and idx2 are declared without assigning any values to them, so they get initialised to that large negative number. When the indexes get incremented:

      idx2 <- idx2 + 1;

This is just adding 1 to the large negative number, so the index continues to stay way out of bounds. I believe you need to initialise the indexes to zero:

  int idx1 = 0;
  int idx2 = 0;

But I haven’t looked at the model too closely, so double-check that that would be right

1 Like

Wow… I spent a lot of time verifying whether the code inside the for-loops made sense and didn’t once think about that. I’ve made the change and have attempted to re-run the code. I’m not receiving any of those messages and it seems like it is in the process of sampling.

Many thanks for the speedy response! Will update once/if the run completes.

1 Like

That seemed to do the trick! I was able to run the rest of the code from the GitHub page. The results differ greatly, but I may need to rerun everything after a fresh restart of RStudio. Thanks again!