How can I reduce the runtime of this model?

I am currently trying to run the model below on data consisting of 0s and 1s, with 1,000 rows and 80 columns. The runtime is quite a bit longer than expected. Is there anything I can do to reduce it besides lowering the number of iterations? This is my first time using Stan.

```stan
data{
  int<lower=0> n_examinee; // Number of examinees
  int<lower=0> n_item; // Number of items
  int<lower=0, upper = 1> Y[n_examinee,n_item]; //The data matrix (0s and 1s)
} 

parameters{
  vector [n_examinee] theta; //Ability parameters (1 per examinee)
  vector <lower = 0> [n_item] alpha; //Discrimination Parameters >0, 1 per item
  vector [n_item] beta; // Difficulty Parameters, 1 per item
  vector <lower = 0, upper = 1> [n_item] gamma; //Guessing Parameter (0 to 1)
  real mu_beta; //hyperparameter of difficulty
  real <lower = 0> sigma_alpha; //sd of discrim
  real <lower = 0> sigma_beta; //sd of difficulty
} 

model {
  theta ~ normal(0,1); //standard normal distribution as prior for ability
  beta ~ normal(mu_beta,sigma_beta);  
  mu_beta ~ normal(0,5); //hyperparameter for mu_beta
  sigma_beta ~ cauchy(0,5); //hyperparameter for sd_beta
  alpha ~ lognormal(0,sigma_alpha);
  sigma_alpha ~ cauchy(0,5);
  gamma ~ beta(5,23);

  for(i in 1:n_examinee){
    for (j in 1:n_item){
      real p;
      p = inv_logit(alpha[j]*(theta[i] - beta[j]));
      Y[i,j] ~ bernoulli(gamma[j] + (1-gamma[j])*p);
    }
  }
}


generated quantities{
  vector[n_item] log_lik[n_examinee];
  for (i in 1:n_examinee){
    for (j in 1:n_item){
      real p;
      p = inv_logit(alpha[j] * (theta[i] - beta[j]));
      log_lik[i, j] = bernoulli_lpmf(Y[i, j] | gamma[j] + (1 - gamma[j]) * p); //calculating log likelihood to use for model comparison
    }
  }
}
```

Please have a look into what it means to vectorize your code. Basically, you need to create a big vector of what you pass into the bernoulli statement and then, ideally, have just a single call to bernoulli. As an intermediate step you can do this in a row-wise way. The best solution here is probably to reshape Y into long format and then create a long vector which is filled with the respective terms; then you can have a single Y_long ~ bernoulli(long) statement (see the sketch below). That will run a gazillion times faster.
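
If it helps, here is a rough sketch of how that long format could be built without leaving Stan, using a transformed data block. It assumes the wide matrix Y from your original data block and no missing responses; the names y_long, jj, and kk are just illustrative:

```stan
transformed data {
  int N = n_examinee * n_item;   // one entry per examinee-item pair
  int y_long[N];                 // responses stacked in long format
  int jj[N];                     // examinee index for each response
  int kk[N];                     // item index for each response
  {
    int n = 1;
    for (i in 1:n_examinee) {
      for (j in 1:n_item) {
        y_long[n] = Y[i, j];
        jj[n] = i;
        kk[n] = j;
        n += 1;
      }
    }
  }
}
```

The model block can then replace the double loop with a single vectorized statement along the lines of `y_long ~ bernoulli(gamma[kk] + (1 - gamma[kk]) .* inv_logit(alpha[kk] .* (theta[jj] - beta[kk])));`. Doing the reshaping in R (or whatever you use to prepare the data) before calling Stan works just as well.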

Thank you for your response. So if I were to put Y into long format, it would resemble the following, where for item 1 I would have all responses, followed by item 2, etc.:

Item     Response
1            0
1            1
1            1
1            1
1            0
1            1
...          ...
2            1
2            0

I am confused about what you mean by "create a long vector which is filled with the respective terms", and how this would be used in conjunction with the long-formatted data.

I have figured it out now (I think), so I will compare to see if the results are the same as the non-vectorized version.

So things seem to be working to some extent, but is there a way I can run the unvectorized and vectorized models to make sure they are actually the same? I tried setting the seed within the stan() call and running the old model alongside the new one with the same data and priors, but the estimates are still different (not by a ton, but still by a meaningful amount at times).

They run in a numerically slightly different way, so the same seed won't lead to exactly the same numbers. If coded correctly, the two versions are the same up to MCMC sampling error.

Thanks. My last question is that when I run my model through Stan, the standard error I get is always very low compared to what the standard, reputable packages in my field produce, and compared to what common sense suggests given the parameters. Looking at the Stan manual, I see that the SE is already calculated for me in the summary output, which seems to be the standard deviation of the draws divided by the square root of the effective sample size.

Is there some other way to calculate standard error for the resulting parameter estimate?

Sometimes the outputs conflate the MCSE with the standard error. In the summary, the posterior standard deviation is usually labelled sd, while the MCSE (the sd divided by the square root of the effective sample size) is the column labelled se (se_mean in rstan). The posterior sd is the quantity to compare against the standard errors reported by other packages.

The code is still taking quite some time to run :(

This is what I am using now:

```stan
data {
  int<lower=1> J;              // number of students
  int<lower=1> K;              // number of questions
  int<lower=1> N;              // number of observations
  int<lower=1,upper=J> jj[N];  // student for observation n
  int<lower=1,upper=K> kk[N];  // question for observation n
  int<lower=0,upper=1> y[N];   // correctness for observation n
}
parameters{
  vector[J] theta;             // ability for student j
  vector[K] beta;              // difficulty for question k
  vector<lower=0>[K] alpha;    // discrimination of question k
  vector<lower=0,upper=1>[K] gamma; // guessing parameter (0 to 1)
  real<lower=0> sigma_beta;    // scale of difficulties
  real<lower=0> sigma_alpha;   // scale of log discrimination
  real mu_beta; //hyperparameter of difficulty

}
model { //specify model and priors (uniform assumed if not specified)
  theta ~ normal(0,1); //standard normal distribution as prior for ability
  beta ~ normal(mu_beta,sigma_beta);  //normal distribution for difficulty with unknown mean and sd(mu_beta,sigma_beta)
  mu_beta ~ normal(0,5); //hyperparameter for mu_beta
  sigma_beta ~ cauchy(0,5); //hyperparameter for sd_beta
  alpha ~ lognormal(0,sigma_alpha);
  sigma_alpha ~ cauchy(0,5);
  gamma ~ beta(5,23);
  
  // 3PL likelihood: guessing floor gamma plus (1 - gamma) times the logistic term,
  // vectorized over all N observations in long format
  y ~ bernoulli(gamma[kk] + (1 - gamma[kk]) .* inv_logit(alpha[kk] .* (theta[jj] - beta[kk])));
  //all priors and hyperpriors above have been specified.
  // for(i in 1:n_examinee){
  //   for (j in 1:n_item){
  //     real p;
  //     p= inv_logit(alpha[j]*(theta[i] - beta[j]));
  //     Y[i,j] ~ bernoulli(gamma[j] + (1-gamma[j])*p);
  //   }
  // }
}
```

This is what it was before:

```stan
data {
  int<lower=0> n_examinee; // Number of examinees
  int<lower=0> n_item; // Number of items
  int<lower=0, upper = 1> Y[n_examinee,n_item]; //The data matrix (0s and 1s)
}
parameters{
  vector [n_examinee] theta; //Ability parameters (1 per examinee)
  vector <lower = 0> [n_item] alpha; //Discrimination Parameters >0, 1 per item
  vector [n_item] beta; // Difficulty Parameters, 1 per item
  vector <lower = 0, upper = 1> [n_item] gamma; //Guessing Parameter (0 to 1)
  real mu_beta; //hyperparameter of difficulty
  real <lower = 0> sigma_alpha; //sd of discrim
  real <lower = 0> sigma_beta; //sd of difficulty
}
model { //specify model and priors (uniform assumed if not specified)
  theta ~ normal(0,1); //standard normal distribution as prior for ability
  beta ~ normal(mu_beta,sigma_beta);  //normal distribution for difficulty with unknown mean and sd(mu_beta,sigma_beta)
  mu_beta ~ normal(0,5); //hyperparameter for mu_beta
  sigma_beta ~ cauchy(0,5); //hyperparameter for sd_beta
  alpha ~ lognormal(0,sigma_alpha);
  sigma_alpha ~ cauchy(0,5);
  gamma ~ beta(5,23);
  
  
  
  //all priors and hyperpriors above have been specified.
  for(i in 1:n_examinee){
    for (j in 1:n_item){
      ///real p;
      ///p= inv_logit(alpha[j]*(theta[i] - beta[j]));
      ///Y[i,j] ~ bernoulli(gamma[j] + (1-gamma[j])*p);
      Y[i,j] ~ bernoulli(gamma[j] + (1 - gamma[j]) * inv_logit(alpha[j] * (theta[i] - beta[j])));
    }
  }
}
```

I am assuming I vectorized it correctly, but I am not 100% sure whether there is something left to do besides just living with the runtime. For example, with 1,000 students and 80 questions it takes a few hours.

Try to use bernoulli_logit_glm, see 15.3 Bernoulli-logit generalized linear model (Logistic Regression) | Stan Functions Reference

and if that is still too slow, you can use reduce_sum-based within-chain parallelisation. This way you can use multiple CPU cores for one chain. For that I recommend creating a somewhat similar model with brms and then using that as a basis for your model; see Running brms models with within-chain parallelization.
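
In case it helps to see the shape of it, here is a minimal sketch of the reduce_sum pattern written out in plain Stan for the long-format model above. It assumes a threading-enabled installation, and the function name partial_sum_lpmf, the local variable prob, and the extra grainsize data entry are just illustrative:

```stan
functions {
  // log-likelihood contribution of one slice of the long-format responses
  real partial_sum_lpmf(int[] y_slice, int start, int end, vector prob) {
    return bernoulli_lpmf(y_slice | prob[start:end]);
  }
}
data {
  int<lower=1> J;                       // number of students
  int<lower=1> K;                       // number of questions
  int<lower=1> N;                       // number of observations
  int<lower=1,upper=J> jj[N];           // student for observation n
  int<lower=1,upper=K> kk[N];           // question for observation n
  int<lower=0,upper=1> y[N];            // correctness for observation n
  int<lower=1> grainsize;               // reduce_sum chunk size (1 = let the scheduler decide)
}
parameters {
  vector[J] theta;
  vector[K] beta;
  vector<lower=0>[K] alpha;
  vector<lower=0,upper=1>[K] gamma;
  real<lower=0> sigma_beta;
  real<lower=0> sigma_alpha;
  real mu_beta;
}
model {
  vector[N] prob;
  theta ~ normal(0, 1);
  beta ~ normal(mu_beta, sigma_beta);
  mu_beta ~ normal(0, 5);
  sigma_beta ~ cauchy(0, 5);
  alpha ~ lognormal(0, sigma_alpha);
  sigma_alpha ~ cauchy(0, 5);
  gamma ~ beta(5, 23);
  prob = gamma[kk] + (1 - gamma[kk]) .* inv_logit(alpha[kk] .* (theta[jj] - beta[kk]));
  // the likelihood sum over observations is split across threads
  target += reduce_sum(partial_sum_lpmf, y, grainsize, prob);
}
```

With grainsize = 1 the scheduler picks the chunk sizes itself; larger values can sometimes be faster, so it is worth experimenting.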