IRT training time

I have a few questions about optimizing this stan model to decrease the training time.

data {
  int<lower=0 > N; // test subjects
  int<lower=0 > K; // test types
  int<lower=0 > J;  // test groups 
  int<lower=0> V[K,N]; // count of positive test results
  int<lower=0> I[N];      // number of tests
  int<lower=1, upper=J> ii[K]; //which test type belongs to which group
}

parameters {
  vector[N] theta; // student overall score
  vector<lower=0>[K] alpha1; //item overall difficulty
  vector[N] gamma[J]; //student score for test group
  vector<lower=0>[K] alpha2;  //difficult w.r.t. group
  vector[K] beta; //prevalence
}

model {
  beta ~ normal(0,1);
  theta ~ normal(0, 1); 
  alpha1 ~ lognormal(1, 1); 
  alpha2 ~ lognormal(1, 1); 
  for (i in 1: J){
    gamma[i] ~ normal(0,1);
  }

  for (i in 1:K) {
     V[i] ~ binomial_logit(I,  alpha1[i]*theta  - beta[i] +  alpha2[i]*gamma[ii[i]]  );
  }
}

The model is intended to represent the bifactor IRT model shown in https://www.ncbi.nlm.nih.gov/pubmed/20954056 and depicted in this image.

image

When running with N=10000, K=1000, J=5 and default inputs for pystan.StanModel.sampling (2000 iterations, NUTS) the training could take weeks to finish on generated data (haven’t finished a run yet)

Any recommendations for transformations or other ways to get orders of magnitude reduction in time? I have tried the ADVI algorithm and it trains much faster but I’m not sure about the results. Here is a plot of the final theta vs truth from the generated data.
image
They seem very highly correlated but with the wrong scale. Is this expected? The alpha1 term has an even more strange relation
image

Once again, using defaults for pystan.StanModel.vb

Thank you for any help.

EDIT: thought i would add info for ADVI training

fit args:

{'random_seed': '42',
 'chain_id': 1,
 'init': b'random',
 'init_radius': 2.0,
 'enable_random_init': False,
 'append_samples': False,
 'method': 'VARIATIONAL',
 'iter': 10000,
 'grad_samples': 1,
 'elbo_samples': 100,
 'eval_elbo': 100,
 'output_samples': 1000,
 'eta': 1.0,
 'adapt_engaged': True,
 'adapt_iter': 50,
 'tol_rel_obj': 0.01,
 'algorithm': 'MEANFIELD'}

Training ELBO
image