Hello everyone. I am relatively new to Stan and have limited knowledge of it. I am trying to fit the following model:

Z | Y ~ Poisson(exp(Y))

Y = X*\beta + M*\eta

M is the propagator matrix obtained from the MI operator as defined in Hughes and Haran (2013), and \eta is the vector of random effects with mean zero and covariance matrix K.
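For concreteness, the generative process can be sketched numerically (a numpy simulation under assumed, illustrative dimensions; `X`, `M`, `K`, and `beta` below are arbitrary stand-ins, not values from the actual application):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_cov, r = 100, 3, 5  # illustrative sizes only

X = rng.normal(size=(n_obs, n_cov))        # covariate matrix
M = rng.normal(size=(n_obs, r))            # stand-in for the propagator matrix
K = 0.5 * np.eye(r)                        # random-effect covariance
beta = 0.5 * rng.normal(size=n_cov)

eta = rng.multivariate_normal(np.zeros(r), K)  # eta ~ N(0, K)
Y = X @ beta + M @ eta                         # Y = X*beta + M*eta
z = rng.poisson(np.exp(Y))                     # Z | Y ~ Poisson(exp(Y))
```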

I wrote the Stan code as:

```
data {
  int<lower = 1> n_obs;
  int<lower = 1> n_cov;
  int<lower = 1> r;
  int<lower = 0> z[n_obs];
  matrix[n_obs, n_cov] x; // the covariate matrix
  matrix[r, r] K;
  matrix[n_obs, r] M;
  vector[r] V;
}
transformed data {
  vector[r] zeros;
  zeros = rep_vector(0, r);
}
parameters {
  vector[n_cov] beta;
  vector[r] v;
}
transformed parameters {
  vector[n_obs] eta;
  eta = M * v;
}
model {
  z ~ poisson_log(x * beta + eta);
  beta ~ normal(0, 10);
  v ~ multi_normal(zeros, K);
}
```

Now, in order to run this model, I was using a warmup of 2,000 iterations; then, for sampling, a burn-in of 20,000 and 80,000 iterations. The issue is that the code takes forever to run (upwards of 17 hours). Could anyone suggest any modifications to speed up the process? I have adjusted the maximum treedepth to 15 and higher, but the execution time has only increased, and the chains aren't even mixing.

Thank you.

Since K is data, you can speed this up by doing a Cholesky factorization in transformed data:

```
transformed data {
matrix[r, r] L = cholesky_decompose(K);
...
}
```

And then you could just make the code you have faster with:

```
v ~ multi_normal_cholesky(zeros, L);
```
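The identity this relies on is just L L^T = K for the lower-triangular Cholesky factor; a quick numpy check with an arbitrary positive-definite `K`:

```python
import numpy as np

rng = np.random.default_rng(1)
r = 4
A = rng.normal(size=(r, r))
K = A @ A.T + r * np.eye(r)  # an arbitrary positive-definite covariance

# numpy's counterpart of Stan's cholesky_decompose(K): lower-triangular L
L = np.linalg.cholesky(K)

lower_triangular = np.allclose(L, np.tril(L))
reconstructs_K = np.allclose(L @ L.T, K)
```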

But it'll be faster if you make v a transformed parameter based on a new parameter v_z, like so:

```
transformed data {
matrix[r, r] L = cholesky_decompose(K);
...
}
parameters {
vector[r] v_z;
...
}
transformed parameters {
vector[r] v = L * v_z;
...
}
model {
v_z ~ normal(0, 1);
}
```

This implies a prior of v \sim N(0, K) and requires less linear algebra. You could then premultiply M and L together and skip the intermediate v variable (and do only one matrix-vector product instead of two).
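Both points are easy to check numerically: with `v_z ~ N(0, I)`, the transformed `v = L v_z` has covariance `L L^T = K`, and premultiplying `M` by `L` once up front yields the same `eta` with a single matrix-vector product per draw (numpy sketch with arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, r = 50, 4

M = rng.normal(size=(n_obs, r))
A = rng.normal(size=(r, r))
K = A @ A.T + r * np.eye(r)   # arbitrary positive-definite covariance
L = np.linalg.cholesky(K)

# Non-centered parameterization: v = L v_z with v_z ~ N(0, I) gives v ~ N(0, K)
v_z = rng.normal(size=(r, 200_000))
emp_cov = np.cov(L @ v_z)     # empirical covariance of v, approaches K

# Precomputing ML = M @ L collapses eta = M * (L * v_z) into one product
ML = M @ L
v_z1 = rng.normal(size=r)
eta_two_products = M @ (L @ v_z1)
eta_one_product = ML @ v_z1
```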


@bbbales2 Thank you. I meant to get back earlier. With the modifications you suggested, the process has sped up a whole lot: it takes about 5-6 hours, down from 17+ hours, which I am guessing has to do with the size of the model. So, thank you again.


Cool beans, good to hear. By the way, you can probably use fewer samples; running chains of length 10k+ seems excessive.

Try a few chains with the defaults (1000 warmup / 1000 sampling) and see if there's much of a difference in the results of your analysis (and check Neff/Rhat).
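For intuition on what Rhat is checking, here is a rough split-Rhat computation in numpy (a simplified version of the diagnostic; interfaces such as rstan compute this, plus rank-normalized variants, for you):

```python
import numpy as np

def split_rhat(chains):
    """Rough split-Rhat: split each chain in half and compare between-chain
    to within-chain variance; values near 1 suggest the chains are mixing."""
    halves = [h for c in chains for h in np.array_split(np.asarray(c), 2)]
    n = min(len(h) for h in halves)
    draws = np.stack([h[:n] for h in halves])
    W = draws.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * draws.mean(axis=1).var(ddof=1)     # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
mixed = [rng.normal(size=1000) for _ in range(4)]          # same target
stuck = [rng.normal(loc=i, size=1000) for i in range(4)]   # chains far apart

r_mixed = split_rhat(mixed)   # close to 1
r_stuck = split_rhat(stuck)   # well above 1
```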

I had initially used 8K iterations with a warmup of 2K for a simpler version of the model, but the chains didn't mix well. I increased it to over 10K for this model, given that it has more components. Since it takes such a long time, I am usually apprehensive of waiting, so I try a smaller number of iterations first and then, if the chains don't mix, increase the number. Even in this model, the sampling time varies a lot depending on the dimensions of K. I was initially working with K being a 4x4 matrix and it took around 5-6 hours, but now that I have increased the dimensions (34x34), it is again taking around 15-16 hours.

Oh, interesting. Are your parameters super correlated in the posterior?

If so, have you tried `control = list(metric = "dense_e")`? Sometimes it works, sometimes it doesn't.