You probably don’t need initis at all for Stan. Gibbs (or at least BUGS and JAGS) are a lot more sensitive to where they’re initialized.
Do you really need 20K warmup iterations?
You probably also don’t need to save all those transformed parameters—couldn’t they be defined as local variables in the model block?
Also, it appears you’re creating diagonal matrices (for which there is better syntax), but that’s rarely a good idea computationally. For example, we have quadratic forms specifically for diagonal vectors. it’s also much much better to work with Cholesky factors wherever possible, but I don’t know enough about these models or have enough time to sort out whether that’s possible.
The main thing you need to do for efficiency in Stan is to vectorize operations into matrix operations (you do a lot of them as loops) and also save repeated subexpressions. For example, in
(xbeta[t] + mode[t]) - exp(xbeta[t] + mode[t]) you want to only compute
xbeta + mode once. Every subexpression adds memory allocation and extra time to the reverse-mode derivative calculations.
I’m also not sure if you are going to run into places where the function you’re definining with conditionals isn’t continuously differentiable—this can make it very hard for HMC to move across the conditional boundary. It would most likely manifest as divergences when you run or underflow/overflow warnings.
It’s rarely efficient to explicitly create a diagonal matrix. I started looking at
common_functions.stan, but there’s just too much in there that can be made faster for me to do a thorough review.
I don’t know why there’s a negative in your (log) Jacobian calculation—it’s just
sum(theta) if you want to recover what’s going on under the hood because the transform is just
exp(theta) to go from unconstrained to constrained.
This kind of thing can be bad news:
approx_var_y was calculated on a linear scale and then logged—it’s usually much more stable to compute
approx_var_y on the log scale from the get go as
log_approx_var_y = -(xbeta + mode);
rather than using
approx_var_y = 1.0 ./ exp(xbeta + mode[1:n]);
and then taking a log. Also, you don’t need all those
[1:n] if the structure is
n elements long—that just does an unneeded copy (I probably should have checked for this degenerate case in the implementation as it’s only an integer comparison and not left it as a user optimization).
No idea how much efficiency improvement you could get by coding the common functions more efficiently.