On an observation date t in 1,...,T, I have \theta_t, a latent Jdimensional vector drawn from a multivariate normal:
\theta_t \sim N(\mu, \Sigma)
On date t, I make N_t observations of \theta_t, say y_t := (y_{t,i},...,y_{t,N_t}) with i.i.d. noise \tau^2:
y_{t,i} \sim \theta_{t, j(t, N_t)} + N(0, \tau^2)
for a design function j(t,i) that dictates which component of \theta_t is observed.
See model.stan
for an implementation of this. The reason I’m posting is that this program becomes slow as T and J get larger. I’m interested in T\approx 1000, J\approx 20, N_t \approx 40 which runs on the order of an hour for 500 samples. Not bad but I’d like it to be faster as I’ll be adding regression and autoregression components (turning it into a DLM a la gaussian_dlm_obs
but with timevarying design). And I need to run it thousands of times for various experiments.
Conversation with @rtrangucci led me to realize I am unnecessarily sampling tens of thousands of theta parameters and so I wanted to try speeding it up by integrating them out.
Mathematically it’s straightforward. I am writing the design function j(t, \cdot) as an J \times N_t matrix F_t so that we have
y_t \sim F_t\theta_t + N(0, \tau^2 I_{N_t})
Then I can get rid of the thetas:
y_t \sim N(F_t\mu_t, F_t \Sigma F_t' + \tau^2 I_{N_t})
Implementing it in Stan has been less easy. I’ve tried two approaches:
For loop approach
Because y_t has varying dimension N_t I am writing this in Stan by stacking the y_t and F_t in a big vector y and matrix F respectively of length N = \sum_t N_t and then doing:
...
int pos;
model {
for (i in 1:T) {
segment(y, pos, n[i]) ~ multi_normal(
block(F, 1, pos, J, n[i])' * mu,
quad_form(Sigma, block(F, 1, pos, J, n[i]) ) +
diag_matrix(rep_vector(tau, n[i])));
pos += n[i];
}
}
The full code is attached as model_no_theta_irreg.stan
. For my dimensions (see above), this turns out to be basically no faster than estimating all the thetas!
Big covariance matrix approach
I also tried modeling the big vector y with a huge covariance structure. I offset and padded with zeros each F_t matrix to be J \times N so that I could write
y \sim N(F\mu, \Sigma_{\text{full}})
with a big N\times N block diagonal covariance given by
\Sigma_{\text{full}} = N(0, \tau^2 I_N) + \sum_t \tilde F_t \Sigma {\tilde F_t}'
This code is attached in model_no_theta_irreg2.stan
(Note that the augmented F_t are called F2
in that code). The big covariance matrix approach was too slow to even run (gradient evaluation took 3 seconds).
Questions
Any ideas for speeding up either of these approaches? I hope I’m missing something easy. I was thinking of the following:

I know the for loop is bad, but I’m not sure how to avoid it. Should I be worried about the other bits in there like
segment
,block
anddiag_matrix
? They are potentially avoidable using more clever data structures. 
In practice, even though T\approx 1000 in my currently example, there are actually only say 50 different design matrices F_t so I could shrink the loop to 50 iterations, one for each unique
F_t
. It occurs to me as I write this that this is likely my best option. I’d still appreciate feedback on the efficiency of functionsblock
andsegment
andto_matrix
in sampling statement, which I’d probably still end up using here? 
Switch to sparse (CRS) matrices for \tilde F_t in the big covariance matrix approach?

… try RINLA?
model.stan (997 Bytes)
model_no_theta_irreg.stan (1.0 KB)
model_no_theta_irreg2.stan (983 Bytes)