Efficiently sample a collection of multi-normal variables with varying sigma (covariance) matrix

I’m new to Stan, so hoping you can point me in the right direction. I’ll build up to my situation to make sure we’re on the same page…

If I had a collection of univariate normals, the docs tell me that:

y ~ normal(mu_vec, sigma);

provides the same model as the unvectorized version:

for (n in 1:N)
    y[n] ~ normal(mu_vec[n], sigma);

but that the vectorized version is (much?) faster. Ok, fine, makes good sense.

So the first question is: is it possible to take advantage of this vectorization speedup in the univariate normal case where both the mu and the sigma of the samples vary by position in the vector? I.e., if both mu_vec and sigma_vec are vectors (in the previous case sigma was a scalar), then is this:

y ~ normal(mu_vec, sigma_vec);

equivalent to this:

for (n in 1:N)
    y[n] ~ normal(mu_vec[n], sigma_vec[n]);

and if so is there a comparable speedup?

Ok. That’s the warmup. The real question is how to best approach the multi-variate equivalent of the above.

In my particular case, I have N observations of bivariate data for some variable y, which I store in an N x 2 matrix. (For order of magnitude, N is about 1000 in my use case.)

My belief is that the mean of each component of each observation is 0 and that the stdev of each component of each observation is 1 (and I’m happy to hard-code them as such). However, my belief is that the correlation (rho) varies from observation to observation as a (simple) function of another observed variable, x (stored in an N-element vector). For example, we might say that rho[n] = inverse_logit(beta * x[n]) for n in 1:N, and our goal is to learn about beta from our data. I.e., the covariance matrix for the nth observation would be:

[1,      rho[n]]
[rho[n], 1     ]

My question is: what’s the best way to put this together in a Stan model so that it isn’t slow as heck? Is there a vectorized version of the multi_normal distribution so that I could specify this as:

y ~ multi_normal(vector_of_mu_2_tuples, vector_of_sigma_matrices)

or perhaps as some other similar formulation? Or will I need to write:

for (n in 1:N)
    y[n] ~ multi_normal(vector_of_mu_2_tuples[n], vector_of_sigma_matrices[n]);

after having set up vector_of_sigma_matrices and vector_of_mu_2_tuples in an earlier block?
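
For concreteness, here’s my rough sketch of what I think that loop version would look like (untested, with the zero means and unit variances hard-coded as described above):

data {
  int<lower=1> N;
  vector[N] x;          // observed predictor driving the correlation
  vector[2] y[N];       // the N bivariate observations
}
parameters {
  real beta;
}
transformed parameters {
  vector[N] rho = inv_logit(beta * x);   // per-observation correlation
}
model {
  vector[2] mu = rep_vector(0, 2);       // hard-coded zero means
  for (n in 1:N) {
    matrix[2, 2] Sigma;
    Sigma[1, 1] = 1;      Sigma[1, 2] = rho[n];
    Sigma[2, 1] = rho[n]; Sigma[2, 2] = 1;
    y[n] ~ multi_normal(mu, Sigma);
  }
}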

Thanks in advance for any guidance!

The answer to your first question about univariate Gaussian variables is a definite yes.

In general, as far as I am aware, it is best to avoid multi_normal if you can. The way to do this is to write:

y = L * x;

...

x ~ normal(0,1); 

where L is either the Cholesky factor or a matrix square root of the covariance matrix.

If you were dead set on vectorization, you could stack your y's in a 2N \times 1 vector, in which case the covariance matrix would be block diagonal with 2 \times 2 blocks. Both the matrix square root and the Cholesky factor of this block-diagonal matrix are simply the corresponding factor/root of each individual block. Since the blocks are so small, you can likely calculate both the root and the factor analytically.
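
For the 2 \times 2 blocks in question (unit variances, correlation rho[n]), the Cholesky factor can be written down in closed form:

[1,      0                 ]
[rho[n], sqrt(1 - rho[n]^2)]

since multiplying this by its transpose recovers the original block.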

The problem with this is that you need to instantiate a very big sparse matrix, which is pretty bad. So unless the developers have some nifty function that lets you operate on the block-diagonal matrix efficiently, a loop is likely the way to go. I would still do the reparameterization, though.

There’s not much use in vectorizing over vectors of covariance matrices, since essentially all of the work goes into solving (factoring) those matrices. The remaining overhead is fairly minimal by comparison.

What would be most efficient is to write out the bivariate normal likelihood directly in terms of rho[n], since it only involves a 2x2 determinant. That would probably be faster, but then I’m not 100% sure of the details of our multivariate normal implementation.
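
For reference, writing that likelihood out by hand would look roughly like the following model block (a sketch, assuming y is stored as an array of 2-vectors and rho is built from beta and x as in the original post; I haven't benchmarked it against the built-ins):

model {
  vector[N] rho = inv_logit(beta * x);
  for (n in 1:N) {
    // log density of a standard bivariate normal with correlation rho[n]
    real q = square(y[n][1]) - 2 * rho[n] * y[n][1] * y[n][2] + square(y[n][2]);
    target += -log(2 * pi()) - 0.5 * log1m(square(rho[n]))
              - q / (2 * (1 - square(rho[n])));
  }
}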

What we like to use in cases like this is a Cholesky factor of a correlation matrix. That leads to somewhat faster multi-normal code, and it’s definitely the way to go in more than two dimensions.
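
In this two-dimensional case you can combine that with the closed-form Cholesky factor above and use multi_normal_cholesky, e.g. (again just a sketch, with the means hard-coded to zero and the data/parameters declared as in the sketch in the original post):

model {
  vector[N] rho = inv_logit(beta * x);
  vector[2] mu = rep_vector(0, 2);
  for (n in 1:N) {
    // closed-form Cholesky factor of the n-th 2x2 correlation matrix
    matrix[2, 2] L;
    L[1, 1] = 1;      L[1, 2] = 0;
    L[2, 1] = rho[n]; L[2, 2] = sqrt(1 - square(rho[n]));
    y[n] ~ multi_normal_cholesky(mu, L);
  }
}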
