Multivariate normal pdf with a low-rank covariance matrix

In my understanding, the Stan function for computing the log density

\log \text{MultiNormal}(y \mid \mu, \Sigma),

i.e., multi_normal_lpdf, requires inversion of the n \times n matrix \Sigma, which is an O(n^3) operation. What about also having a function multi_normal_structured_lpdf or something that could be used if the user knows that

\Sigma = A + U C V,

where
  • A is diagonal n \times n
  • U is n \times m
  • C is m \times m
  • V is m \times n

with m < n? The complexity would then be O(m^2 n) (I think) if the Woodbury formula is used in the underlying implementation to invert \Sigma.
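A minimal numerical sketch of the idea (not Stan code): the Woodbury identity lets you invert \Sigma = A + U C V by solving only an m \times m system, which is where the O(m^2 n) cost comes from. Names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 5

# Sigma = A + U C V with A diagonal (n x n), U (n x m), C (m x m), V (m x n)
a = rng.uniform(1.0, 2.0, n)           # positive diagonal of A
U = rng.standard_normal((n, m))
C = np.diag(rng.uniform(0.5, 1.5, m))  # keep C positive definite
V = U.T                                # symmetric case, V = U'

Sigma = np.diag(a) + U @ C @ V

# Woodbury: Sigma^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
# Only an m x m system is solved, so the cost is O(m^2 n) instead of O(n^3).
Ainv_U = U / a[:, None]                # A^{-1} U, costs O(n m)
small = np.linalg.inv(C) + V @ Ainv_U  # m x m
Sigma_inv = np.diag(1.0 / a) - Ainv_U @ np.linalg.solve(small, Ainv_U.T)

print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))  # True
```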

Yes, multi_normal_lpdf requires an inversion (technically, we don’t invert; we use matrix division for more stability). If there are multiple y (e.g., an array of vectors), the solve only happens once.

multi_normal_precision_lpdf does not require an inversion if you can work with precision matrices instead of covariance. Also, multi_normal_cholesky_lpdf works with a Cholesky factor rather than the full matrix, which makes inversion easy. And it’s a much more stable parameterization if the covariance matrix is a parameter. But obviously it won’t work in something like a GP, where we first build the covariance matrix, then factor it.
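To see why a Cholesky factor makes this cheap, here is a hedged sketch (in Python, not the actual Stan implementation) of the multivariate-normal log density given a lower-triangular factor L with \Sigma = L L': a triangular solve costs O(n^2), and the log determinant is just the sum of the log diagonal.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.stats import multivariate_normal

def multi_normal_cholesky_lpdf(y, mu, L):
    # L is the lower-triangular Cholesky factor of Sigma (Sigma = L L').
    # The triangular solve is O(n^2); no O(n^3) factorization is needed here.
    z = solve_triangular(L, y - mu, lower=True)
    n = len(y)
    return (-0.5 * n * np.log(2 * np.pi)
            - np.sum(np.log(np.diag(L)))   # 0.5 * log|Sigma| = sum log L_ii
            - 0.5 * z @ z)

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)  # positive definite
mu = rng.standard_normal(n)
y = rng.standard_normal(n)
L = np.linalg.cholesky(Sigma)

print(np.isclose(multi_normal_cholesky_lpdf(y, mu, L),
                 multivariate_normal(mean=mu, cov=Sigma).logpdf(y)))  # True
```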

That shouldn’t be too challenging. To be efficient, it only requires working out the derivatives of inverse(A + U * C * V). As with other additions to Stan, it’s a matter of motivating someone to do it.

When does this structure of covariance matrix come up? Will that structure give us something full rank? As is, we require positive-definite covariance and can’t deal with only positive semi-definite covariance matrices.


Yes, in factor models with positive diagonal elements in \mathbf{A}.

I have been doing GPs so that L = cholesky_decompose(Ky) and then y ~ multi_normal_cholesky(mu, L), but I am not sure if it has much advantage over just y ~ multi_normal(mu, Ky).

Yeah, maybe I will give it a shot myself if I have time and no one else has done it by then.

It comes up with GPs if you do a low-rank approximation for the kernel matrix K \approx \tilde{K} = U \Lambda U^T and then you have to compute

\text{MultiNormal}(y \mid \mu, \Sigma)

where \Sigma = \tilde{K} + \sigma^2 I. So \tilde{K} is not full rank, but \Sigma is, so that is not a problem.
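Putting both tricks together for this GP case (a sketch under the assumptions above, with illustrative names): Woodbury handles the quadratic form and the matrix determinant lemma handles \log|\Sigma|, so the whole log density for \Sigma = U \Lambda U^T + \sigma^2 I needs only an m \times m solve.

```python
import numpy as np
from scipy.stats import multivariate_normal

def lowrank_mvn_lpdf(y, mu, U, lam, sigma2):
    # Sigma = U diag(lam) U' + sigma2 * I, with U an n x m matrix, m << n.
    # Woodbury for the quadratic form, matrix determinant lemma for log|Sigma|;
    # both need only an m x m solve, giving O(m^2 n) total cost.
    n, m = U.shape
    r = y - mu
    M = np.diag(1.0 / lam) + U.T @ U / sigma2  # m x m
    Ur = U.T @ r
    quad = r @ r / sigma2 - (Ur / sigma2) @ np.linalg.solve(M, Ur / sigma2)
    logdet = n * np.log(sigma2) + np.sum(np.log(lam)) + np.linalg.slogdet(M)[1]
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(2)
n, m = 300, 10
U = rng.standard_normal((n, m))
lam = rng.uniform(0.5, 2.0, m)   # eigenvalues Lambda of the low-rank part
sigma2 = 0.3
mu = np.zeros(n)
y = rng.standard_normal(n)
Sigma = U @ np.diag(lam) @ U.T + sigma2 * np.eye(n)

print(np.isclose(lowrank_mvn_lpdf(y, mu, U, lam, sigma2),
                 multivariate_normal(mean=mu, cov=Sigma).logpdf(y)))
```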


There’s no advantage if you’re only calling multi_normal_cholesky once. The multi-normal does the decomposition internally.
