Multi-normal pdf with low-rank covariance matrix

In my understanding, the Stan function for computing the log density

\log \text{MultiNormal}(y \mid \mu, \Sigma),

i.e., multi_normal_lpdf, requires inversion of the n \times n matrix \Sigma, which is an O(n^3) operation. What about also having a function, multi_normal_structured_lpdf or something similar, that could be used when the user knows that

\Sigma = A + U C V

where

  • A is diagonal n \times n
  • U is n \times m
  • C is m \times m
  • V is m \times n

with m < n? The complexity would then be O(m^2 n) (I think) if the Woodbury formula is used in the underlying implementation to invert \Sigma.
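
To make the idea concrete, here is a minimal user-space sketch for the symmetric case V = U^T, A = \operatorname{diag}(a). The function name multi_normal_low_rank_lpdf is hypothetical (nothing like it exists in Stan), and the sketch combines the Woodbury identity for the quadratic form with the matrix determinant lemma for the log determinant:

```stan
functions {
  // Hypothetical sketch (not a built-in Stan function):
  // log MultiNormal(y | mu, Sigma) with Sigma = diag(a) + U * C * U',
  // assuming a > 0 elementwise and C symmetric positive-definite.
  real multi_normal_low_rank_lpdf(vector y, vector mu, vector a,
                                  matrix U, matrix C) {
    int n = rows(y);
    int m = cols(U);
    vector[n] r = y - mu;
    vector[n] z = r ./ a;                               // A^{-1} r
    matrix[m, m] S = inverse_spd(C)
                     + U' * diag_pre_multiply(inv(a), U);  // C^{-1} + U' A^{-1} U
    matrix[m, m] L = cholesky_decompose(S);
    vector[m] w = mdivide_left_tri_low(L, U' * z);
    // Woodbury: r' Sigma^{-1} r = r' A^{-1} r - (U' A^{-1} r)' S^{-1} (U' A^{-1} r)
    real quad = dot_product(r, z) - dot_self(w);
    // determinant lemma: log|Sigma| = log|S| + log|C| + log|A|
    real log_det = 2 * sum(log(diagonal(L)))
                   + log_determinant(C) + sum(log(a));
    return -0.5 * (n * log(2 * pi()) + log_det + quad);
  }
}
```

In the model block this would be invoked as target += multi_normal_low_rank_lpdf(y | mu, a, U, C), costing O(m^2 n + m^3) per evaluation instead of O(n^3).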

Yes, multi_normal_lpdf requires an inversion (technically, we don’t invert; we use matrix division for more stability). If there are multiple y (e.g., an array of vectors), the solve only happens once.

multi_normal_precision_lpdf does not require an inversion if you can work with precision matrices instead of covariance matrices. Also, multi_normal_cholesky_lpdf works with a Cholesky factor rather than the full matrix, which makes the solve cheap. And it’s a much more stable parameterization if the covariance matrix is a parameter. But obviously it won’t help in something like a GP, where we first build the covariance matrix and then factor it.
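
As a minimal sketch of the parameter case (the variable names here are made up for illustration):

```stan
parameters {
  vector[N] mu;
  cholesky_factor_corr[N] L_Omega;   // Cholesky factor of a correlation matrix
  vector<lower=0>[N] tau;            // per-dimension scales
}
model {
  L_Omega ~ lkj_corr_cholesky(2);
  // works directly with the factor; no decomposition of a full matrix needed
  y ~ multi_normal_cholesky(mu, diag_pre_multiply(tau, L_Omega));
}
```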

That shouldn’t be too challenging. To be efficient, it only requires working out the derivatives of inverse(A + U * C * V). As with other additions to Stan, it’s a matter of motivating someone to do it.
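
For reference, the needed derivatives follow from the standard matrix-calculus identities (general facts, not Stan internals):

\frac{\partial \Sigma^{-1}}{\partial \theta} = -\Sigma^{-1} \frac{\partial \Sigma}{\partial \theta} \Sigma^{-1}, \qquad \frac{\partial \log\det\Sigma}{\partial \theta} = \operatorname{tr}\left( \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta} \right),

where \Sigma^{-1} would itself be expanded via Woodbury so the gradient computation also stays at O(m^2 n).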

When does this structure of covariance matrix come up? And will that structure give us something full rank? As is, we require a positive-definite covariance matrix and can’t handle one that is only positive semi-definite.


Yes, in factor models with positive diagonal elements in \mathbf{A}.

I have been doing GPs so that L = cholesky_decompose(Ky) and then y ~ multi_normal_cholesky(mu, L), but I am not sure if it has much advantage over just y ~ multi_normal(mu, Ky).
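
For concreteness, a minimal sketch of the two formulations being compared (model block only; Ky and mu assumed built earlier):

```stan
model {
  // Formulation 1: factor explicitly, then use the Cholesky version
  matrix[N, N] L = cholesky_decompose(Ky);
  y ~ multi_normal_cholesky(mu, L);

  // Formulation 2 (same log density): let multi_normal factor internally
  // y ~ multi_normal(mu, Ky);
}
```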

Yeah, maybe I will give it a shot myself if I have time and no one else has done it by then.

It comes up with GPs if you do a low-rank approximation for the kernel matrix K \approx \tilde{K} = U \Lambda U^T and then you have to compute

\text{MultiNormal}(y \mid \mu, \Sigma)

where \Sigma = \tilde{K} + \sigma^2 I. So \tilde{K} is not full rank, but \Sigma is, so positive definiteness is not a problem.
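
In the notation of the proposal this is A = \sigma^2 I, C = \Lambda, V = U^T. With the hypothetical multi_normal_low_rank_lpdf sketched earlier in the thread, the call would look like:

```stan
model {
  // Sigma = sigma^2 * I + U * diag(lambda) * U'
  target += multi_normal_low_rank_lpdf(y | mu,
                                       rep_vector(square(sigma), N),  // A as a diagonal
                                       U,                             // n x m
                                       diag_matrix(lambda));          // C = Lambda
}
```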


There’s no advantage if you’re only calling multi_normal_cholesky once; multi_normal does the decomposition internally anyway.
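
A case where the explicit factorization does pay off is a fixed covariance matrix reused across many evaluations; a sketch, assuming Ky is data:

```stan
transformed data {
  matrix[N, N] L = cholesky_decompose(Ky);  // factor once, at data load
}
model {
  y ~ multi_normal_cholesky(mu, L);         // O(n^2) solve per evaluation, not O(n^3)
}
```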
