Block-matrix as covariance matrix for multivariate normal

Hi Stanimals,

Suppose I have the following M\times M covariance matrix \mathbf{K} where M=m N
and

\mathbf{K}= \begin{pmatrix} \mathbf{A} & \mathbf{B} & \mathbf{B} & \cdots & \mathbf{B} \\ \mathbf{B} & \mathbf{A} & \mathbf{B} & \cdots & \mathbf{B} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \mathbf{B} & \mathbf{B} & \cdots & \mathbf{A}& \mathbf{B} \\ \mathbf{B} & \mathbf{B} & \cdots & \mathbf{B} & \mathbf{A} \\ \end{pmatrix}

Here \mathbf{A} and \mathbf{B} are both N\times N covariance matrices, so \mathbf{K} has \mathbf{A} exactly m times on its block diagonal and \mathbf{B} in every off-diagonal block position.

Now my questions:

a) If I calculate \mathbf{K} in the model block, and both \mathbf{A} and \mathbf{B} depend on variables from the (transformed) parameters block (\mathbf{A}, \mathbf{B}, and \mathbf{K} are local variables in the model block), what is the best/most efficient way to do this? The goal is a multivariate normal regression (Gaussian process) conditioned on the observations in a vector \mathbf{y}. Currently I have something like this:

model {
   matrix[N, N] A;
   matrix[N, N] B;
   // calculate A and B
   matrix[M, M] K;
   matrix[M, M] L;
   // fill K element by element: A on the diagonal blocks, B everywhere else
   for (r in 1:m) {
      for (c in 1:m) {
         for (r_ in 1:N) {
            for (c_ in 1:N) {
               if (r == c)
                  K[r_ + (r - 1) * N, c_ + (c - 1) * N] = A[r_, c_];
               else
                  K[r_ + (r - 1) * N, c_ + (c - 1) * N] = B[r_, c_];
            }
         }
      }
   }
   // small jitter on the diagonal for numerical stability
   K += diag_matrix(rep_vector(1e-10, M));
   L = cholesky_decompose(K);
   y ~ multi_normal_cholesky(rep_vector(0, M), L);
}
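Would it already help to assign whole N\times N blocks at once with Stan’s multi-index assignment instead of looping over entries? Something like this (just a sketch of the same construction, untested, with the same N, m, M = mN, and \mathbf{y} as above):

model {
   matrix[N, N] A;
   matrix[N, N] B;
   // calculate A and B
   matrix[M, M] K;
   matrix[M, M] L;
   for (r in 1:m) {
      for (c in 1:m) {
         // copy a whole N x N block in one assignment
         if (r == c)
            K[((r - 1) * N + 1):(r * N), ((c - 1) * N + 1):(c * N)] = A;
         else
            K[((r - 1) * N + 1):(r * N), ((c - 1) * N + 1):(c * N)] = B;
      }
   }
   K += diag_matrix(rep_vector(1e-10, M));
   L = cholesky_decompose(K);
   y ~ multi_normal_cholesky(rep_vector(0, M), L);
}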

b) Is there an analytical shortcut to get the Cholesky factor \mathbf{L} of \mathbf{K} when it has this special block structure in terms of \mathbf{A} and \mathbf{B}?

Thank you!

Wikipedia has eqs for a block Cholesky here: https://en.wikipedia.org/wiki/Cholesky_decomposition#Block_variant

They give the equations (where their A is the full block-partitioned matrix being factored):

D_j = A_{jj} - \sum_{k = 1}^{j - 1} L_{jk} D_k L_{jk}^T \\ L_{ij} = (A_{ij} - \sum_{k = 1}^{j - 1} L_{ik} D_k L_{jk}^T) D_j^{-1}

That doesn’t get you anywhere by itself, though, because you still have to do a lot of linear algebra to work through this block form.

If you do the Cholesky of a scalar matrix with the form you gave, something like:

\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1.41 & 0 & 0 \\ 0.71 & 1.22 & 0 \\ 0.71 & 0.41 & 1.15 \\ \end{bmatrix} \begin{bmatrix} 1.41 & 0 & 0 \\ 0.71 & 1.22 & 0 \\ 0.71 & 0.41 & 1.15 \\ \end{bmatrix}^T

There’s a pattern, right? The off-diagonals seem to only depend on the column index.
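For instance, in that 3\times 3 example l_{21} = l_{31} = 1/\sqrt{2} \approx 0.71, because both entries are the same off-diagonal value (1) divided by the same pivot (\sqrt{2}).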

Maybe we can get somewhere with that. If we substitute your block structure matrices in the eqs. for D_j and L_{ij} we get:

D_j = A - \sum_{k = 1}^{j - 1} L_{jk} D_k L_{jk}^T \\ L_{ij} = (B - \sum_{k = 1}^{j - 1} L_{ik} D_k L_{jk}^T) D_j^{-1}

So for the first D_j and the first block column of L_{ij} we get

D_1 = A\\ L_{i1} = B D^{-1}_1

There’s no row dependence in L_{i1}. That’s because all your off-diagonal B blocks are the same. Anyway, if you look at the term:

B - \sum_{k = 1}^{j - 1} L_{ik} D_k L_{jk}^T

in the L_{ij} calculations, you see that the blocks in column j only depend on values of L from column j - 1 and earlier. You know that for the first column there is no row dependence in the non-zero blocks of L_{ij}. Because of this (and the way the sum above works), there’s also no row dependence in the non-zero blocks of L_{ij} in the second column, and so on down the columns.

We can explicitly get rid of the row dependence on L_{ij} to make it clearer:

D_j = A - \sum_{k = 1}^{j - 1} L_k D_k L_k^T \\ L_j = (B - \sum_{k = 1}^{j - 1} L_k D_k L_k^T) D_j^{-1}

So that means you only need to do one solve per block column. That should make life quite a bit easier.
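For example, plugging the first column back in, the second block column works out to (assuming I’m not messing up the algebra):

D_2 = A - L_1 D_1 L_1^T = A - B A^{-1} B^T \\ L_2 = (B - B A^{-1} B^T) D_2^{-1}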

In the end that algorithm gives you:

\begin{bmatrix} I & 0 & 0 \\ L_1 & I & 0 \\ L_1 & L_2 & I \end{bmatrix} \begin{bmatrix} D_1 & 0 & 0 \\ 0 & D_2 & 0 \\ 0 & 0 & D_3 \end{bmatrix} \begin{bmatrix} I & 0 & 0 \\ L_1 & I & 0 \\ L_1 & L_2 & I \end{bmatrix}^T

As part of doing those solves with D_j you’ll probably have already computed their Cholesky factors, and you can reuse them to get the factorization of the whole thing. Call the factors Q_j (so D_j = Q_j Q_j^T).

\begin{bmatrix} I & 0 & 0 \\ L_1 & I & 0 \\ L_1 & L_2 & I \end{bmatrix} \begin{bmatrix} Q_1 & 0 & 0 \\ 0 & Q_2 & 0 \\ 0 & 0 & Q_3 \end{bmatrix} \begin{bmatrix} Q_1 & 0 & 0 \\ 0 & Q_2 & 0 \\ 0 & 0 & Q_3 \end{bmatrix}^T \begin{bmatrix} I & 0 & 0 \\ L_1 & I & 0 \\ L_1 & L_2 & I \end{bmatrix}^T

So the product of the two left matrices is the Cholesky factor of your overall big matrix, assuming the whole thing is positive definite (or at least it’s the thing you’d use for a non-centered parameterization). Hope that’s right at least, haha.
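Written out for the 3-block case, that product (the factor itself) is:

\begin{bmatrix} Q_1 & 0 & 0 \\ L_1 Q_1 & Q_2 & 0 \\ L_1 Q_1 & L_2 Q_2 & Q_3 \end{bmatrix}

And here’s a rough Stan sketch of the whole recursion, just to make the indexing concrete. It’s untested, and the function name block_cholesky is made up; it assumes the same setup as above (A on the diagonal blocks, B everywhere else, each block N\times N, m block rows and columns):

functions {
  // Cholesky factor of the m x m block matrix with A on the diagonal
  // blocks and B in every off-diagonal block (each block N x N).
  matrix block_cholesky(matrix A, matrix B, int m) {
    int N = rows(A);
    int M = m * N;
    matrix[M, M] L_K = rep_matrix(0, M, M);
    matrix[N, N] S = rep_matrix(0, N, N);  // running sum of L_k D_k L_k'
    for (j in 1:m) {
      matrix[N, N] D_j = A - S;
      matrix[N, N] Q_j = cholesky_decompose(D_j);
      // L_j = (B - S) * inverse(D_j), done as two triangular solves with Q_j
      matrix[N, N] L_j = mdivide_right_tri_low(mdivide_left_tri_low(Q_j, (B - S)')', Q_j);
      // block column j of the factor: Q_j on the diagonal, L_j * Q_j below it
      L_K[((j - 1) * N + 1):(j * N), ((j - 1) * N + 1):(j * N)] = Q_j;
      for (i in (j + 1):m)
        L_K[((i - 1) * N + 1):(i * N), ((j - 1) * N + 1):(j * N)] = L_j * Q_j;
      S += L_j * D_j * L_j';  // contribution of column j to all later columns
    }
    return L_K;
  }
}

With that you could write y ~ multi_normal_cholesky(rep_vector(0, M), block_cholesky(A, B, m)); in the model block and skip building and factoring the full K, if the algebra above holds.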

Edit: I didn’t double-check that this works. Too lazy. Also, I don’t know if this is the best solution. So caveat emptor, I guess, just to be clear.


See e.g. matrix inversion lemma and Kronecker product.

Edit: the matrix Cholesky lemma is similar to the inversion lemma.