Hyperprior on block covariances

I have a latent factor model with N latent factors in every block. To keep this simple, let’s assume there are 2 blocks, each with T time points (N < T), and J groups. The big matrix is cholesky_factor_cov[2 * T, 2 * N] F. The sub-matrices are \mathbf{A, B, C}, each of size and type cholesky_factor_cov[T, N], i.e.,

\begin{align} \mathbf{F}_L &= \begin{bmatrix} \mathbf{L_A} & \mathbf{M} \\ \mathbf{L_B} & \mathbf{L_C} \end{bmatrix} \,\,\,\,\, \scriptstyle{ 2\text{T} \times 2\text{N}} \\ \underset{\mathbf{i} \in \{A,B,C\}}{\mathbf{L_i}} &= \begin{bmatrix} \mathbf{x_i}^{N \times N} \\ \mathbf{x_i}^{(T-N) \times N} \\ \end{bmatrix} \,\,\,\,\, \scriptstyle{ \text{T} \times \text{N}} \\ \mathbf{x_i}^{N \times N} &= \begin{bmatrix} 1 & 0 & 0 & \cdots & 0_{N} \\ x_{2,1} & 1 & 0 & \cdots & 0_N \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ x_{N-1,1} & \cdots & x_{N-1,N-2} & 1 & 0_N \\ x_{N,1} & x_{N,2} & \cdots & x_{N, N-1} & 1 \\ \end{bmatrix} \\ \mathbf{x_i}^{(T-N) \times N} &= \begin{bmatrix} x_{N + 1,1} & x_{N + 1,2} & \cdots & x_{N + 1, N} \\ \vdots & \vdots & \vdots & \vdots \\ x_{T, 1} & \cdots & \cdots & x_{T, N} \\ \end{bmatrix} \\ \mathbf{M} &= \begin{bmatrix} \mathbf{0}^{N \times N} \\ \mathbf{1^D}^{N \times N} \\ \mathbf{0}^{(T - 2N) \times N} \end{bmatrix} \,\,\,\,\, \scriptstyle{ \text{T} \times \text{N}} \\ \end{align}

where \mathbf{1^D} is an N \times N diagonal matrix with 1s on the diagonal. \mathbf{M} is basically just a pad matrix: it keeps 0s across the first N rows, followed by a diagonal of 1s, because the N latent factors for group 1 shouldn’t be correlated with the N latent factors of group 2. The rest of \mathbf{M} is composed of 0s until the group 2 block \mathbf{L_C} starts.
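
To make the layout concrete, here is a small worked instance with N = 1 and T = 3. These values are chosen only for illustration, and the sketch assumes the lower part of each \mathbf{L_i} covers rows N + 1 through T. Writing a, b, c for the free entries of \mathbf{L_A}, \mathbf{L_B}, \mathbf{L_C}:

\begin{align} \mathbf{L_A} = \begin{bmatrix} 1 \\ a_{2,1} \\ a_{3,1} \end{bmatrix}, \quad \mathbf{M} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{F}_L = \begin{bmatrix} 1 & 0 \\ a_{2,1} & 1 \\ a_{3,1} & 0 \\ 1 & 1 \\ b_{2,1} & c_{2,1} \\ b_{3,1} & c_{3,1} \end{bmatrix} \end{align}

In this instance the 1 contributed by \mathbf{M} lands on the (2,2) diagonal entry of \mathbf{F}_L and the only entry above the diagonal is 0, so the 6 \times 2 matrix has a unit diagonal and a zero upper triangle.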

As you can see, I’ve constrained the diagonals of the submatrices to be 1. In the code I’ve put priors on the lower triangle of each matrix, each with a separate normal(mu, sigma), where mu and sigma each have dimension 3. I construct the final cholesky_factor_cov for multi_normal_cholesky by

cholesky_factor_cov[T_two] F_Sigma = cholesky_decompose(add_diag(multiply_lower_tri_self_transpose(F), sigma_y));
Y ~ multi_normal_cholesky(Mu, F_Sigma);
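
For readers who want to see the pieces together, the setup described so far could be sketched roughly as below. This is an untested sketch under several assumptions that are not in the post: the helper functions make_block and make_pad are hypothetical names, the free loadings are stored column by column in a vector, Mu is fixed at zero, T >= 2N so the pad matrix fits, and the hyperpriors on mu, sigma and sigma_y are placeholders. It uses tcrossprod(F) to form F F' rather than the call in the snippet above.

```stan
functions {
  // Hypothetical helper: T x N block with unit diagonal, zeros above it,
  // and free loadings (column by column from x) below the diagonal.
  matrix make_block(vector x, int T, int N) {
    matrix[T, N] L = rep_matrix(0, T, N);
    int pos = 1;
    for (n in 1:N) {
      L[n, n] = 1;
      for (t in (n + 1):T) {
        L[t, n] = x[pos];
        pos += 1;
      }
    }
    return L;
  }
  // Hypothetical helper: pad matrix M (N rows of 0s, an N x N identity, then 0s).
  matrix make_pad(int T, int N) {
    matrix[T, N] M = rep_matrix(0, T, N);
    for (n in 1:N) M[N + n, n] = 1;
    return M;
  }
}
data {
  int<lower=1> N;               // latent factors per block
  int<lower=2 * N> T;           // time points per block (assumed T >= 2N)
  int<lower=1> J;               // groups
  array[J] vector[2 * T] Y;
}
transformed data {
  int K = N * T - (N * (N + 1)) %/% 2;      // free loadings per block
  vector[2 * T] Mu = rep_vector(0, 2 * T);  // assumption: zero mean
}
parameters {
  array[3] vector[K] x;         // free loadings of A, B, C
  vector[3] mu;
  vector<lower=0>[3] sigma;
  real<lower=0> sigma_y;
}
transformed parameters {
  matrix[2 * T, 2 * N] F = append_row(
    append_col(make_block(x[1], T, N), make_pad(T, N)),
    append_col(make_block(x[2], T, N), make_block(x[3], T, N)));
}
model {
  matrix[2 * T, 2 * T] F_Sigma
    = cholesky_decompose(add_diag(tcrossprod(F), sigma_y));
  mu ~ std_normal();            // placeholder hyperpriors (assumption)
  sigma ~ std_normal();
  sigma_y ~ std_normal();
  for (k in 1:3) x[k] ~ normal(mu[k], sigma[k]);  // separate normal per block
  Y ~ multi_normal_cholesky(Mu, F_Sigma);
}
```

Since F has only 2N columns and N < T, F F' has rank at most 2N and is singular on its own; adding a positive sigma_y to the diagonal is what makes the matrix positive definite so that cholesky_decompose can succeed.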

I’m trying to come up with a way to hierarchically share information across the blocks \mathbf{A, B, C}. Right now they each have their own normal. I was thinking of something like the multivariate priors for hierarchical models in the manual where, in this case, there would be a 2-vector of means and a 2 x 2 covariance matrix. But I’m struggling with how best to use this to share information with the covariance block \mathbf{B}. Maybe I’m missing something simple, but any suggestions are welcome.
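
As a point of reference, the manual's multivariate hierarchical prior pattern could be adapted along the following lines. This is only one possible reading and is untested: it treats the free loadings at the same position in A, B and C as a 3-vector with a shared mean vector, per-block scales and an LKJ correlation, which differs from the 2-vector version described above. The names K, z and tau and all hyperpriors are assumptions for illustration.

```stan
data {
  int<lower=1> K;                    // free loadings per block (same for A, B, C)
}
parameters {
  vector[3] mu;                      // mean loading for A, B, C
  vector<lower=0>[3] tau;            // scale for A, B, C
  cholesky_factor_corr[3] L_Omega;   // correlation among the three blocks
  matrix[3, K] z;                    // standardized loadings
}
transformed parameters {
  // Column k holds (x_A[k], x_B[k], x_C[k]); non-centered parameterization.
  matrix[3, K] x = rep_matrix(mu, K) + diag_pre_multiply(tau, L_Omega) * z;
}
model {
  mu ~ std_normal();                 // placeholder hyperpriors (assumption)
  tau ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(z) ~ std_normal();
}
```

Under this sketch, information reaches block B through the off-diagonal entries of the 3 x 3 correlation: if the loadings in B tend to move with those in A or C, the posterior for L_Omega picks that up and shrinks B accordingly.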

Thanks!

Honestly, I tried to think about this for a while and have no good ideas. Maybe @jonah can help?

I haven’t tested this, but I think an LKJ prior whose dimension is the number of blocks could work. Using the case above with 2 blocks (groups), a 2 x 2 LKJ correlation matrix can be estimated across the blocks and the two weighted, i.e.,

\begin{align} \widehat{\Omega}_{\mathbf{L_B}} &= \theta\rho + (1 - \theta)\Omega_{\widetilde{\mathbf{L_B}}} \\ \Omega_{2 \times 2} &= \begin{bmatrix} 1 & \rho \\ \rho & 1 \\ \end{bmatrix} \end{align}

where \Omega_{\widetilde{\mathbf{L_B}}} is the strictly lower triangular part of \Omega_{\mathbf{L_B}} (off the diagonal) and the weighting parameter \theta \in [0,1]. A dummy translation matrix can make \rho into the required shape for the addition.
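
A rough, untested Stan sketch of this weighting is below. It assumes the off-diagonal entries of block B are stored as a vector of length K, uses rep_vector in place of the dummy translation matrix, and puts placeholder priors on theta and the unpooled entries; the names b_raw and b are hypothetical.

```stan
data {
  int<lower=1> K;                    // number of off-diagonal entries of B
}
parameters {
  cholesky_factor_corr[2] L_Omega;   // 2 x 2 correlation across the blocks
  real<lower=0, upper=1> theta;      // weight on the shared correlation
  vector[K] b_raw;                   // unpooled off-diagonal entries of B
}
transformed parameters {
  // For a 2 x 2 correlation matrix, Omega[2, 1] equals L_Omega[2, 1]
  // because L_Omega[1, 1] is 1.
  real rho = L_Omega[2, 1];
  // Weighted combination: theta * rho + (1 - theta) * unpooled entries.
  vector[K] b = rep_vector(theta * rho, K) + (1 - theta) * b_raw;
}
model {
  L_Omega ~ lkj_corr_cholesky(2);
  theta ~ beta(2, 2);                // placeholder prior (assumption)
  b_raw ~ std_normal();              // placeholder prior (assumption)
}
```

With theta near 1 every entry of b collapses toward the single shared rho, and with theta near 0 the entries stay at their unpooled values, so theta plays the role of a pooling weight.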

I took a quick look and I don’t have any brilliant ideas here unfortunately (and don’t currently have the time to do a deep dive, sorry!), but I do think what you’re suggesting with the LKJ could work. @anon75146577 and @bgoodri are both really good with tricky matrix stuff so maybe they have some ideas.