Weighted correlation matrices and cholesky_factor_corr

spinkney · May 3, 2020, 3:38pm

Way back, @bgoodri was referenced in @James_Savage’s post about hierarchical VARs at https://khakieconomics.github.io/2016/11/27/Hierarchical-priors-for-panel-vector-autoregressions.html, concerning estimation of a weighted correlation matrix formed from global and local correlation matrices,

\Omega = \rho \Omega_{\text{Global}} + (1 - \rho) \Omega_{\text{Local}}

with

\begin{align} &\Omega_{\text{Global}} \sim \text{LKJ}(1) \\ &\Omega_{\text{Local}} \sim \text{LKJ}(10) \\ & \rho \sim \text{Beta}(2,2). \end{align}

My question concerns the Cholesky factor formulation. Instead of \Omega I have its Cholesky factor (i.e., I have L from \Omega = \text{LL}'). Weighting the global and local Cholesky factors by \rho is obviously (or maybe not so obviously) not equivalent to weighting the correlation matrices themselves. So I’m reconstructing the full correlation matrices, doing the weighting, and then decomposing again:

Omega[i] = cholesky_decompose(rho * multiply_lower_tri_self_transpose(Omega_global) + (1 - rho) * multiply_lower_tri_self_transpose(Omega_local[i]))

Is it possible to update the known cholesky factors and avoid the decomposition?

Some more info on the error you get when you just weight the Cholesky factors directly:

Chain 1 Exception: hier_model_namespace::log_prob: Omega[sym1__] is not a valid unit vector. The sum of the squares of the elements should be 1, but is 0.999903
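
A short derivation of why this check fails, as a sketch. The notation g_k and \ell_k (the k-th rows of the global and local Cholesky factors, written as column vectors) is introduced here for illustration and does not appear elsewhere in the thread. Each row of a Cholesky factor of a correlation matrix is a unit vector, because the diagonal of LL' is 1. A convex combination of two unit vectors is generally shorter than 1:

```latex
\begin{align}
\| \rho\, g_k + (1 - \rho)\, \ell_k \|^2
  &= \rho^2 + (1 - \rho)^2 + 2 \rho (1 - \rho)\, g_k^\top \ell_k \\
  &= 1 - 2 \rho (1 - \rho) \left( 1 - g_k^\top \ell_k \right) \le 1
\end{align}
```

Equality holds only when \rho \in \{0, 1\} or g_k = \ell_k, so the weighted factor fails the unit-vector check on any row where the two factors differ. That is consistent with the sum of squares of 0.999903 reported in the message.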

And here is the Stan model with Cholesky factors:

data {
  int N; // number of observations (number of rows in the panel)
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}
transformed data {
  matrix[N - 1, K] Y_lag = Y[1:(N - 1)]; // row n is the lag of Y[n + 1]
}
parameters {
  // individual-level parameters
  cholesky_factor_corr[K] L_Omega_local[I]; // Cholesky factor of the correlation matrix of 
  // residuals for each individual (across K outcomes)
  vector<lower = 0>[K] tau[I]; // scale vector for residuals
  matrix[K, K] z_beta[I];
  vector[K] z_alpha[I];
  
  // hierarchical priors (individual parameters distributed around these)
  real<lower = 0, upper = 1> rho;
  cholesky_factor_corr[K] L_Omega_global;
  vector[K] tau_location;
  vector<lower =0>[K] tau_scale;
  matrix[K,K] beta_hat_location;
  matrix<lower = 0>[K,K] beta_hat_scale;
  vector[K] alpha_hat_location;
  vector<lower = 0>[K] alpha_hat_scale;
}
transformed parameters {
  matrix[K, K] beta[I]; // VAR(1) coefficient matrix
  vector[K] alpha[I]; // intercept vector
  cholesky_factor_corr[K] L_Omega[I];
  
  for(i in 1:I) {
    alpha[i] = alpha_hat_location + alpha_hat_scale .*z_alpha[i];
    beta[i] = beta_hat_location + beta_hat_scale .*z_beta[i];
    L_Omega[i] = cholesky_decompose(rho * multiply_lower_tri_self_transpose(L_Omega_global) + (1 - rho) * multiply_lower_tri_self_transpose(L_Omega_local[i]));
  }  
}
model {
  // hyperpriors
  rho ~ beta(2, 2);
  tau_location ~ cauchy(0, 1);
  tau_scale ~ cauchy(0, 1);
  alpha_hat_location ~ normal(0, 1);
  alpha_hat_scale ~ cauchy(0, 1);
  to_vector(beta_hat_location) ~ normal(0, .5);
  to_vector(beta_hat_scale) ~ cauchy(0, .5);
  L_Omega_global ~ lkj_corr_cholesky(1);

  
  // hierarchical priors
  for(i in 1:I) {
    // non-centered parameterization
    z_alpha[i] ~ normal(0, 1);
    to_vector(z_beta[i]) ~ normal(0, 1);
    tau[i] ~ normal(tau_location, tau_scale);
    L_Omega_local[i] ~ lkj_corr_cholesky(10);
  }
  
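  // likelihood: VAR(1) within each individual; rows are stacked by individual
  // and ordered in time, so row n - 1 is the lag of row n whenever time[n] > 1
  for (n in 2:N) {
    if (time[n] > 1) {
      Y[n]' ~ multi_normal_cholesky(alpha[individual[n]] + beta[individual[n]] * Y[n - 1]',
                    diag_pre_multiply(tau[individual[n]], L_Omega[individual[n]]));
    }
  }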
}

bgoodri · May 3, 2020, 3:57pm

The weighted sum of Cholesky factors of correlation matrices is not a Cholesky factor of a weighted sum of correlation matrices. But another thing to try would be to make the shape parameter in the LKJ prior for \boldsymbol{\Omega}_\mbox{Local} be something like inv(1 - rho).
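
A minimal sketch of what that suggestion might look like in Stan. This is one reading of it, not code posted in the thread: it assumes the local factors carry the prior directly, with no convex combination, and it encodes only the priors (no likelihood). It uses the array syntax from the model above; newer Stan versions write `array[I] cholesky_factor_corr[K] L_Omega_local;` instead.

```stan
data {
  int<lower = 1> K; // number of dimensions of the outcome variable
  int<lower = 1> I; // number of individuals
}
parameters {
  real<lower = 0, upper = 1> rho;
  cholesky_factor_corr[K] L_Omega_local[I];
}
model {
  rho ~ beta(2, 2);
  for (i in 1:I) {
    // assumption: the LKJ shape is tied to rho instead of being fixed at 10;
    // inv(1 - rho) is at least 1 and grows without bound as rho approaches 1
    L_Omega_local[i] ~ lkj_corr_cholesky(inv(1 - rho));
  }
}
```

Because an LKJ prior with shape above 1 concentrates around the identity matrix, a larger rho here pulls every local correlation matrix toward the identity rather than toward a shared, estimated matrix.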

spinkney · May 3, 2020, 4:16pm

I love it, simple and avoids all the issues. Will try and report back. Thanks!

spinkney · May 4, 2020, 10:28am

Quick question, if I do as you suggest, I believe that means I can get rid of \Omega_{\text{Global}} and the convex combination and just use \Omega_{\text{Local}}? Or did I misinterpret?

Joran · May 16, 2020, 9:51am

Hi @spinkney!

I’m working on turning my VAR(1) model into a model with random error (co)variances as well, and I’m using the same setup with Omega_Global and Omega_Local. Did you already figure out the new approach with inv(1-rho) as the shape parameter in lkj_corr_cholesky?

Because I’m not sure I follow your argument that you don’t need Omega_Global then. If you just use Omega_Local with the approach @bgoodri mentions above, higher rho values still mean more similarity, but only because all individual correlation matrices will be closer to the identity matrix, right? If you want them to “group” around something other than the identity matrix, you would still need Omega_Global, or am I missing something?

Best,
Joran

spinkney · May 17, 2020, 10:44am

I really wanted to avoid the I Cholesky decompositions and matrix multiplications needed to construct L_Omega, but I don’t see a way around them. I guess the suggestion to use inv(1-rho) instead of the hard-coded hyperparameter for \Omega_{\text{Local}} could be more precise and possibly faster.

Joran · May 17, 2020, 6:04pm

Ahh, ok! But then you don’t get an estimate for the overall (group-level) covariance matrix, right?

Skipton_Woolley · December 23, 2020, 5:35am

@spinkney @Joran did either of you manage to implement this version with hard coded hyperparameter for \Omega_{local}?

Joran · November 30, 2021, 3:31pm

Hi @Skipton_Woolley !

Apologies for the very late reply! I haven’t logged in in a long time. I got it working, but building hyper-priors on individual elements becomes difficult if you want to consider more than 3 variables at a time in a VAR. If you have more, you had best put LKJ priors on your global and local covariance matrices.

Is this still relevant for you? If so, I’d love to share my code :).

Best,
Joran

Skipton_Woolley · December 9, 2021, 3:06am

Hi @Joran,

Yes, I’m still playing with these models (although they have been on the back burner for a while).

I’d be keen to see your code if you are willing to share.

Cheers,

Skip.

Joran · December 11, 2021, 2:32pm

Sure! No problem! 😊.

I’ll clean them up a bit and annotate some more, and will post them here within the next two weeks or so (I have a course all week, next week 😉).

Best!
Joran

Skipton_Woolley · February 14, 2022, 11:19pm

No rush on this, but I thought I’d chase it up.
