Weighted correlation matrices and cholesky_factor_corr

Way back @bgoodri was referenced in @James_Savage post about hierarchical vars at https://khakieconomics.github.io/2016/11/27/Hierarchical-priors-for-panel-vector-autoregressions.html, concerning estimation of a weighted correlation matrix between global and local correlation matrices,

\Omega = \rho \Omega_{\text{Global}} + (1 - \rho) \Omega_{\text{Local}}

with

\begin{align} &\Omega_{\text{Global}} \sim \text{LKJ}(1) \\ &\Omega_{\text{Local}} \sim \text{LKJ}(10) \\ & \rho \sim \text{Beta}(2,2). \end{align}

My question concerns using the cholesky corr formulation. So instead of \Omega I have it’s cholesky corr (ie I have L from \Omega = \text{LL}'). Now it obviously (or maybe not so obviously) is not equivalent to just weight the corresponding cholesky global and local correlation matrices with \rho. So it means I’m reconstructing the full correlation matrices, doing the weighting and then decomping again,

Omega[i] = cholesky_decompose(rho * multiply_lower_tri_self_transpose(Omega_global) + (1 - rho) * multiply_lower_tri_self_transpose(Omega_local[i]))

Is it possible to update the known cholesky factors and avoid the decomposition?

Some more info on the error when you just weight the cholesky corr matrices:

Chain 1 Exception: hier_model_namespace::log_prob: Omega[sym1__] is not a valid unit vector. The sum of the squares of the elements should be 1, but is 0.999903 

and the stan model with cholesky corrs

data {
  int N; // number of observations (number of rows in the panel)
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}
transformed data{
 matrix[N - 1, K] Y_lag = Y[2:N];
}
parameters {
  // individual-level parameters
  cholesky_factor_corr[K] L_Omega_local[I]; // cholesky factor of correlation matrix of 
  // residuals each individual (across K outcomes)
  vector<lower = 0>[K] tau[I]; // scale vector for residuals
  matrix[K, K] z_beta[I];
  vector[K] z_alpha[I];
  
  // hierarchical priors (individual parameters distributed around these)
  real<lower = 0, upper = 1> rho;
  cholesky_factor_corr[K] L_Omega_global;
  vector[K] tau_location;
  vector<lower =0>[K] tau_scale;
  matrix[K,K] beta_hat_location;
  matrix<lower = 0>[K,K] beta_hat_scale;
  vector[K] alpha_hat_location;
  vector<lower = 0>[K] alpha_hat_scale;
}
transformed parameters {
  matrix[K, K] beta[I]; // VAR(1) coefficient matrix
  vector[K] alpha[I]; // intercept matrix
  cholesky_factor_corr[K] L_Omega[I];
  
  for(i in 1:I) {
    alpha[i] = alpha_hat_location + alpha_hat_scale .*z_alpha[i];
    beta[i] = beta_hat_location + beta_hat_scale .*z_beta[i];
    L_Omega[i] = cholesky_decompose(rho * multiply_lower_tri_self_transpose(L_Omega_global) + (1 - rho) * multiply_lower_tri_self_transpose(L_Omega_local[i]));
  }  
}
model {
  // hyperpriors
  rho ~ beta(2, 2);
  tau_location ~ cauchy(0, 1);
  tau_scale ~ cauchy(0, 1);
  alpha_hat_location ~ normal(0, 1);
  alpha_hat_scale ~ cauchy(0, 1);
  to_vector(beta_hat_location) ~ normal(0, .5);
  to_vector(beta_hat_scale) ~ cauchy(0, .5);
  L_Omega_global ~ lkj_corr_cholesky(1);

  
  // hierarchical priors
  for(i in 1:I) {
    // non-centered parameterization
    z_alpha[i] ~ normal(0, 1);
    to_vector(z_beta[i]) ~ normal(0, 1);
    tau[i] ~ normal(tau_location, tau_scale);
    L_Omega_local[i] ~ lkj_corr_cholesky(10);
  }
  
  // likelihood
      Y[2:N] ~ multi_normal_cholesky(alpha[individual] + beta[individual] * Y_lag', 
                    diag_pre_multiply(tau[individual], L_Omega[individual]));
}

The weighted sum of Cholesky factors of correlation matrices is not a Cholesky factor of a weighted sum of correlation matrices. But another thing to try would be to make the shape parameter in the LKJ prior for \boldsymbol{\Omega}_\mbox{Local} be something like inv(1 - rho).

1 Like

I love it, simple and avoids all the issues. Will try and report back. Thanks!

Quick question, if I do as you suggest, I believe that means I can get rid of \Omega_{\text{Global}} and the convex combination and just use \Omega_{\text{Local}}? Or did I misinterpret?

Hi @spinkney!

I’m working on turning my VAR(1) model into a model with random error (co)variances as well, and I’m using the same setup with Omega_Global and Omega_Local. Did you already figure out the new appraoch with inv(1-rho) as the scale parameter in lkj_corr_cholesky?

Because I’m not sure I’m following your argument that you don’t need Omega_Global then. If you just use Omega_Local, then higher rho values still mean more similarity with the approach @bgoodri mentions above, but only because all individual covariance matrices will be closer to the identity matrix then right? If you want them to “group” around something other than the identity matrix you would still need Omega_Global or am I missing something?

Best,
Joran

I really wanted to avoid the I cholesky decompositions and matrix multiplications to construct L_Omega but I don’t see a way to get around it. I guess the suggestion was to use inv(1-rho) instead of the hard coded hyper parameter for \Omega_{\text{Local}} could be more precise and possibly faster.

Ahh, ok! But you don’t get an estimate for the overall (group-level) covariance matrix then right?

@spinkney @Joran did either of you manage to implement this version with hard coded hyperparameter for \Omega_{local}?

Hi @Skipton_Woolley !

Apologies for the very late reply! I haven’t logged in in a long time. I got it working, but building hyper-priors on individual elements becomes difficult if you want to consider more than 3 variables at a time in a VAR. If you have more, you best put LKJ-priors on you global and local covariance-matrices.

Is this still relevant for you. because I’d love to share my code :).

Best,
Joran

Hi @Joran,

Yes, I’m still playing with these model (although on the back burner for while).

I’d be keen to see your code if you are willing to share.

Cheers,

Skip.

Sure! No problem! 😊.

I’ll clean them up a bit and annotate some more, and will post them here within the next two weeks or so (I have a course all week, next week 😉).

Best!
Joran

No rush on this, but I thought I’d case it up.