Sparse sampling for a Cholesky correlation matrix (or modelling correlations among some, but not all, parameters)

I’m using Stan for a hierarchical model, and I have a quick question about initializing and sampling a Cholesky correlation matrix when I’m interested in subject-level correlations among some, but not all, parameters.

For example, let’s say I have a model with three fixed effects, but it does not make sense, given knowledge of the system, to model the correlation between the second and third parameters.

In this simple example, I’ve learned that the following pseudo-code does not work, because lkj_corr_cholesky is a distribution over entire Cholesky factors (matrices) and cannot be applied to individual scalar elements:

parameters {
  .... 
  .... 
  cholesky_factor_corr[3] C_f_c;
  .... 
} 

model {
  .... 
  .... 
  C_f_c[2,1] ~ lkj_corr_cholesky(2);
  C_f_c[3,1] ~ lkj_corr_cholesky(2);
  // but crucially not:  C_f_c[3,2] ~ lkj_corr_cholesky(2);
  .... 
} 

It would be very helpful if I could get some suggestions for this kind of sparse sampling of a Cholesky correlation matrix.

The output from lkj_corr_cholesky is not a correlation matrix, but only the Cholesky factor of a correlation matrix. Apart from its first column, the elements of the Cholesky factor are not the correlations themselves, so constraining an element such as (3,2) of the factor does not directly constrain the correlation between your 2nd and 3rd variables.
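To make this concrete: for a 3×3 correlation matrix \mathbf{M} with off-diagonal entries \rho_{21}, \rho_{31}, \rho_{32}, solving \mathbf{M} = \mathbf{L}\mathbf{L}^T gives

\mathbf{L} = \begin{pmatrix} 1 & 0 & 0 \\ \rho_{21} & \sqrt{1-\rho_{21}^2} & 0 \\ \rho_{31} & \frac{\rho_{32}-\rho_{21}\rho_{31}}{\sqrt{1-\rho_{21}^2}} & L_{33} \end{pmatrix},

with L_{33} set so that the last row has unit norm. The first column happens to reproduce \rho_{21} and \rho_{31} directly, but L_{32} mixes all three correlations, so zeroing an element of the factor is not the same as zeroing the corresponding correlation.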

If you know the structure of your correlation matrix already, then you may wish to avoid setting a prior for the matrix as a whole and just set a prior for each of its unknown elements. Maybe do something like the following?

parameters {
   // only the two correlations of interest are free parameters;
   // the (2,3) correlation is fixed at zero by construction below
   real<lower=-1.0, upper=1.0> rho12;
   real<lower=-1.0, upper=1.0> rho13;
}

model {
   // local variables: built on the fly, not saved in the output
   matrix[3,3] my_corr_mat;
   matrix[3,3] my_corr_mat_chol;

   // unit diagonal of a correlation matrix
   for (i in 1:3) {
      my_corr_mat[i,i] = 1.0;
   }

   // upper triangle: the free correlations plus the structural zero
   my_corr_mat[1,2] = rho12;
   my_corr_mat[1,3] = rho13;
   my_corr_mat[2,3] = 0.0;

   // mirror into the lower triangle to keep the matrix symmetric
   my_corr_mat[2,1] = my_corr_mat[1,2];
   my_corr_mat[3,1] = my_corr_mat[1,3];
   my_corr_mat[3,2] = my_corr_mat[2,3];

   my_corr_mat_chol = cholesky_decompose(my_corr_mat);
}
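As written, rho12 and rho13 get implicit uniform(-1, 1) priors, and both matrices are local variables in the model block, so they are recomputed on the fly and not saved in the fitted object; my_corr_mat_chol would typically be consumed in the same block, e.g. by multi_normal_cholesky. Note also that not every (rho12, rho13) pair yields a valid correlation matrix here, a point that comes up again below.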

Thank you so much for your suggestion, jjramsey.

I like your suggestion a lot. Do you think it is possible to do:

cholesky_factor_corr[3] my_corr_mat_chol;

instead of:

matrix[3,3] my_corr_mat_chol;

This might be a trivial point, but I was wondering if cholesky_factor_corr[n] might be lighter-weight than matrix[n,n].

Thank you,

If you want to use cholesky_factor_corr as the type for my_corr_mat_chol, then you’ll have to declare and initialize my_corr_mat_chol in the transformed parameters block instead of the model block. That will also cause RStan, PyStan, etc., to save samples of my_corr_mat_chol in their respective fit objects, which may not be what you want.
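For example (a sketch, with rho12 and rho13 declared as in my earlier post):

transformed parameters {
   // start from the identity so the structural zero at (2,3) is already in place
   matrix[3,3] my_corr_mat = diag_matrix(rep_vector(1.0, 3));
   cholesky_factor_corr[3] my_corr_mat_chol;
   my_corr_mat[1,2] = rho12;
   my_corr_mat[2,1] = rho12;
   my_corr_mat[1,3] = rho13;
   my_corr_mat[3,1] = rho13;
   // Stan validates the cholesky_factor_corr constraint at the end of this block
   my_corr_mat_chol = cholesky_decompose(my_corr_mat);
}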

I don’t believe this is the case.

Thank you for your input.

Just a quick follow-up on this topic: I have been finding that initialising the sparse correlation matrix myself leads to an initialization error:

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: cholesky_decompose: Matrix m is not positive definite

I’m guessing this is related to the numerical problem with cholesky_decompose that has been pointed out previously (https://discourse.mc-stan.org/t/cholesky-factor-corr-initializing-with-invalid-values/3167/6), but I wonder if there is any way around this.

That’s a possibility, but you should check to make sure that your correlation matrix actually is positive definite, especially since there’s a significant potential for error if you’re constructing that matrix manually. You might want to use a symbolic computation tool (like SymPy or Maxima) to find the inverse or eigenvalues of your correlation matrix to see if it’s reasonable.
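For the 3×3 matrix constructed earlier in this thread, the check can even be done by hand: the leading principal minors are

1, \quad 1 - \rho_{12}^2, \quad \det\mathbf{M} = 1 - \rho_{12}^2 - \rho_{13}^2,

so the matrix is positive definite exactly when \rho_{12}^2 + \rho_{13}^2 < 1. Initial values drawn independently for rho12 and rho13 can land outside that disc, which produces exactly the rejection message above.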

I deal with some of these issues on page 10 of https://www.researchgate.net/publication/324093594_Hierarchical_Bayesian_Continuous_Time_Dynamic_Modeling. Maybe it’s helpful :)


If we set priors for each of the unknown elements instead of the whole correlation matrix, is it possible that we will run into a non-positive-definite matrix during sampling? How can we avoid this?


The usual approach would be to set the priors on individual elements that get combined together in a way that ensures positive definiteness, like in the approach I linked to.
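For the 3×3 example earlier in this thread, here is a minimal sketch of that idea (my own construction, not the one from the linked paper): bound rho13 so that the determinant condition derived above always holds, and build the Cholesky factor in closed form so that cholesky_decompose is never called.

parameters {
   real<lower=-1, upper=1> rho12;
   // bounding rho13 enforces rho12^2 + rho13^2 < 1, the positive-definiteness
   // condition for this sparse structure, so every draw is valid
   real<lower=-sqrt(1 - square(rho12)),
        upper=sqrt(1 - square(rho12))> rho13;
}

transformed parameters {
   // closed-form Cholesky factor of the correlation matrix with rho23 = 0
   matrix[3,3] my_corr_mat_chol = rep_matrix(0.0, 3, 3);
   my_corr_mat_chol[1,1] = 1;
   my_corr_mat_chol[2,1] = rho12;
   my_corr_mat_chol[2,2] = sqrt(1 - square(rho12));
   my_corr_mat_chol[3,1] = rho13;
   // chosen so that row 3 times row 2 gives a zero (3,2) correlation
   my_corr_mat_chol[3,2] = -rho12 * rho13 / my_corr_mat_chol[2,2];
   my_corr_mat_chol[3,3] = sqrt(1 - square(rho13) - square(my_corr_mat_chol[3,2]));
}

One caveat: the implied prior is uniform on rho12 but, conditional on it, uniform only over the allowed range of rho13, so it is proper but not uniform over the disc; add explicit priors if that matters.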

Thank you, jjramsey, Charles_Driver and Bobby for a thought-provoking discussion.

I’m afraid I was confused earlier, and the original mock example I presented was unnecessarily complicated for what I needed. So please let me rephrase my question: I’m interested in correlations among some, but not all, of the fixed-effect parameters in a hierarchical model. I’m realizing now that this does not mean I need a sparse matrix. Rather, I think I just need a correlation matrix whose dimension is smaller than the number of fixed-effect parameters.

Please imagine a different mock example where I have 4 fixed effects: theta1, theta2, theta3 and theta4. However, I’m only interested in modelling the correlations among the first 3 parameters, given knowledge of the system. In this case, I am thinking of writing the following (pseudo-)Stan program. I would be very grateful if you could comment on whether it is okay to use a smaller matrix for the Cholesky decomposition and to model the subject-level effect of theta4 independently, as I have coded below.

parameters {
  .... 
  vector<lower=0>[4] sigma_u; // subject-level SDs for all four effects
  cholesky_factor_corr[3] my_corr_mat_chol; // correlations among the first three only
  matrix[3,J] z_u; // standardized deviations, where J is the number of subjects

  vector[J] u_4; // subject-level deviation for the fourth parameter
  .... 
} 
transformed parameters {
  .... 
  matrix[3,J] u_123; // subject-level deviations for the first three parameters
  vector[4] theta[J];
  // non-centered parameterization; only the first three SDs enter the 3x3 factor
  u_123 = diag_pre_multiply(head(sigma_u, 3), my_corr_mat_chol) * z_u;

  for (j in 1:J) {
    theta[j][1] = theta1 + u_123[1,j];
    theta[j][2] = theta2 + u_123[2,j];
    theta[j][3] = theta3 + u_123[3,j];
    theta[j][4] = theta4 + u_4[j];
  }
  .... 
} 

model {
  .... 
  my_corr_mat_chol ~ lkj_corr_cholesky(2.0);
  u_4 ~ normal(0, sigma_u[4]); // fourth deviation modelled independently
  to_vector(z_u) ~ normal(0, 1);

  target += sum(map_rect(my_objective_function, phi, theta, x_r, x_i));
} 
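For reference, the implied correlation matrix among the first three effects can be recovered afterwards, e.g. in generated quantities with Stan’s multiply_lower_tri_self_transpose:

generated quantities {
  // implied 3x3 correlation matrix among the first three effects
  matrix[3,3] my_corr_mat = multiply_lower_tri_self_transpose(my_corr_mat_chol);
}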

Thank you.

Valid syntax, possibly dubious restriction that the fourth is uncorrelated with the previous three.

Thank you, bgoodri.

Hi @jjramsey,

I am reading this thread - out of curiosity. Could you please suggest a book in which this Cholesky story is nicely explained (to someone who has read the Bishop book)?

I would like to learn enough basic background so that I can understand the main points of this thread. (I already have an OK background, but this Cholesky story is a bit new to me, and I would rather read a good book than try to reverse-engineer Wikipedia.) Many thanks in advance.

Cheers,

Jozsef


Not sure what you mean by “Cholesky story”. For my purposes, the Cholesky decomposition is pretty much a “black box” algorithm that takes in a matrix \mathbf{M} and finds a lower-triangular matrix \mathbf{L} that satisfies \mathbf{M} = \mathbf{L}\mathbf{L}^T. The reason that’s of interest to me is that a multivariate normal vector \mathbf{x} can be expressed as \boldsymbol{\mu} + \mathbf{L}\boldsymbol{\xi}, where the elements of \boldsymbol{\xi} are independently sampled from a normal distribution with zero mean and unit variance, and the covariance of \mathbf{x} is then \mathbf{L}\mathbf{L}^T. This is mentioned in the Stan 2.18 User Guide, in the subsection “Multivariate Reparameterizations.”
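The key step, for reference: since \operatorname{Cov}(\boldsymbol{\xi}) = \mathbf{I},

\operatorname{Cov}(\boldsymbol{\mu} + \mathbf{L}\boldsymbol{\xi}) = \mathbf{L}\,\operatorname{Cov}(\boldsymbol{\xi})\,\mathbf{L}^T = \mathbf{L}\mathbf{L}^T = \mathbf{M}.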

Wow, thanks @jjramsey, I think you managed to give me a good intuition about what this “cholesky business” is about. I still need to write this down and/or think over how it relates to the log likelihood, but I don’t think that will be difficult. I think you have just explained the most important part of this “story” to me, thanks!

Jozsef