Non-Centered Cholesky Factor Correlation Matrix Draws Imply Non-Unit Self-Correlation

Hello all,

I have a series of survey responses that I wish to fit with a latent variable model of the following form (using Python 3.10.5 / PyStan 3.5.0):

  • All I survey respondents were asked one common item (with responses on a Q-class Likert scale). I aim to model this component of my data with an ordinal logit model, with latent variable z and logit cutpoints c_shared. The features for this component are labeled alpha_* (individual-level) and gamma (group-level), and are fit using a non-centered parametrization.
  • Respondents were additionally asked up to N additional items (with responses on an R-class Likert scale), with each item belonging to one of M “topics”, yielding J item-response pairs in total. Each of the I respondents was asked at least one item from each of the M topics. I aim to model this component with an ordinal IRT model, where responses to item n are modeled using the latent variable t_m corresponding to topic m[n] and item-specific cutpoints c_n. Features in this component are labeled beta_* and fit using a non-centered parametrization.
  • Lastly, I wish to model individual-level covariance across the latent vector z and the M latent vectors t_m. I aim to use a non-centered parametrization for the covariance, with Cholesky factor matrix L_Omega and scale vector tau; the intended construction is written out just after this list.
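
Concretely, the non-centered construction I have in mind for the correlated error term is

$$
\eta = \text{diag}(\tau) \, L_\Omega \, \eta_{\text{raw}}, \qquad \eta_{\text{raw}} \sim \mathcal{N}(0, I),
$$

which should imply $\eta \sim \mathcal{N}(0, \Sigma)$ with $\Sigma = \text{diag}(\tau)\,\Omega\,\text{diag}(\tau)$ and $\Omega = L_\Omega L_\Omega^\top$.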

My model is currently specified as follows:

data {

  // Metadata
  int<lower=0> I; // Number of Interviews
  int<lower=0> J; // Number of Topic-Question Item-Response Pairs
  int<lower=0> M; // Number of Item Classes/Latent Issue Dimensions
  int<lower=0> N; // Number of Survey Items
  int<lower=0> Q; // Number of Likert Response Punches for Core Question
  int<lower=0> R; // Number of Likert Response Punches for Topic-Question Items
  
  // Variable Level Declarations
  int<lower=0> N_a; // Number of Levels of Predictor a
  int<lower=0> N_b; // Number of Levels of Predictor b
  int<lower=0> N_c; // Number of Levels of Predictor c
  int<lower=1> N_d; // Number of Levels of Predictor d
  
  // Individual-Level Predictors
  array[I] int<lower=1,upper=N_a> a;
  array[I] int<lower=1,upper=N_b> b;
  array[I] int<lower=1,upper=N_c> c;
  array[I] int<lower=1,upper=N_d> d;

  // Group-Level Predictors
  vector[N_d] u;

  array[I] int<lower=1,upper=Q> y_shared; // Core Item Responses
  
  // Topic-Level Items
  
  /// Item-Level Lookup Metadata
  array[N] int<lower=1,upper=M> m_n;   // Additional Item Topic Lookup
  array[J] int<lower=1,upper=I> i_j;   // Response-Level Respondent ID
  array[J] int<lower=1,upper=N> n_j;   // Response-Level Item ID
  array[J] int<lower=1,upper=R> y_n_j; // Additional Item Responses
  
  
}
parameters {

  // Target Cutpoints
  ordered[Q - 1] c_shared;     // Core Item
  array[N] ordered[R - 1] c_n; // Additional Items
  
  // Core Item
  
  // Hyperpriors
  /// Prior Uncertainties for Varying Intercepts
  real<lower=0> sigma_alpha_a;
  real<lower=0> sigma_alpha_b;
  real<lower=0> sigma_alpha_c;
  real<lower=0> sigma_alpha_d;
  
  /// Prior Uncertainty for Group-Level Slope
  real<lower=0> sigma_gamma;
  
  // Modeled Effects
  /// Varying Intercepts
  vector[N_a] alpha_a_raw;
  vector[N_b] alpha_b_raw;
  vector[N_c] alpha_c_raw;
  vector[N_d] alpha_d_raw;  
  /// Group Level Slope
  real gamma_raw;
  
  
  // Topic-Level Additional Items
  
  // Hyperprior Uncertainties for Varying Intercepts
  vector<lower=0>[M] sigma_beta_a;
  vector<lower=0>[M] sigma_beta_b;
  vector<lower=0>[M] sigma_beta_c;
  vector<lower=0>[M] sigma_beta_d;
  
  // Varying Intercepts
  array[M] vector[N_a] beta_a_raw;
  array[M] vector[N_b] beta_b_raw;
  array[M] vector[N_c] beta_c_raw;
  array[M] vector[N_d] beta_d_raw;
  
  // Correlated Error Parameters
  /// Hyperpriors
  vector<lower=0>[3] tau;          // prior scale
  cholesky_factor_corr[3] L_Omega; // factored prior covariance

  /// Raw Standard-Normal Vector for Non-Centered Parametrization of the Correlated Errors
  vector[3] tau_L_Omega_raw;
  
  
}
transformed parameters{

  // Core Item
  
  // Varying Intercepts
  vector[N_a] alpha_a;
  vector[N_b] alpha_b;
  vector[N_c] alpha_c;
  vector[N_d] alpha_d;
  
  // Group Level Slopes
  real gamma;
  
  // Non-Centered Parametrization for Intercepts and Slopes
  alpha_d = alpha_d_raw * sigma_alpha_d;
  alpha_a = alpha_a_raw * sigma_alpha_a;
  alpha_b = alpha_b_raw * sigma_alpha_b;
  alpha_c = alpha_c_raw * sigma_alpha_c;  
  
  gamma = gamma_raw * sigma_gamma;
  
  // Additional Items
  
  // Varying Intercepts
  array[M] vector[N_a] beta_a;
  array[M] vector[N_b] beta_b;
  array[M] vector[N_c] beta_c;
  array[M] vector[N_d] beta_d;
  
  // Non-Centered Parametrization for Varying Intercepts
  
  /// Loop Over Latent Topics to vectorize Multilevel Intercepts
  for (m in 1:M){
    beta_a[m] = beta_a_raw[m] * sigma_beta_a[m];
    beta_b[m] = beta_b_raw[m] * sigma_beta_b[m];
    beta_c[m] = beta_c_raw[m] * sigma_beta_c[m];  
    beta_d[m] = beta_d_raw[m] * sigma_beta_d[m];
  }
  
  // Cholesky Factored Covariance Term
  vector[3] tau_L_Omega;
  
  tau_L_Omega = diag_pre_multiply(tau, L_Omega) * tau_L_Omega_raw;
  
}
model {
  
  // Core Survey Item
  
  /// Prior Uncertainties
  sigma_alpha_a ~ std_normal();
  sigma_alpha_b ~ std_normal();
  sigma_alpha_c ~ std_normal();
  sigma_alpha_d ~ std_normal();
  sigma_gamma ~ std_normal();
  
  /// Varying Intercepts
  alpha_a_raw ~ std_normal();
  alpha_b_raw ~ std_normal();
  alpha_c_raw ~ std_normal();
  alpha_d_raw ~ std_normal();
  
  /// Group Level Effects
  gamma_raw ~ std_normal();
  
  // Additional Items
  
  /// Prior Uncertainties
  sigma_beta_a ~ std_normal();
  sigma_beta_b ~ std_normal();
  sigma_beta_c ~ std_normal();
  sigma_beta_d ~ std_normal();
  
  /// Varying Intercepts

  //// Multilevel Intercepts
  for (m in 1:M){
      beta_a_raw[m] ~ std_normal();
      beta_b_raw[m] ~ std_normal();
      beta_c_raw[m] ~ std_normal();
      beta_d_raw[m] ~ std_normal();
   }


  // Latent Variable Correlation
  tau ~ cauchy(0,2);
  L_Omega ~ lkj_corr_cholesky(2);
  tau_L_Omega_raw ~ std_normal();

  // Declare Local Scope for Latent Variable Vectorization
  {
    // Vectorized Group-Level Predictions for Individuals
    vector[N_d] u_gamma;
    u_gamma = u * gamma;
    
    // Declare Core Question Latent Variable z, Latent Topic Variables t_m
    vector[I] z;
    array[M] vector[I] t_m_i;
    vector[J] t_j;
    
    // Vectorize Latent Variables for Core Item, Topics
    for (i in 1:I){
    
      z[i] = alpha_a[a[i]] + alpha_b[b[i]] + alpha_c[c[i]] + alpha_d[d[i]] + u_gamma[d[i]] + tau_L_Omega[1];
      
      for (m in 1:M){
        t_m_i[m,i] = beta_a[m,a[i]] + beta_b[m,b[i]] + beta_c[m,c[i]] + beta_d[m,d[i]] + tau_L_Omega[m + 1];  
      }
      
    }
    
    // Reindex Latent Topic Variables from Individuals to Item-Response Pairs
    for (j in 1:J){
      t_j[j] = t_m_i[m_n[n_j[j]],i_j[j]];
    }
    
    // Fit responses to Core Item
    y_shared ~ ordered_logistic(z, c_shared);
    
    // Fit responses to additional items
    y_n_j ~ ordered_logistic(t_j, c_n[n_j]);
    
    
  }

}

As a placeholder, I’ve currently hard-coded the parameters associated with the individual-level covariance matrix in a manner that implies a 3 x 3 matrix (i.e., M = 2): I want to confirm that I can make things work in the minimal case (two latent topics t_m, plus the core latent variable z) before scaling any further.

Model output looks fine without the correlated individual covariance component (i.e., without the various tau/L_Omega terms). When I add the covariance terms, my regression coefficients (alpha_*, beta_*, and gamma) and my cut-points (c_shared and c_n) remain reasonable. Moreover, all the tau and L_Omega terms appear to have converged, with stable trace plots and R-hat values around 1.0.

However, when I try to reverse-engineer my correlation matrix by multiplying L_Omega by its transpose, I recover a matrix that does not have unit entries on its diagonal:

[[1.        , 0.        , 0.        ],
 [0.        , 0.8443724 , 0.        ],
 [0.        , 0.        , 0.65253416]]

The above values were generated by computing L_Omega L_Omega^T for each of my 4,000 post-warmup draws and then averaging across draws; I first became suspicious when I computed the product from the averaged entries of L_Omega and saw the same counterintuitive output.

Notwithstanding the fact that the data show no individual-level correlations across my three latent dimensions (if that is what the data have to tell me, so be it), the fact that the diagonal entries of my correlation matrix are not all equal to 1 suggests that something has gone wrong in my specification. I’m not sure whether I’ve implemented the non-centered parametrization of the Cholesky-factored correlation matrix incorrectly, or whether something else is throwing things off.
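
As I understand it, each row of a valid correlation-matrix Cholesky factor has unit norm, so the diagonal of the product should come out as exactly 1:

$$
(L_\Omega L_\Omega^\top)_{ii} = \sum_k L_{ik}^2 = 1 .
$$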

Appreciate any guidance on steps I might be able to take to rectify things here.

To double-check: when you’re performing the multiplication, are you using

LL^T

and not

L^TL?
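
In NumPy that’s the @ (matrix-product) operator, not *, which multiplies elementwise. A quick sketch with a made-up valid correlation Cholesky factor:

import numpy as np

# Made-up example: a valid Cholesky factor of a 3 x 3 correlation matrix
# (each row has unit norm, so L @ L.T has ones on its diagonal)
L = np.array([[1.0, 0.0, 0.0],
              [0.3, np.sqrt(1 - 0.3**2), 0.0],
              [0.2, 0.1, np.sqrt(1 - 0.2**2 - 0.1**2)]])

Omega = L @ L.T        # matrix product
print(np.diag(Omega))  # [1. 1. 1.]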

Additionally, the simplest way to check would be to construct the product in the generated quantities block:

matrix[3,3] Omega = multiply_lower_tri_self_transpose(L_Omega);
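
If you also want to inspect it on the Python side, something like this should work (a sketch, assuming PyStan 3 flattens the generated matrix into columns named 'Omega.1.1', ..., 'Omega.3.3', as with your tau and L_Omega columns):

diag_cols = [f'Omega.{i}.{i}' for i in range(1, 4)]
print(fit.to_frame()[diag_cols].mean())  # each should be 1.0 up to floating-point error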

@andrjohns thanks for the guidance here! The diagonal entries on the generated quantity Omega are all equal to 1, so it looks like my manual math was off somewhere.

For my own edification, any guidance on where the Python code below was going awry?

import numpy as np

draws_df = fit.to_frame()
n_draws = len(draws_df)

cov_cols = [f'tau.{i}' for i in range(1, 4)] + [f'L_Omega.{i}.{j}' for i in range(1, 4) for j in range(1, 4)]
cov_df = draws_df.loc[:, cov_cols].T

cholesky_fact_corr_matrix = np.zeros((n_draws, 3, 3))
Omega_raw = np.zeros((n_draws, 3, 3))
Omega = np.zeros((3, 3))

diag_tau = np.zeros((n_draws, 3, 3))
Sigma_raw = np.zeros((n_draws, 3, 3))
Sigma = np.zeros((3, 3))

for i in range(n_draws):

    # Reassemble this draw's L_Omega from the flattened columns
    for j in range(3):
        for k in range(3):
            cholesky_fact_corr_matrix[i, j, k] = cov_df.loc[f'L_Omega.{j+1}.{k+1}', i]

    # Combine L_Omega with its transpose (intended: Omega = L L^T)
    Omega_raw[i] = cholesky_fact_corr_matrix[i] * (cholesky_fact_corr_matrix[i].T)

    # Place this draw's tau on the diagonal of a 3 x 3 matrix
    for j in range(3):
        diag_tau[i, j, j] = cov_df.loc[f'tau.{j+1}', i]

    # Combine with Omega (intended: Sigma = diag(tau) Omega diag(tau)^T)
    Sigma_raw[i] = diag_tau[i] * Omega_raw[i] * diag_tau[i].T

# Average entrywise across draws
for i in range(3):
    for j in range(3):
        Omega[i, j] = Omega_raw[:, i, j].mean()
        Sigma[i, j] = Sigma_raw[:, i, j].mean()

print(Omega)
print(Sigma)