Single value chain for cholesky factor of correlation matrix

Necessary disclosure: I’m still a relative newb to Stan.

I fit a hierarchical logistic regression model with co-varying slopes / intercepts and group level regressions for each base level coefficient. The model is taken from the example in section 9.13 of the user manual and reconfigured to predict binomial outcomes.

Here is the model:

data {
    int<lower=0> N; // num individuals
    int<lower=1> K; // num ind predictors
    int<lower=1> J; // num groups
    int<lower=1> L; // num group predictors
    int<lower=1,upper=J> jj[N];  // group for individual
    matrix[N, K] x; // individual predictors
    matrix[J, L] u; // group predictors
    int<lower=0> y[N]; // successes
    int<lower=0> trials[N]; //trials
}
parameters {
    matrix[K, J] z;
    cholesky_factor_corr[K] L_Omega; // potential problematic parameter
    vector<lower=0,upper=pi()/2>[K] tau_unif;
    matrix[L, K] gamma; // group coeffs
}
transformed parameters {   
    matrix[J, K] beta;
    vector[N] p;
    vector<lower=0>[K] tau; // prior scale
    for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
    beta = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';  
    p = rows_dot_product(x, beta[jj]);        
} 
model {
    to_vector(z) ~ normal(0, 1);
    L_Omega ~ lkj_corr_cholesky(3);
    to_vector(gamma[1,]) ~ normal(0, 10); // setting vague priors on intercepts for each group lvl regression
    to_vector(gamma[2:,]) ~ normal(0, 1); // setting less vague priors on remaining group lvl coeffs
    y ~ binomial_logit(trials, p);    
}
generated quantities {
    int<lower=0> y_tilde[N];
    for (n in 1:N)
        y_tilde[n] = binomial_rng(trials[n], inv_logit(p[n]));
}

There are 161 groups with 20 binary predictors for the group level regressions and 27 coefficients (mostly binary, but standardized where not), with ~5k records at the base level.

The model takes about 20 hours to run with the tree depth set to 15 for 800 iterations with 5 chains. When I check the diagnostics, everything looks fine: rhats all <=1.02, few / no divergences, lots of effective samples, etc. However, I get rhat = nan’s for about 1/2 of the L_Omega parameters, which get stuck at 0 for every sample.

The strange thing is that when I do posterior predictive checks, the model seems reasonable!

Does this behavior imply the inference is biased / invalid? What’s the best way to understand what’s going on?

The upper triangular elements of a Cholesky factor are not random variables, since they are fixed to be zero.

4 Likes

Aha! That makes total sense. Thanks for taking the time to respond!