Chain Finishing Unexpectedly in High Dimension Hierarchical Model

I am attempting to build a multi-level model that shares information across groups. I have 343 predictors in the model: 17 predictors (which I would like to constrain to be positive and with a weibull prior), 17 time lagged versions of the predictors, 17 geo spatially lagged versions of the predictors, weather controls, time fixed effects, and geographic fixed effects. I’m not incredibly surprised but my chains are not finishing successfully. I am getting the following error, but when I reduce the predictor space to being closer to 20 the errors show up during warm up and then never again. I’m sure there are a number of issues here (I am very out of my depth), but is there something glaring that I’m missing. Thank you in advance!

Errors:

lkj_corr_lpdf: Correlation matrix is not positive definite.

Chain 1 Rejecting initial value:
Chain 1   Error evaluating the log probability at the initial value.

This is my overall code:

data {
    int N_obs; // number of observations
    int N_pts; // number of participants
    int K; // number of predictors + intercept
    int nX_direct; // number of constrained predictors
    array[N_obs] int pid; // id vector for groups
    matrix[N_obs, K] x; // matrix of predictors
    array[N_obs] real y; // y vector
}
parameters {
    array[N_pts] vector[K] beta_p; // ind participant intercept and slope coefficients by group
    vector<lower=0>[K] sigma_p; // sd for intercept and slope
    vector[K] beta; // intercept and slope hyper-priors
    corr_matrix[K] Omega; // correlation matrix
    real<lower=0> sigma; // population sigma
}
model {
    vector[N_obs] mu;
    
    // priors
    beta[1:nX_direct] ~ weibull(1, .1);
    beta[nX_direct+1:K] ~ normal(0, .25);
    Omega ~ lkj_corr(K); // additionally should this be K or something smaller?
    sigma_p ~ exponential(1);
    sigma ~ exponential(1);
    beta_p ~ multi_normal(beta, quad_form_diag(Omega, sigma_p));
    
    // likelihood
    for (i in 1 : N_obs) {
        mu[i] = x[i] * beta_p[pid[i]]; // * is matrix multiplication in this context
    }
    
    y ~ normal(mu, sigma);
}

Initializing large correlation matrices is numerically tricky, and Stan’s default initialization of these matrices is numerically fragile. A method exists to initialize them more robustly, and is commonly called the onion method. Here’s a good place to begin reading: Using the onion method in occupancy model hierarchical model - #2 by spinkney

1 Like

If you sample with the weibull distribution you have to constraint these by lower=0.
That means you have to split the parameter beta in beta_unconstrained to sample
from the normal distribution and beta_contrained to sample from the weibull. After you
may define a transformed parameter beta and assign them.

However before you do this, you have to get the indexing right.

    array[N_pts] vector[K] beta_p; // ind participant intercept and slope coefficients by group
    beta[1:nX_direct] ~ weibull(1, .1);
    beta[nX_direct+1:K] ~ normal(0, .25);

beta[1:nX_direct] and beta[nX_direct+1:K] addresses the first dimension of array[N_pts].

To address the sampling problems of the parameter Omega, initialize Omega with the Identity-Matrix of dimension K.

2 Likes

Thank you both so much for replying! In addressing @andre.pfeuffer comments first I broke out my beta vector into beta_constrained and beta_unconstrained. I realized I couldn’t set the lower bounds on a subset of the vector like such:

vector[1:nX_direct] beta_con; // intercept and slope hyper-priors constrained
vector[nX_direct:K] beta_uncon; // intercept and slope hyper-priors constrained

so I separated by predictor matrix from x into x_con and x_uncon, which had a waterfall effect in my code until I ended up with the following:

data {
    // rows
    int N_obs; // number of observations
    int N_pts; // number of participants
    
    // dimensions
    // int K; // number of predictors + intercept
    int nX_direct; // number of predictors + intercept constrained
    int nX_unconstrained; // number of predictors + intercept unconstrained
    
    // matrices
    array[N_obs] int pid; // id vector
    // matrix[N_obs, K] x; // matrix of predictors
    matrix[N_obs, nX_direct] x_con; // matrix of predictors constrained
    matrix[N_obs, nX_unconstrained] x_uncon; // matrix of predictors unconstrained
    array[N_obs] real y; // y vector
}
parameters {
    // coefficients by groups
    // array[N_pts] vector[K] beta_p; // ind participant intercept and slope coefficients by group
    array[N_pts] vector[nX_direct] beta_p_con; // ind participant intercept and slope coefficients by group constrained
    array[N_pts] vector[nX_unconstrained] beta_p_uncon; // ind participant intercept and slope coefficients by group unconstrained
    
    // sigmas
    // vector<lower=0>[K] sigma_p; // sd for intercept and slope
    vector<lower=0>[nX_direct] sigma_p_con; // sd for intercept and slope constrained
    vector<lower=0>[nX_unconstrained] sigma_p_uncon; // sd for intercept and slope unconstrained
    
    // betas
    // vector<lower=0>[K] beta; // intercept and slope hyper-priors
    vector<lower=0>[nX_direct] beta_constrained; // intercept and slope hyper-priors constrained
    vector[nX_unconstrained] beta_unconstrained; // unconstrained
    
    // cholesky matrices
    // cholesky_factor_corr[K] Omega; // correlation matrix
    cholesky_factor_corr[nX_direct] L_con; // correlation matrix constrained
    cholesky_factor_corr[nX_unconstrained] L_uncon; // correlation matrix unconstrained
    
    // pop sigma
    real<lower=0> sigma; // population sigma
}

model {
    vector[N_obs] mu;
    
    // priors
    
    // beta ~ normal(0, 1);
    beta_unconstrained ~ normal(0, 1);
    beta_constrained ~ weibull(1, .1);
    
    // Omega ~ lkj_corr(K);
    L_con ~ lkj_corr_cholesky(nX_direct);
    L_uncon ~ lkj_corr_cholesky(nX_unconstrained) ;
    
    // sigma_p ~ exponential(1);
    sigma_p_con ~ exponential(1);
    sigma_p_uncon ~ exponential(1);
    sigma ~ exponential(1);
    
    // beta_p ~ multi_normal(beta, quad_form_diag(Omega, sigma_p));
    beta_p_con ~ multi_normal_cholesky(beta_constrained, quad_form_diag(L_con, sigma_p_con)); // new corr
    beta_p_uncon ~ multi_normal_cholesky(beta_unconstrained, quad_form_diag(L_uncon, sigma_p_uncon)); // new corr unconstrained
    
    // likelihood
    for (i in 1 : N_obs) {
        // mu[i] = x[i] * beta_p[pid[i]]; // * is matrix multiplication in this context
        mu[i] = x_con[i] * beta_p_con[pid[i]] + x_uncon[i] * beta_p_uncon[pid[i]]; // * is matrix multiplication in this context
    }
    
    y ~ normal(mu, sigma);
}

I also changed my correlation matrix to a cholesky correlarion matrix and set priors for my mlm betas (beta_p_con and beta_p_uncon) with multi_normal_cholesky(). Changing from multi_normal() to multi_normal_cholesky() fixed my initialization issues. Additionally, I updated mu to be the addition of the two separate matrix multiplications. Do we see glaring issues with that?

1 Like

You could simply define:

vector<lower=0>[1:nX_direct] beta_con; // intercept and slope hyper-priors constrained
vector[nX_direct:K] beta_uncon; // intercept and slope hyper-priors constrained

adding the following after the parameter block

transformed parameters {
   vector[K] beta = append_row(beta_con, beta_uncon);
}

Although you use a array[] vector structure too.
This will, if you go this road, require assignments in loop for each 1,.., N_pts.

beta_p_con ~ multi_normal_cholesky(beta_constrained, quad_form_diag(L_con, sigma_p_con)); // new corr

And this will require a truncated multivariate-normal distribution. See here:

and here:
https://github.com/spinkney/helpful_stan_functions/blob/main/functions/distribution/multi_normal_cholesky_truncated.stanfunctions

1 Like