I am attempting to build a multi-level model that shares information across groups. I have 343 predictors in the model: 17 predictors (which I would like to constrain to be positive and with a weibull prior), 17 time lagged versions of the predictors, 17 geo spatially lagged versions of the predictors, weather controls, time fixed effects, and geographic fixed effects. I’m not incredibly surprised but my chains are not finishing successfully. I am getting the following error, but when I reduce the predictor space to being closer to 20 the errors show up during warm up and then never again. I’m sure there are a number of issues here (I am very out of my depth), but is there something glaring that I’m missing. Thank you in advance!
Errors:
lkj_corr_lpdf: Correlation matrix is not positive definite.
Chain 1 Rejecting initial value:
Chain 1 Error evaluating the log probability at the initial value.
This is my overall code:
data {
int N_obs; // number of observations
int N_pts; // number of participants
int K; // number of predictors + intercept
int nX_direct; // number of constrained predictors
array[N_obs] int pid; // id vector for groups
matrix[N_obs, K] x; // matrix of predictors
array[N_obs] real y; // y vector
}
parameters {
array[N_pts] vector[K] beta_p; // ind participant intercept and slope coefficients by group
vector<lower=0>[K] sigma_p; // sd for intercept and slope
vector[K] beta; // intercept and slope hyper-priors
corr_matrix[K] Omega; // correlation matrix
real<lower=0> sigma; // population sigma
}
model {
vector[N_obs] mu;
// priors
beta[1:nX_direct] ~ weibull(1, .1);
beta[nX_direct+1:K] ~ normal(0, .25);
Omega ~ lkj_corr(K); // additionally should this be K or something smaller?
sigma_p ~ exponential(1);
sigma ~ exponential(1);
beta_p ~ multi_normal(beta, quad_form_diag(Omega, sigma_p));
// likelihood
for (i in 1 : N_obs) {
mu[i] = x[i] * beta_p[pid[i]]; // * is matrix multiplication in this context
}
y ~ normal(mu, sigma);
}
Initializing large correlation matrices is numerically tricky, and Stan’s default initialization of these matrices is numerically fragile. A method exists to initialize them more robustly, and is commonly called the onion method. Here’s a good place to begin reading: Using the onion method in occupancy model hierarchical model - #2 by spinkney
1 Like
If you sample with the weibull distribution you have to constraint these by lower=0
.
That means you have to split the parameter beta
in beta_unconstrained
to sample
from the normal distribution and beta_contrained to sample from the weibull. After you
may define a transformed parameter beta
and assign them.
However before you do this, you have to get the indexing right.
array[N_pts] vector[K] beta_p; // ind participant intercept and slope coefficients by group
beta[1:nX_direct] ~ weibull(1, .1);
beta[nX_direct+1:K] ~ normal(0, .25);
beta[1:nX_direct]
and beta[nX_direct+1:K]
addresses the first dimension of array[N_pts].
To address the sampling problems of the parameter Omega
, initialize Omega
with the Identity-Matrix of dimension K
.
2 Likes
Thank you both so much for replying! In addressing @andre.pfeuffer comments first I broke out my beta vector into beta_constrained
and beta_unconstrained
. I realized I couldn’t set the lower bounds on a subset of the vector like such:
vector[1:nX_direct] beta_con; // intercept and slope hyper-priors constrained
vector[nX_direct:K] beta_uncon; // intercept and slope hyper-priors constrained
so I separated by predictor matrix from x
into x_con
and x_uncon
, which had a waterfall effect in my code until I ended up with the following:
data {
// rows
int N_obs; // number of observations
int N_pts; // number of participants
// dimensions
// int K; // number of predictors + intercept
int nX_direct; // number of predictors + intercept constrained
int nX_unconstrained; // number of predictors + intercept unconstrained
// matrices
array[N_obs] int pid; // id vector
// matrix[N_obs, K] x; // matrix of predictors
matrix[N_obs, nX_direct] x_con; // matrix of predictors constrained
matrix[N_obs, nX_unconstrained] x_uncon; // matrix of predictors unconstrained
array[N_obs] real y; // y vector
}
parameters {
// coefficients by groups
// array[N_pts] vector[K] beta_p; // ind participant intercept and slope coefficients by group
array[N_pts] vector[nX_direct] beta_p_con; // ind participant intercept and slope coefficients by group constrained
array[N_pts] vector[nX_unconstrained] beta_p_uncon; // ind participant intercept and slope coefficients by group unconstrained
// sigmas
// vector<lower=0>[K] sigma_p; // sd for intercept and slope
vector<lower=0>[nX_direct] sigma_p_con; // sd for intercept and slope constrained
vector<lower=0>[nX_unconstrained] sigma_p_uncon; // sd for intercept and slope unconstrained
// betas
// vector<lower=0>[K] beta; // intercept and slope hyper-priors
vector<lower=0>[nX_direct] beta_constrained; // intercept and slope hyper-priors constrained
vector[nX_unconstrained] beta_unconstrained; // unconstrained
// cholesky matrices
// cholesky_factor_corr[K] Omega; // correlation matrix
cholesky_factor_corr[nX_direct] L_con; // correlation matrix constrained
cholesky_factor_corr[nX_unconstrained] L_uncon; // correlation matrix unconstrained
// pop sigma
real<lower=0> sigma; // population sigma
}
model {
vector[N_obs] mu;
// priors
// beta ~ normal(0, 1);
beta_unconstrained ~ normal(0, 1);
beta_constrained ~ weibull(1, .1);
// Omega ~ lkj_corr(K);
L_con ~ lkj_corr_cholesky(nX_direct);
L_uncon ~ lkj_corr_cholesky(nX_unconstrained) ;
// sigma_p ~ exponential(1);
sigma_p_con ~ exponential(1);
sigma_p_uncon ~ exponential(1);
sigma ~ exponential(1);
// beta_p ~ multi_normal(beta, quad_form_diag(Omega, sigma_p));
beta_p_con ~ multi_normal_cholesky(beta_constrained, quad_form_diag(L_con, sigma_p_con)); // new corr
beta_p_uncon ~ multi_normal_cholesky(beta_unconstrained, quad_form_diag(L_uncon, sigma_p_uncon)); // new corr unconstrained
// likelihood
for (i in 1 : N_obs) {
// mu[i] = x[i] * beta_p[pid[i]]; // * is matrix multiplication in this context
mu[i] = x_con[i] * beta_p_con[pid[i]] + x_uncon[i] * beta_p_uncon[pid[i]]; // * is matrix multiplication in this context
}
y ~ normal(mu, sigma);
}
I also changed my correlation matrix to a cholesky correlarion matrix and set priors for my mlm betas (beta_p_con
and beta_p_uncon
) with multi_normal_cholesky()
. Changing from multi_normal()
to multi_normal_cholesky()
fixed my initialization issues. Additionally, I updated mu
to be the addition of the two separate matrix multiplications. Do we see glaring issues with that?
1 Like
You could simply define:
vector<lower=0>[1:nX_direct] beta_con; // intercept and slope hyper-priors constrained
vector[nX_direct:K] beta_uncon; // intercept and slope hyper-priors constrained
adding the following after the parameter block
transformed parameters {
vector[K] beta = append_row(beta_con, beta_uncon);
}
Although you use a array[] vector
structure too.
This will, if you go this road, require assignments in loop for each 1,.., N_pts
.
beta_p_con ~ multi_normal_cholesky(beta_constrained, quad_form_diag(L_con, sigma_p_con)); // new corr
And this will require a truncated multivariate-normal distribution. See here:
and here:
https://github.com/spinkney/helpful_stan_functions/blob/main/functions/distribution/multi_normal_cholesky_truncated.stanfunctions
1 Like