As part of demographic modeling exercise, I’ve switched from independent normal priors on 8 varying coefficients to a 8-variate normal distribution prior. Since then, I get serious problems with divergences (some chains up to 50%), much hitting maximum treepdepth (set at 10), high auto-correlation (> 0.5 for a lag of 20) and low E-BFMI (between 0.1 and 0.15). Playing around with initial values, seeds, adapt_alpha, step sizes or changing from centered to non-centered parametrization didn’t help so far. I think it is interesting that in almost all cases I run the model at last one of the chains actually will converge and gives very plausible posteriors (and, as expected very close to the posteriors of the implementation with independent normal priors).
Since other than moving from independent to dependent priors on the varying coefficients, I also changes a bit of syntax, I am wondering if you can spot any immediate problems or inefficiencies in my code?
Some background:
- The demographic model actually has 4 parameters (let’s call them “demographic parameters”), but these are fit separately by subgroup in each country/cluster. Thus we arrive at 8 parameters per cluster in my model.
- There are 75 clusters/countries, and about 400 to 800 observations for each (~50,000 in total).
My personal best guess of the potential problem:
The positive correlations in the 8x8 correlation matrix I expect are quite extreme, especially between the same “demographic parameter” for differnet subgroups (these cells should have > 0.7). Then again, I expect negative correlation between the different “demographic parameters” of the same country and subgroup because they can partially compensate each other in the demographic process. So, I have a lot of prior knowledge on the covariance structure (going back to the independent priors therefore also seems undesirable) but cannot really put it into the LKJ prior. Apparently, the “non-informative” LKJ(1) cannot deal with this situation well?
data {
int<lower=0> N; // Number of data points
int<lower=1> K; // Number of clusters
array[N] int<lower=0> x_chd; // Observed data
array[N] int<lower=18, upper=49> x_age; // Vector of ages
array[N] int<lower=0, upper=1> x_geq_h; // Dummies for subgroup
array[N] int<lower=0, upper=1> x_geq_l;
array[N] int<lower=1, upper=K> x_cluster; // Vector of clusters
}
transformed data{
int max_age = max(x_age);
}
parameters {
// country-specific parameters
array[K] vector[8] alpha;
// population average parameters
vector[8] alpha_0;
// Cholesky factor for the correlation matrix
cholesky_factor_corr[8] L_Omega;
// SD among countries of each parameter
vector<lower=0>[8] L_std;
}
transformed parameters{
array[K] real<lower=0> c1_h, c1_l, mu_h, mu_l, sigma1_h, sigma1_l, sigma2_h, sigma2_l;
array[K, max_age] real<lower=0> f_h, f_l, lambda_h, lambda_l;
// obtain the 8 demographic parameters from underlying components on log scale
for(k in 1:K){
c1_h[k] = exp(alpha[k][1] + alpha_0[1]);
c1_l[k] = exp(alpha[k][2] + alpha_0[2]);
mu_h[k] = exp(alpha[k][3] + alpha_0[3]);
mu_l[k] = exp(alpha[k][4] + alpha_0[4]);
sigma1_h[k] = exp(alpha[k][5] + alpha_0[5]);
sigma1_l[k] = exp(alpha[k][6] + alpha_0[6]);
sigma2_h[k] = exp(alpha[k][7] + alpha_0[7]);
sigma2_l[k] = exp(alpha[k][8] + alpha_0[8]);
// compute age-dependent means
for(j in 1:max_age){
f_h[k,j] = c1_h[k] * exp(-pow((j - mu_h[k]) / ((j <= mu_h[k]) * sigma1_h[k] + (j > mu_h[k]) * sigma2_h[k]), 2));
f_l[k,j] = c1_l[k] * exp(-pow((j - mu_l[k]) / ((j <= mu_l[k]) * sigma1_l[k] + (j > mu_l[k]) * sigma2_l[k]), 2));
}
// sum up age-dependent means to obtain expectation parameter for the poisson likelihood
lambda_h[k,] = cumulative_sum(f_h[k,]);
lambda_l[k,] = cumulative_sum(f_l[k,]);
}
}
model {
// likelihood
for (i in 1:N)
x_chd[i] ~ poisson(lambda_h[x_cluster[i], x_age[i]] * x_geq_h[i] + lambda_l[x_cluster[i], x_age[i]] * x_geq_l[i]);
// prior on population average paremeters on log-scale informed by prior literature
alpha_0[1:2] ~ normal(log(0.1), 1);
alpha_0[3:4] ~ normal(log(27.5), 1);
alpha_0[5:8] ~ normal(log(7.5), 1);
// Prior on Cholesky decomposition of correlation matrix
L_Omega ~ lkj_corr_cholesky(1);
// Prior on standard deviations for each parameter
L_std ~ normal(0, 1);
alpha ~ multi_normal_cholesky(rep_vector(0,8), diag_pre_multiply(L_std, L_Omega));
}
generated quantities {
// correlation matrix
matrix[8,8] Omega;
Omega = multiply_lower_tri_self_transpose(L_Omega);
}