I have a large hierarchical regression Stan model that integrates four data sets into a partially pooled regression across 7 time points. I'm combining two large public opinion panel surveys with dropouts, different sampling structures, different administration dates, and different spacing between waves, but the same questionnaire. This is real-life data and I can't get more because the survey was in 2016. Eventually I will poststratify, since the survey data isn't completely representative and the population parameters are what I'm after. I keep getting divergences in the model. I run 5 chains of 9000 iterations each with 2500 warmup, adapt_delta = 0.99, and max_treedepth = 15, so I have 32,500 post-warmup draws. I split the data into training and testing sets.
I have a model that builds 32 covariance matrices per data set (because the data is only normal within demographic groups) on the repeated measures; that model has 1521 divergent transitions, and a simpler model with a single covariance matrix has 988. The divergences occur throughout the chains, so a longer warmup would not fix it. Counting each element of the covariance matrices as one parameter, there are 2978 parameters, so this is a very large model. I know divergences are a red flag that the model has some issues, but I'm not sure those issues are biasing the regression coefficients. The model is complicated because most of the data is repeated measurements; the repeated measures are obviously correlated, but the covariance matrices are hard to estimate. I haven't tried a non-centered parameterization, but I don't know how else I could fix it. The problem is likely the covariance matrices, some of which have very few data points behind them, and I'm not sure a non-centered parameterization would fix divergences tied to them. Should I try a different prior on the covariance matrices so the model behaves better when there are few or no data points for a given group? I've tried LKJ(1) and LKJ(2), and LKJ(2) performed better.
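For reference, the non-centered parameterization of the hierarchical regression coefficients is a small change and is often what fixes divergences in partially pooled models, even when the trouble seems to be elsewhere. A sketch using the beta/mu_d/sigma_beta names from the model below (this is my rewrite, not the original code; std_normal() needs a recent Stan, otherwise use normal(0, 1)):

```stan
parameters {
matrix[D, T] beta_raw; // standard-normal innovations
vector[D] mu_d;
vector<lower=0>[D] sigma_beta;
}
transformed parameters {
// deterministic shift-and-scale replaces the centered prior on beta
matrix[D, T] beta;
for (i in 1:D)
beta[i, :] = mu_d[i] + sigma_beta[i] * beta_raw[i, :];
}
model {
to_vector(beta_raw) ~ std_normal();
mu_d ~ normal(0, 1);
sigma_beta ~ normal(0, 1);
// likelihood unchanged: it uses beta exactly as before
}
```

The posterior is identical; only the geometry the sampler sees changes, which is why it can remove divergences without affecting the coefficient estimates.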
I think this might be a situation where no model of these data would avoid divergences entirely. I've tried a Cauchy prior on the variances, but that diverged more, and I've played with all the hyperparameters of the priors. So far the effect on the regression coefficients appears minimal: in pairs() plots, the draws from divergent transitions don't look any different from the non-divergent ones. Residual plots on the testing data set look normal with homogeneous variances, which is exactly what they should be, and there is minimal difference between the testing-set and training-set residuals.
I’ve considered simulating data from this model but I’m a little hesitant to do so because this model is complicated and takes days to run.
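Simulating data from the model doesn't actually require the days-long fit: draws from the prior predictive distribution can be generated in a generated quantities block and run with algorithm = fixed_param, which takes seconds. A hedged sketch for group 1 only (names mirror the data block below; the `_sim` variables and the nested `_rng` draws for the half-normal scale priors are my own construction):

```stan
// prior-predictive simulation for group 1; run with algorithm = fixed_param
data {
int<lower=1> N_g1;
int<lower=1> D;
int<lower=1> K;
vector[D] x_g1[N_g1];
vector[K] cvec_g1[N_g1];
}
generated quantities {
vector[K] alpha_sim;
vector[D] beta1_sim;
real y_sim[N_g1];
// draw hyperparameters from their half-normal priors, then coefficients
for (k in 1:K) alpha_sim[k] = normal_rng(0, fabs(normal_rng(0, 1)));
for (d in 1:D) beta1_sim[d] = normal_rng(normal_rng(0, 1), fabs(normal_rng(0, 1)));
for (n in 1:N_g1)
y_sim[n] = normal_rng(dot_product(cvec_g1[n], alpha_sim)
+ dot_product(x_g1[n], beta1_sim),
fabs(normal_rng(0, 1)));
}
```

Fitting the full model once to one such simulated data set (where the true parameters are known) is slow, but even the cheap prior-predictive check can reveal whether the priors put mass on absurd response values.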
I will eventually fit this model 16 times, so I want to be sure what I have is good before I start all 16 runs. Basically, I want a second opinion on whether the divergences invalidate the model when the fits of the parameters of interest appear unaffected and unbiased. Also, this model performs better (lower residuals) than fitting a separate regression at each time point, which I take as another sign the model is working.
Below is the code:
// Data: four survey groups, each with its own response and covariate arrays.
data {
int<lower=0> N; // total sample size
int<lower=1> D; // dimension of non-demographic covariates
int<lower=1> K; // number of demographic covariates
int<lower=1> T; //number of time points
// survey groups: S = 1 for T1 only, S = 2 for T1+T7, S = 3 for T2-T7, S = 4 for T7 only;
// each group's responses, covariates, and covariance-group index are passed separately below
//group 1
int<lower=1> N_g1; // sample size group 1
vector[D] x_g1[N_g1]; // covariates group 1
vector[K] cvec_g1[N_g1]; // demographic covariates group 1
real y_g1[N_g1]; // response group 1
//group 2
int<lower=1> N_g2; // sample size group 2
matrix[D,2] x_g2[N_g2]; // covariates group 2
vector[K] cvec_g2[N_g2]; // demographic covariates group 2
vector[2] y_g2[N_g2]; // response group 2
//group 3
int<lower=1> N_g3; // sample size group 3
matrix[D,6] x_g3[N_g3]; // covariates group 3
vector[K] cvec_g3[N_g3]; // demographic covariates group 3
vector[6] y_g3[N_g3]; // response group 3
// group 4
int<lower=1> N_g4; // sample size group 4
vector[D] x_g4[N_g4]; // covariates group 4
vector[K] cvec_g4[N_g4]; // demographic covariates group 4
real y_g4[N_g4]; // response group 4
// vector of demographic group
int<lower=1> C_g1[N_g1];
int<lower=1> C_g2[N_g2];
int<lower=1> C_g3[N_g3];
int<lower=1> C_g4[N_g4];
}
// Parameters of the model.
parameters {
vector[K] alpha; // intercepts (demographic coefficients)
matrix[D, T] beta; // regression coefficients
vector[D] mu_d; // mean of beta
vector<lower=0>[K] sigma_alpha; // sd of demographic coefficients
vector<lower=0>[D] sigma_beta; // sd of non-demographic coefficients
cholesky_factor_corr[2] group2cor[32]; // Cholesky correlation factors, people observed twice (T1, T7)
cholesky_factor_corr[6] group3cor[32]; // Cholesky correlation factors, people observed six times (T2-T7)
vector<lower=0>[2] sigmag2[32]; // sds for group 2
vector<lower=0>[6] sigmag3[32]; // sds for group 3
real<lower=0> group1var[32]; // residual sd for group 1 (used as an sd below despite the name)
real<lower=0> group4var[32]; // residual sd for group 4 (used as an sd below despite the name)
}
transformed parameters{
cholesky_factor_cov[2] covg2[32];
cholesky_factor_cov[6] covg3[32];
for(k in 1:32) covg2[k] = diag_pre_multiply(sigmag2[k],group2cor[k]);
for(k in 1:32) covg3[k] = diag_pre_multiply(sigmag3[k],group3cor[k]);
}
// Likelihood: univariate normal for the single-wave groups (1 and 4),
// multivariate normal with group-specific covariance for the panel groups (2 and 3).
model {
sigma_alpha ~ normal(0, 1);
sigma_beta ~ normal(0, 1);
alpha ~ normal(0, sigma_alpha);
mu_d ~ normal(0, 1);
for(i in 1:D){
beta[i,:] ~ normal(mu_d[i], sigma_beta[i]);
}
for(i in 1:32){
group2cor[i] ~ lkj_corr_cholesky(2);
group3cor[i] ~ lkj_corr_cholesky(2);
sigmag2[i] ~ normal(0,1);
sigmag3[i] ~ normal(0,1);
group1var[i] ~ normal(0,1);
group4var[i] ~ normal(0,1);
}
for(i in 1:N_g1){
int c = C_g1[i];
y_g1[i] ~ normal(dot_product(cvec_g1[i], alpha)+dot_product(x_g1[i],beta[:,1]), group1var[c]);
}
for(i in 1:N_g2){
int c = C_g2[i];
vector[2] mu2;
mu2[1] = dot_product(cvec_g2[i], alpha)+x_g2[i][:,1]'*beta[:, 1];
mu2[2] = dot_product(cvec_g2[i], alpha)+x_g2[i][:,2]'*beta[:, 7];
y_g2[i] ~ multi_normal_cholesky(mu2, covg2[c]);
}
for(i in 1:N_g3){
int c = C_g3[i];
vector[6] mu3;
mu3[1] = dot_product(cvec_g3[i], alpha)+x_g3[i][:,1]'*beta[:, 2];
mu3[2] = dot_product(cvec_g3[i], alpha)+x_g3[i][:,2]'*beta[:, 3];
mu3[3] = dot_product(cvec_g3[i], alpha)+x_g3[i][:,3]'*beta[:, 4];
mu3[4] = dot_product(cvec_g3[i], alpha)+x_g3[i][:,4]'*beta[:, 5];
mu3[5] = dot_product(cvec_g3[i], alpha)+x_g3[i][:,5]'*beta[:,6];
mu3[6] = dot_product(cvec_g3[i], alpha)+x_g3[i][:,6]'*beta[:,7];
y_g3[i] ~ multi_normal_cholesky(mu3, covg3[c]);
}
for(i in 1:N_g4){
int c = C_g4[i];
y_g4[i] ~ normal(dot_product(cvec_g4[i], alpha)+dot_product(x_g4[i],beta[:,7]), group4var[c]);
}
}
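On the prior question: rather than a different LKJ shape, what often helps when some of the 32 demographic groups have few or no observations is to partially pool the scales themselves, so that sparse groups shrink toward a common value instead of being left to a weakly informed independent prior. A sketch for the group-2 scales only (the log_tau_g2, omega_g2, and z_g2 names are mine, not from the model above, and the prior scales are placeholders):

```stan
parameters {
vector[2] log_tau_g2; // population-level log-scales, one per measurement
real<lower=0> omega_g2; // between-group spread of the scales
matrix[2, 32] z_g2; // non-centered group deviations
}
transformed parameters {
vector<lower=0>[2] sigmag2[32];
for (k in 1:32)
sigmag2[k] = exp(log_tau_g2 + omega_g2 * z_g2[:, k]);
}
model {
log_tau_g2 ~ normal(0, 1);
omega_g2 ~ normal(0, 0.5);
to_vector(z_g2) ~ std_normal();
// groups with little data shrink toward exp(log_tau_g2)
}
```

The same idea extends to the correlation factors (e.g. pooling toward a shared correlation), but pooling the scales alone is the cheaper first experiment and targets exactly the groups you suspect are driving the divergences.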