I’m attempting to fit a hierarchical binary logistic regression model, but have received a low E-BFMI warning for all four chains:
## E-BFMI indicated possible pathological behavior:
## Chain 1: E-BFMI = 0.128
## Chain 2: E-BFMI = 0.169
## Chain 3: E-BFMI = 0.179
## Chain 4: E-BFMI = 0.141
## E-BFMI below 0.2 indicates you may need to reparameterize your model.
I’m quite new to Stan so I’m not really sure where to start with the suggestion to reparameterize my model. (I think) I have used non-centred parameterizations across the board, as well as following the Stan User’s Guide advice to reparameterize the half-Cauchy priors on my SDs and use Cholesky factorization for my LKJ prior:
parameters {
real alpha0; // Intercept
vector[n_issues] alphaj; // Issue-specific effects
real<lower=0,upper=pi()/2> sigma_issue_unif; // Transformed SD of issue-specific effects
vector[n_fam] gamma; // Random family effects
real<lower=0,upper=pi()/2> sigma_gamma_unif; // Transformed SD of family random effects
cholesky_factor_corr[n_issues] L_Omega; // Cholesky factor of correlation
real<lower=0,upper=pi()/2> sigma_eta_unif; // Transformed SD of family-issue-specific effects
matrix[n_issues,n_fam] z;
}
transformed parameters {
real<lower=0> sigma_issue; // SD of issue-specific effects
real<lower=0> sigma_gamma; // SD of family random effects
real<lower=0> sigma_eta; // SD of family-issue-specific effects
matrix[n_issues,n_fam] eta; // Family-issue-specific effects
sigma_issue = 0.07 * tan(sigma_issue_unif);
sigma_gamma = 0.07 * tan(sigma_gamma_unif);
sigma_eta = 0.07 * tan(sigma_eta_unif);
for (n in 1:n_fam)
eta[,n] = sigma_eta * L_Omega * z[,n];
}
model {
alpha0 ~ logistic(0,1); // Logistic prior for the intercept
L_Omega ~ lkj_corr_cholesky(2); // LKJ prior on Cholesky factor
for (n in 1:n_fam)
z[,n] ~ std_normal();
alphaj ~ std_normal(); // Issue-specific effects
gamma ~ std_normal(); // Random family effects
concern ~ bernoulli_logit(alpha0 + sigma_issue*xissues*alphaj + sigma_gamma*xfam*gamma +
xfull*to_vector(eta)); // Logistic regression model
}
generated quantities {
corr_matrix[n_issues] Omega; // Correlation between issues
Omega = multiply_lower_tri_self_transpose(L_Omega);
}
The issue appears to be in the primitive SD parameters, specifically sigma_issue_unif
and sigma_eta_unif
(sigma_gamma_unif
looks fine, as do all of the other parameter vs energy__ scatterplots).
The model takes a while to run so it’s a bit difficult for me to just try things to see what sticks. Am I better off using a half-Cauchy prior directly instead of reparameterizing? Would a weaker, stronger or different prior help matters (I can’t remember where I got 0.07 from)?
Any advice on speeding up my code would be great too. I’m not sure I’m using vectorization as efficiently as I could be.