I have a set of individual-specific parameters in a hierarchical model, specifically \alpha_{i,1}, \alpha_{i, 2}, \beta_{i,1}, \beta_{i,2} \in [0,1], x_{i,1}, x_{i, 2} > 0. These are not observed directly, but play a role in the likelihood of individual data.
I am struggling to find a good hierarchical prior for this, for the following reason: while the \alpha's and x's are interior, \beta's are interior for some individuals while they are more consistent with \beta \approx 0 for some individuals. All four variations occur in the data: \beta_{i,1} \approx 0 while \beta_{i,2} interior, etc.
The way I am parametrizing it now is taking a multivariate normal distribution, getting \alpha, \beta's from a logit
transformation while x's from a log
(excerpt from stan code follows, the whole model is a 500-line monster that no one would read). You can see why this is problematic: it cannot deal with mass around 0 in \beta's.
This is giving me a lot of convergence issues, ie maxed out treedepth and divergences. Suggestions would be appreciated: how would you model this?
Stan code excerpt
functions {
/* logit(x) transformation and log Jacobian correction for parameters.
*/
real logit_lp(real x) {
target += -log(x) - log1m(x);
return logit(x);
}
/* Cross sectional distribution of individual-specific parameters.
Transformed parameters ~ multi_normal_cholesky(cs_mean, cs_L)
xs > 0, 0 < alphas, betas < 1.
*/
void cross_sectional_distribution_lp(vector xs, real alpha1, real alpha2, real beta1, real beta2,
vector cross_section_mean,
matrix cross_section_L) {
vector[6] z;
z[1] = logit_lp(alpha1);
z[2] = logit_lp(beta1);
z[3] = log(xs[1]);
z[4] = logit_lp(alpha2);
z[5] = logit_lp(beta2);
z[6] = log(xs[2]);
target += -z[3] - z[6]; // log Jacobian adjustment for y = log(x)
z ~ multi_normal_cholesky(cross_section_mean, cross_section_L);
}
...
}
parameters {
/* cross sectional distributions
*/
vector[6] cross_section_intercept; // intercept
vector<lower=0>[6] cross_section_std; // standard deviation
cholesky_factor_corr[6] cross_section_corr_factor; // correlations (as Cholesky decomposition)
...
}
model {
matrix[6, 6] cross_section_L; // Cholesky factor of cross-sectional variance
cross_section_intercept ~ normal(0, 5);
cross_section_std ~ student_t(7, 0, 5);
cross_section_corr_factor ~ lkj_corr_cholesky(2.0);
/* initialize common variables
*/
cross_section_L = diag_pre_multiply(cross_section_std,
cross_section_corr_factor);
for (i in 1:N) {
cross_sectional_distribution_lp(xs[j], alpha1[j], alpha2[j], beta1[j], beta2[j],
cross_section_intercept, cross_section_L);
}
...
}