Parametrization for 6 non-independent variables in (0,1) and (0,∞)

I have a set of individual-specific parameters in a hierarchical model, specifically \alpha_{i,1}, \alpha_{i,2}, \beta_{i,1}, \beta_{i,2} \in (0,1) and x_{i,1}, x_{i,2} > 0. These are not observed directly, but enter the likelihood of the individual-level data.

I am struggling to find a good hierarchical prior for this, for the following reason: while the \alpha's and x's are always interior, the \beta's are interior for some individuals and consistent with \beta \approx 0 for others. All four combinations occur in the data: \beta_{i,1} \approx 0 while \beta_{i,2} is interior, etc.

The way I am parametrizing it now is with a multivariate normal distribution, obtaining the \alpha's and \beta's via a logit transformation and the x's via a log transformation (a Stan code excerpt follows; the whole model is a 500-line monster that no one would read). You can see why this is problematic: it cannot deal with mass near 0 in the \beta's.

This is giving me a lot of convergence issues, i.e. maxed-out treedepth and divergences. Suggestions would be appreciated: how would you model this?

Stan code excerpt
functions {
  /* logit(x) transformation and log Jacobian correction for parameters. */
  real logit_lp(real x) {
    // log |d logit(x)/dx| = -log(x) - log(1 - x)
    target += -log(x) - log1m(x);
    return logit(x);
  }

  /* Cross-sectional distribution of individual-specific parameters.

     Transformed parameters ~ multi_normal_cholesky(cs_mean, cs_L)
     xs > 0, 0 < alphas, betas < 1. */
  void cross_sectional_distribution_lp(vector xs, real alpha1, real alpha2,
                                       real beta1, real beta2,
                                       vector cross_section_mean,
                                       matrix cross_section_L) {
    vector[6] z;
    z[1] = logit_lp(alpha1);
    z[2] = logit_lp(beta1);
    z[3] = log(xs[1]);
    z[4] = logit_lp(alpha2);
    z[5] = logit_lp(beta2);
    z[6] = log(xs[2]);
    target += -z[3] - z[6]; // log Jacobian adjustment for y = log(x)
    z ~ multi_normal_cholesky(cross_section_mean, cross_section_L);
  }
  ...
}
parameters {
  /* cross-sectional distributions */
  vector[6] cross_section_intercept;    // intercept
  vector<lower=0>[6] cross_section_std; // standard deviation
  cholesky_factor_corr[6] cross_section_corr_factor; // correlations (as Cholesky factor)
  ...
}
model {
  matrix[6, 6] cross_section_L; // Cholesky factor of the cross-sectional covariance
  cross_section_intercept ~ normal(0, 5);
  cross_section_std ~ student_t(7, 0, 5);
  cross_section_corr_factor ~ lkj_corr_cholesky(2.0);

  /* initialize common variables */
  cross_section_L = diag_pre_multiply(cross_section_std,
                                      cross_section_corr_factor);
  for (i in 1:N) {
    cross_sectional_distribution_lp(xs[i], alpha1[i], alpha2[i], beta1[i], beta2[i],
                                    cross_section_intercept, cross_section_L);
  }
  ...
}


Corresponding KDE (the multimodality could be a convergence issue, or maybe that's how the data look): [plot not shown]

If the values really are constrained to be positive but can also be zero, then you run into trouble with the usual parameterizations, because they push mass out to -infinity on the unconstrained scale. If you don't want zeros, you can introduce a zero-avoiding prior, but these usually have to be pretty strong, and they're hard to mix with the multivariate normal.
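As a concrete example of a zero-avoiding prior (a sketch, not from the thread): a Beta(a, b) prior with a, b > 1 has density zero at both endpoints, so on the constrained scale it pulls beta away from 0 and 1:

```stan
// Sketch only: a zero-avoiding prior on the constrained scale.
// With shape parameters > 1 the Beta density vanishes at 0 and 1,
// penalizing boundary values; larger shapes avoid the boundary
// more aggressively.
model {
  beta1 ~ beta(2, 2); // density proportional to beta1 * (1 - beta1)
}
```

Note this replaces the multivariate-normal margin for beta, which is exactly the "hard to mix in" problem above.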

[Edit: I didn't understand why the cross_sectional_distribution_lp code adjusts for the log but not for the logit. Oops, saw the adjustment in the other function.] I think it would be much easier to work the other way around: define the distribution on unconstrained parameters, then apply the constraining transforms in the code. That way you don't need any Jacobian adjustments (which here effectively get applied twice, because Stan already implements them for the declared constraints).
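A minimal sketch of that direction (the names are illustrative, not the original model; it assumes cross_section_L is available as a transformed parameter): put the multivariate normal directly on unconstrained vectors and recover the constrained parameters with inv_logit and exp, so no Jacobian adjustment is needed anywhere:

```stan
parameters {
  // Unconstrained individual-level vectors; the prior lives here.
  array[N] vector[6] z;
}
transformed parameters {
  // Constrained parameters are deterministic transforms of z.
  // No Jacobian needed: the distribution is declared on z itself.
  vector<lower=0, upper=1>[N] alpha1;
  vector<lower=0, upper=1>[N] beta1;
  vector<lower=0>[N] x1;
  for (i in 1:N) {
    alpha1[i] = inv_logit(z[i][1]);
    beta1[i]  = inv_logit(z[i][2]);
    x1[i]     = exp(z[i][3]);
    // same pattern for alpha2, beta2, x2 from z[i][4], z[i][5], z[i][6]
  }
}
model {
  // Vectorized over the array of individuals.
  z ~ multi_normal_cholesky(cross_section_intercept, cross_section_L);
}
```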

The other natural approach would be zero inflation. The marginalizations will be tractable because they're at the individual level. You might also try more weakly informative priors for cross_section_std, unless you really expect large values there.
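A sketch of how that marginalization could look (individual_loglik and theta_zero are hypothetical placeholders for the actual individual likelihood and the inflation probability):

```stan
// Zero inflation for beta1, with the discrete "beta1 is exactly zero"
// indicator marginalized out via log_mix. individual_loglik() is a
// placeholder for the log likelihood of individual i's data given beta1.
model {
  for (i in 1:N) {
    target += log_mix(theta_zero,
                      individual_loglik(i, 0),          // point mass at 0
                      individual_loglik(i, beta1[i]));  // interior component
  }
}
```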

The code would be a lot faster if you pushed the loop into cross_sectional_distribution_lp and vectorized the multi-normal Cholesky.
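For instance (a sketch; the signatures differ from the original function), collecting all individuals' transformed vectors into an array allows a single vectorized sampling statement:

```stan
// multi_normal_cholesky accepts an array of vectors, so one statement
// replaces N scalar calls and shares work across individuals.
array[N] vector[6] z;
for (i in 1:N) {
  z[i][1] = logit(alpha1[i]);
  z[i][2] = logit(beta1[i]);
  z[i][3] = log(xs[i][1]);
  z[i][4] = logit(alpha2[i]);
  z[i][5] = logit(beta2[i]);
  z[i][6] = log(xs[i][2]);
}
// (The Jacobian adjustments from the original code still apply.)
z ~ multi_normal_cholesky(cross_section_intercept, cross_section_L);
```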


Thanks, I went with this and got rid of most of the divergences and the maxed-out treedepth warnings.

The model is still not mixing super well, but that may be a separate issue, which I will investigate by gradually building up the same model on simulated data.

Zero inflation is not a good fit for this model, but I will keep it in mind for future models.