Help on specifying multi-level sparse model

Hello. I’m trying to specify a regression model with sparsity priors on the coefficients. Since there are five groups in the data, I would like to fit a hierarchical model, and I found this discussion on the old mailing list.

Let’s say I have N samples with D covariates, and there are K groups for samples.
For each sample n, the group ID[n] = k (k = 1, ..., K) is known.
I want to do something like this:
Y_n = \sum_j X_{n,j} \beta_{j, ID[n]} + m_{\beta}.

From the discussion in the above link, it seems that a reasonable way to do this is (using a Laplace prior as an example):
\mu_{\beta_j} \sim \text{DoubleExponential}(0,1),
\beta_{j, k} \sim \text{Normal}(\mu_{\beta_j}, \sigma_j).

For non-centered parameterization:
\mu_{\beta_j} \sim \text{DoubleExponential}(0,1),
\beta_{j, k} = \mu_{\beta_j} + \sigma_j \delta_{j,k},
where \delta_{j,k} \sim \text{Normal}(0,1) and \sigma_j \sim \text{Cauchy}^+(0,1).
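To make the indexing concrete, here is a minimal sketch of the non-centered construction above in plain Python. It only draws from the priors and builds the linear predictor; it is not a fitting routine. The function names, the zero-based `group_id`, and the reading of m_{\beta} as a scalar `intercept` are my assumptions, not something from the linked discussion.

```python
import math
import random


def draw_laplace(rng, loc=0.0, scale=1.0):
    # The difference of two iid Exponential(1) draws is Laplace(0, 1).
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def draw_half_cauchy(rng, scale=1.0):
    # Inverse-CDF draw: tan(pi * u / 2) with u uniform on [0, 1) is half-Cauchy(0, 1).
    return scale * math.tan(math.pi * rng.random() / 2.0)


def draw_noncentered_beta(rng, D, K):
    """Draw mu_j, sigma_j, delta_{j,k} and return (mu, sigma, beta),
    where beta[j][k] = mu[j] + sigma[j] * delta_{j,k}."""
    mu = [draw_laplace(rng) for _ in range(D)]
    sigma = [draw_half_cauchy(rng) for _ in range(D)]
    beta = [
        [mu[j] + sigma[j] * rng.gauss(0.0, 1.0) for _ in range(K)]
        for j in range(D)
    ]
    return mu, sigma, beta


def linear_predictor(X, group_id, beta, intercept):
    # group_id[n] is zero-based here (an assumption); X[n][j] is covariate j of sample n.
    return [
        intercept + sum(X[n][j] * beta[j][group_id[n]] for j in range(len(beta)))
        for n in range(len(X))
    ]


rng = random.Random(0)
mu, sigma, beta = draw_noncentered_beta(rng, D=3, K=5)
```

Each covariate j gets its own scale sigma_j, so the K group-level coefficients for that covariate are pulled toward the shared mean mu_j by an amount that can differ across covariates.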

With this specification, all of the \mu_{\beta_j} (j = 1, ..., D) are estimated at exactly zero (the \beta_{j,k} are not, and they actually look okay). I changed the scale of the double exponential prior to weaken the shrinkage, but the estimates didn’t change. Moreover, the Pareto-k diagnostic is > 0.7 for many observations (which didn’t happen for the non-hierarchical model).


  1. Since I was not expecting everything to shrink to zero, should I use an even weaker prior for \mu_{\beta_j} (I already tried \text{DoubleExponential}(0,10))?
  2. As an alternative, I am thinking of replacing \sigma_j \sim \text{Cauchy}^+(0,1) with \sigma_j \sim \text{DoubleExponential}^+(0,1), so that it is not so easy for \beta_{j,k} to escape shrinkage when \mu_{\beta_j} is zero. I’m not sure whether that makes sense.

Any insights would be highly appreciated. Thank you!

This happens in multilevel models if you do optimization. The overall log density approaches infinity as the hierarchical variance approaches zero and the lower-level parameters approach the prior means.
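A small numerical sketch of that divergence, for the centered form with a single covariate: set every group-level coefficient equal to its prior mean and shrink the hierarchical scale. Only the prior part of the log density is evaluated here; the likelihood term is omitted on the assumption that it stays finite at such a point. The function names are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def half_cauchy_logpdf(x, scale=1.0):
    # Density on x >= 0: 2 / (pi * scale * (1 + (x / scale)^2)).
    return math.log(2.0 / (math.pi * scale)) - math.log1p((x / scale) ** 2)


def laplace_logpdf(x, loc=0.0, scale=1.0):
    return -math.log(2.0 * scale) - abs(x - loc) / scale


def centered_log_prior(mu, sigma, betas):
    # log p(mu) + log p(sigma) + sum_k log Normal(beta_k | mu, sigma)
    lp = laplace_logpdf(mu) + half_cauchy_logpdf(sigma)
    lp += sum(normal_logpdf(b, mu, sigma) for b in betas)
    return lp


K = 5
sigmas = (1.0, 1e-2, 1e-4, 1e-8)
# All beta_k sit exactly at the prior mean mu = 0 while sigma shrinks.
values = [centered_log_prior(0.0, s, [0.0] * K) for s in sigmas]
for s, v in zip(sigmas, values):
    print(f"sigma = {s:g}: log prior = {v:.2f}")
```

With the coefficients at the mean, the Normal terms contribute -K log(sigma), which grows without bound as sigma shrinks, while the half-Cauchy density stays finite at zero and so does nothing to stop it. An optimizer that follows the joint density is therefore drawn toward sigma = 0.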

Rather than a weaker prior, what you need is a prior that keeps the hierarchical scale away from zero. Andrew and others have written about this.

Or you can fit with full Bayes (sampling rather than optimization), in which case things shouldn’t collapse to zero.

Juho Piironen and Aki Vehtari just put out a paper on shrinkage and sparsity-inducing priors.
