Help on specifying multi-level sparse model

schen5 · March 21, 2018, 6:30pm

Hello. I’m trying to specify a regression model with sparsity-inducing priors on the coefficients. Since there are five groups in the data, I would like to fit a hierarchical model, and I found this discussion on the old mailing list.

Model:
Let’s say I have N samples with D covariates, and the samples fall into K groups.
For each sample n, the group ID[n] = k (k = 1,..., K) is known.
I want to do something like this:
Y_n = \sum_j X_{n,j} \beta_{j, ID[n]} + m_{\beta}.

From the discussion linked above, it seems that a reasonable way to do this is (using a Laplace prior as an example):
\mu_{\beta_j} \sim \text{DoubleExponential}(0,1),
\beta_{j, k} \sim \text{Normal}(\mu_{\beta_j}, \sigma_j).

For the non-centered parameterization:
\mu_{\beta_j} \sim \text{DoubleExponential}(0,1),
\beta_{j, k} = \mu_{\beta_j} + \sigma_j \delta_{j,k},
where \delta_{j,k} \sim \text{Normal}(0,1) and \sigma_j \sim \text{Cauchy}^+(0,1).
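
To make the non-centered specification concrete, here is a minimal simulation sketch in Python with NumPy. It is not the model code from this thread: the function name `simulate_noncentered`, the default sizes, the zero-based group IDs, and the Gaussian observation noise are all assumptions for illustration, and the m_{\beta} term is left out because the post does not define it.

```python
import numpy as np

def simulate_noncentered(N=200, D=4, K=5, noise_sd=0.5, seed=0):
    """Draw one synthetic data set from the non-centered prior (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, D))                    # covariates X[n, j]
    group = rng.integers(0, K, size=N)             # ID[n], zero-based in this sketch
    mu = rng.laplace(0.0, 1.0, size=D)             # mu_beta_j ~ DoubleExponential(0, 1)
    sigma = np.abs(rng.standard_cauchy(size=D))    # sigma_j ~ Cauchy+(0, 1)
    delta = rng.normal(size=(D, K))                # delta_{j,k} ~ Normal(0, 1)
    beta = mu[:, None] + sigma[:, None] * delta    # beta_{j,k} = mu_j + sigma_j * delta_{j,k}
    # Linear predictor: sum_j X[n, j] * beta[j, ID[n]]
    eta = np.einsum("nj,jn->n", X, beta[:, group])
    y = eta + rng.normal(0.0, noise_sd, size=N)    # Gaussian noise is an assumption
    return X, group, beta, y

X, group, beta, y = simulate_noncentered()
print(X.shape, beta.shape, y.shape)
```

Fitting the model to data simulated this way, where the true \mu_{\beta_j} are known, is one way to check whether the all-zero estimates come from the model or from the fitting method.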

Problem:
With this specification, all \mu_{\beta_j} (j = 1,...,J) line up at exactly zero (the \beta_{j,k} do not, and they actually look okay). I changed the scale parameter of the double exponential prior (for weaker shrinkage), but the estimates didn’t change. Moreover, the Pareto k diagnostic shows many values > 0.7 (which didn’t happen with the non-hierarchical model).

Question:

Since I was not expecting everything to shrink to zero, should I use an even weaker prior for \mu_{\beta_j} (I already tried \text{DoubleExponential}(0,10))?
As an alternative, I am thinking of replacing \sigma_j \sim \text{Cauchy}^+(0,1) with \sigma_j \sim \text{DoubleExponential}^+(0,1), so that it is not as easy for \beta_{j,k} to escape shrinkage when \mu_{\beta_j} is zero. I am not sure whether this makes sense.
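
For a rough sense of what the second option changes, the sketch below compares the upper tail probabilities of the two candidate priors on \sigma_j. It assumes that \text{DoubleExponential}^+(0,1) means a double exponential truncated to the positive half-line, which is an exponential distribution with mean 1; the function names are made up for this illustration.

```python
import math

def half_cauchy_tail(x, scale=1.0):
    """P(sigma > x) for sigma ~ Cauchy+(0, scale)."""
    return 1.0 - (2.0 / math.pi) * math.atan(x / scale)

def half_laplace_tail(x, scale=1.0):
    """P(sigma > x) for sigma ~ DoubleExponential+(0, scale), i.e. exponential with mean `scale`."""
    return math.exp(-x / scale)

for x in (1.0, 5.0, 20.0):
    print(f"x={x:5.1f}  half-Cauchy: {half_cauchy_tail(x):.4f}  "
          f"half-Laplace: {half_laplace_tail(x):.2e}")
```

The half-Cauchy tail decays polynomially and the exponential tail decays exponentially, so the replacement mainly penalizes large \sigma_j; both densities are finite and nonzero at \sigma_j = 0.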

Any insights would be highly appreciated. Thank you !

Bob_Carpenter · March 25, 2018, 8:23pm

This happens in multilevel models if you do optimization. The overall log density approaches infinity as the hierarchical variance approaches zero and the lower-level parameters approach the prior means.
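
A small numerical sketch of that divergence, written for the centered form \beta_{j,k} \sim \text{Normal}(\mu_{\beta_j}, \sigma_j) with K = 5 groups. It evaluates only the terms that involve \sigma_j, with every \beta_{j,k} set equal to \mu_{\beta_j}; the likelihood and the prior on \mu_{\beta_j} are left out on the assumption that they stay bounded in this limit. The function name is hypothetical.

```python
import math

def log_density_at_collapse(sigma, K=5):
    """log[ prod_k Normal(beta_k | mu, sigma) * Cauchy+(sigma | 0, 1) ] evaluated at beta_k == mu."""
    # Each normal term at its mean contributes -log(sigma) - 0.5 * log(2 * pi).
    log_normal_terms = K * (-math.log(sigma) - 0.5 * math.log(2.0 * math.pi))
    # Half-Cauchy(0, 1) log density: log(2 / pi) - log(1 + sigma^2), finite at sigma = 0.
    log_half_cauchy = math.log(2.0 / math.pi) - math.log1p(sigma ** 2)
    return log_normal_terms + log_half_cauchy

for sigma in (1.0, 1e-2, 1e-4, 1e-8):
    print(f"sigma={sigma:.0e}  log density={log_density_at_collapse(sigma):.2f}")
```

The -K \log \sigma term grows without bound as \sigma shrinks, and the half-Cauchy prior does nothing to offset it, so an optimizer can keep increasing the objective by pushing \sigma toward zero.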

Rather than a weaker prior, what you need is something that avoids zeros. Andrew and others have written about this.

Or you can fit with full Bayes, in which case things shouldn’t collapse to zero.

Juho Piironen and Aki Vehtari just put out a paper on shrinkage and sparsity-inducing priors.
