Hello. I’m trying to specify a regression model with sparsity priors on the coefficients. Since there are five groups in the data, I would like to fit a hierarchical model, and I found this discussion on the old mailing list.

**Model:**

Let’s say I have N samples with D covariates, and the samples fall into K groups.

For each sample n, the group membership ID[n] = k (k = 1, ..., K) is known.

I want to do something like this:

Y_n = \sum_{j=1}^{D} X_{n,j} \beta_{j, ID[n]} + m_{\beta}.
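To make the indexing concrete, here is a minimal Python sketch of the mean structure above. It is only an illustration: the function name `linear_predictor` is made up, I am treating m_{\beta} as a single shared intercept (an assumption), and group IDs are 0-based here rather than 1-based as in the text.

```python
def linear_predictor(X, beta, group_id, intercept):
    """Return one predicted mean per sample.

    X         : N rows of D covariate values
    beta      : D rows of K group-specific coefficients, beta[j][k]
    group_id  : N group indices in 0..K-1 (0-based, unlike ID[n] in the text)
    intercept : scalar, assumed here to play the role of m_beta
    """
    predictions = []
    for n, row in enumerate(X):
        k = group_id[n]
        # Sum over covariates, using the coefficient column of this sample's group.
        total = sum(row[j] * beta[j][k] for j in range(len(row)))
        predictions.append(total + intercept)
    return predictions


# Tiny hypothetical example: N = 2 samples, D = 2 covariates, K = 2 groups.
X = [[1.0, 2.0], [3.0, 4.0]]
beta = [[0.5, -1.0], [2.0, 0.0]]
group_id = [0, 1]
y_mean = linear_predictor(X, beta, group_id, intercept=1.0)
```

Each sample only ever touches the coefficient column of its own group, which is why the group-level \beta_{j,k} are informed mainly by the samples in group k.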

From the discussion in the link above, it seems that a reasonable way to do this is (using a Laplace prior as an example):

\mu_{\beta_j} \sim \text{DoubleExponential}(0,1),

\beta_{j, k} \sim \text{Normal}(\mu_{\beta_j}, \sigma_j).

For non-centered parameterization:

\mu_{\beta_j} \sim \text{DoubleExponential}(0,1),

\beta_{j, k} = \mu_{\beta_j} + \sigma_j \delta_{j,k},

where \delta_{j,k} \sim \text{Normal}(0,1) and \sigma_j \sim \text{Cauchy}^+(0,1).
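For reference, the non-centered construction can be sketched as a prior simulation in plain Python. This is not my actual model code, just a sketch of the generative structure as I understand it; the helper names (`sample_laplace`, `sample_half_cauchy`, `draw_coefficients`) are made up for illustration.

```python
import math
import random


def sample_laplace(rng, loc=0.0, scale=1.0):
    # Laplace draw: exponential magnitude with a random sign.
    magnitude = rng.expovariate(1.0 / scale)
    return loc + magnitude if rng.random() < 0.5 else loc - magnitude


def sample_half_cauchy(rng, scale=1.0):
    # Cauchy draw via the inverse CDF, folded onto the non-negative axis.
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def draw_coefficients(D, K, seed=0):
    rng = random.Random(seed)
    # mu_beta_j ~ DoubleExponential(0, 1)
    mu = [sample_laplace(rng) for _ in range(D)]
    # sigma_j ~ Cauchy^+(0, 1)
    sigma = [sample_half_cauchy(rng) for _ in range(D)]
    # delta_{j,k} ~ Normal(0, 1)
    delta = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(D)]
    # beta_{j,k} = mu_beta_j + sigma_j * delta_{j,k}
    beta = [[mu[j] + sigma[j] * delta[j][k] for k in range(K)] for j in range(D)]
    return mu, sigma, delta, beta


mu, sigma, delta, beta = draw_coefficients(D=3, K=5)
```

In this construction, \beta_{j,k} is a deterministic transform of \mu_{\beta_j}, \sigma_j, and \delta_{j,k}, so only those three quantities carry priors.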

**Problem:**

With this specification, all \mu_{\beta_j} (j = 1, ..., D) are estimated at essentially zero, while the \beta_{j,k} are not (and they actually look okay). I changed the scale parameter of the double exponential prior to weaken the shrinkage, but the estimates did not change. Moreover, the Pareto-k diagnostic shows many values > 0.7, which did not happen for the non-hierarchical model.

**Question:**

- Since I was not expecting everything to shrink to zero, should I use an even weaker prior for \mu_{\beta_j} (I already tried \text{DoubleExponential}(0,10))?
- As an alternative, I am thinking of replacing \sigma_j \sim \text{Cauchy}^+(0,1) with \sigma_j \sim \text{DoubleExponential}^+(0,1), so that it is harder for \beta_{j,k} to escape shrinkage when \mu_{\beta_j} is zero. I am not sure whether this makes sense.

Any insights would be highly appreciated. Thank you!