Understanding Regularized HS prior

I am struggling to understand the various arguments of the hs() prior in rstanarm. I read through Piironen, Vehtari 2017 but I'm still struggling with some things. The calculation of tau0, the global scale, makes sense. But I'm having trouble with the other hyperparameters, though I find adjusting them does affect my predictive accuracy. I find the term "slab" particularly confusing. What would be nice is a sense of how the various hyperparameters affect the model. For example, one argument is simply 'df'. I am unsure what exactly this refers to, as there is also a global_df and a slab_df IIRC. If I do a prior_summary call on my model in rstanarm, this df appears as the prior on the coefficients. I'm working on an n << p problem and finding many of my coefficients have very heavily skewed distributions, though I don't seem to be having any sampling issues. It makes me suspicious that some of my priors are reining the estimates in too much, but perhaps I'm misdiagnosing the issue.


hs refers to hierarchical shrinkage, and to get the regularized horseshoe set df=1 and global_df=1. If you want a sparsifying prior, it's best to leave them like that. Then you need to choose global_scale, slab_df, and slab_scale.
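For example, a minimal rstanarm sketch along those lines (the formula, data, and hyperparameter values are placeholders, not recommendations):

```r
library(rstanarm)

# df = 1 and global_df = 1 give half-Cauchy priors for the local and
# global scales, i.e. the regularized horseshoe; the remaining three
# arguments are the ones you need to choose.
fit <- stan_glm(
  y ~ .,                           # placeholder formula
  data = mydata,                   # placeholder data frame
  family = gaussian(),
  prior = hs(df = 1, global_df = 1,
             global_scale = 0.01,  # tau0, see below
             slab_df = 4, slab_scale = 2.5)
)
prior_summary(fit)
```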

That is the global_scale and it seems you have that figured out.
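For reference, the value suggested in Piironen, Vehtari (2017) is \tau_0 = \frac{p_0}{D-p_0}\frac{\sigma}{\sqrt{n}}, where p_0 is a prior guess for the number of relevant coefficients. A quick sketch with made-up numbers:

```r
p0    <- 5     # prior guess for the number of relevant coefficients
D     <- 200   # total number of coefficients
n     <- 50    # number of observations
sigma <- 1     # (approximate) noise sd; use a plug-in value for non-Gaussian likelihoods
tau0  <- p0 / (D - p0) * sigma / sqrt(n)
# then pass it on: prior = hs(global_scale = tau0, ...)
```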

slab describes the prior for the large coefficients. The slab is a t-distribution with a scale and df, which you can choose based on your prior information about the magnitude of the large coefficients. df is the local df, or local nu. The slab is important, for example, in logistic regression with separable classes, since without regularization the horseshoe would put far too much mass on infeasibly large weights.
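Concretely, since the slab amounts to \beta \sim N(0, c^2) with an inverse-gamma prior on c^2, the large coefficients marginally get a student-t(slab_df, 0, slab_scale) slab, so you can simulate it and check the implied magnitudes against your prior beliefs (the values below are only illustrative):

```r
slab_df    <- 4
slab_scale <- 2.5
# c^2 ~ Inv-Gamma(slab_df / 2, slab_df * slab_scale^2 / 2)
c2   <- 1 / rgamma(1e5, shape = slab_df / 2, rate = slab_df * slab_scale^2 / 2)
beta <- rnorm(1e5, mean = 0, sd = sqrt(c2))   # marginally t_{slab_df}(0, slab_scale)
quantile(abs(beta), c(0.5, 0.9, 0.99))        # implied sizes of "large" coefficients
```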

And which likelihood?

Posterior distributions? It's fine if the posterior is skewed.


I had a similar problem understanding the hs() prior. When global_df = 1, does that mean that \tau \sim C^+(0,\tau_0^2)? And how is df related to the model in Piironen, Vehtari (2017)?

It should be that hs() uses half student_t priors. The student_t distribution with df=1 is equivalent to a Cauchy distribution; that is the reason for setting df=1 and global_df=1.
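This is easy to verify numerically:

```r
x <- seq(-5, 5, by = 0.1)
all.equal(dt(x, df = 1), dcauchy(x))  # TRUE: student-t with df = 1 is the Cauchy
```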

Yes, I think so as well; however, I can't match the five arguments of hs() with the parameters of the regularised horseshoe. I assume hs() should match the following model:

c^2 \sim \text{inv-gamma}(\frac{\nu}{2},\frac{\nu s^2}{2} )

Where \nu is slab_df and s is slab_scale
\lambda_i \sim C^+(0,1)

\hat\lambda_i^2 = \frac{c^2\lambda_i^2}{c^2 + \tau^2\lambda_i^2}

\tau \sim \text{student-t}_{\nu_{\text{global}}}^+(0, \tau_0^2)

Where \nu_{global} is global_df and \tau_0 is global_scale.

\beta_i \sim N(0,\tau^2\hat\lambda_i^2)
So that is four parameters in total, not five as in hs().

As far as I know:

  • global_scale and global_df are for \tau (student-t distributed)
  • slab_df and slab_scale are for c^{2} (not sure how the inv-gamma is parametrised)
  • df refers to the degrees of freedom for \lambda_{i}, which is still student-t distributed. For the (regularised) horseshoe you want df=1 in order to have a Cauchy distribution
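Putting that mapping into code, a minimal R sketch that just draws coefficients from the model written above, using the hs() argument names (it mirrors the equations, not necessarily rstanarm's internal parametrisation; the default values here are only examples):

```r
draw_rhs_prior <- function(n_draws, p, df = 1, global_df = 1,
                           global_scale = 0.01, slab_df = 4, slab_scale = 2.5) {
  tau    <- abs(rt(n_draws, df = global_df)) * global_scale    # tau ~ t+_{global_df}(0, global_scale)
  lambda <- matrix(abs(rt(n_draws * p, df = df)), n_draws, p)  # lambda_i ~ t+_{df}(0, 1)
  c2     <- 1 / rgamma(n_draws, shape = slab_df / 2,
                       rate = slab_df * slab_scale^2 / 2)      # c^2 ~ inv-gamma(nu/2, nu s^2/2)
  lt2    <- c2 * lambda^2 / (c2 + tau^2 * lambda^2)            # regularized lambda_hat_i^2
  matrix(rnorm(n_draws * p, 0, sqrt(tau^2 * lt2)), n_draws, p) # beta_i ~ N(0, tau^2 lambda_hat_i^2)
}

beta <- draw_rhs_prior(1e4, p = 10)  # prior draws to sanity-check hyperparameter choices
```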

I suggest you look at the Stan implementation in Section C.1 (Appendix C) of Piironen, Vehtari (2017). It might not correspond exactly to the actual implementation in rstanarm, though.


hs() refers to Hierarchical Shrinkage (not HorseShoe). Hierarchical Shrinkage has
\lambda_i \sim t_\nu^+(0,1)
Thus the five parameters of hs() define a regularized hierarchical shrinkage prior, but if df=1 it defines the regularized horseshoe. See [1508.02502] Projection predictive variable selection using Stan+R and Sparsity information and regularization in the horseshoe and other shrinkage priors for more information.
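So, in hs() terms (other arguments left at their defaults):

```r
hs(df = 1, global_df = 1)  # regularized horseshoe
hs(df = 3, global_df = 1)  # regularized hierarchical shrinkage with t_3^+ local scales
```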
