Sparsity information and regularization in the horseshoe and other shrinkage priors

Juho Piironen and I have a new paper on arXiv,
"Sparsity information and regularization in the horseshoe and other shrinkage priors": https://arxiv.org/abs/1707.01694

It’s an extension of our previous paper on how to choose hyperparameters for the horseshoe prior, http://proceedings.mlr.press/v54/piironen17a.html. The new thing in this paper is the regularized horseshoe (plus more extensive experiments and discussion). From the abstract: “Moreover, we introduce a generalization to the horseshoe prior, called the regularized horseshoe, that allows us to specify a minimum level of regularization to the largest values. We show that the new prior can be considered as the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe resembles the spike-and-slab with an infinitely wide slab.” The regularized horseshoe is especially useful in logistic regression when n << p, as it controls the largest weights, and it also helps to get rid of the convergence problems common with the regular horseshoe.
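To give a quick idea of what the prior looks like in code, here is a rough sketch of the regularized horseshoe for a Gaussian likelihood (the variable and hyperparameter names, e.g. scale_global for tau0 and slab_scale/slab_df for the slab, are just the ones used in this sketch; the appendix of the paper has the complete, tested code):

data {
  int<lower=0> n;                   // number of observations
  int<lower=0> d;                   // number of predictors
  matrix[n, d] x;
  vector[n] y;
  real<lower=0> scale_global;       // tau0, e.g. computed from the prior guess p_0
  real<lower=0> slab_scale;         // scale of the slab regularizing the largest coefficients
  real<lower=0> slab_df;            // degrees of freedom of the slab
}
parameters {
  real beta0;
  real<lower=0> sigma;
  vector[d] z;                      // non-centered coefficients
  real<lower=0> tau;                // global shrinkage
  vector<lower=0>[d] lambda;        // local shrinkage
  real<lower=0> caux;               // auxiliary variable for the slab width
}
transformed parameters {
  real<lower=0> c;                  // slab width
  vector<lower=0>[d] lambda_tilde;  // regularized local shrinkage
  vector[d] beta;                   // regression coefficients
  c = slab_scale * sqrt(caux);
  lambda_tilde = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
  beta = z .* lambda_tilde * tau;
}
model {
  z ~ normal(0, 1);
  lambda ~ cauchy(0, 1);                           // half-Cauchy via the <lower=0> constraint
  tau ~ cauchy(0, scale_global * sigma);           // half-Cauchy with scale tau0*sigma
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);  // gives c^2 a scaled inv-chi^2 prior
  beta0 ~ normal(0, 10);
  sigma ~ cauchy(0, 1);                            // placeholder prior for sigma
  y ~ normal(beta0 + x * beta, sigma);
}

When tau^2*lambda_j^2 is small compared to c^2, lambda_tilde_j is close to lambda_j and the coefficient is shrunk as with the plain horseshoe; when it is large, the prior for beta_j approaches N(0, c^2), so even the largest coefficients get at least that much regularization. Letting c go to infinity gives lambda_tilde = lambda and recovers the original horseshoe, which is the “infinitely wide slab” statement in the abstract.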

The paper has example Stan code at the end, and we have added an issue for rstanarm.

Aki

Aki, I enjoyed the paper, and that it comes with Stan code is awesome. The graphic on pg. 24 shows that GLMNET seems to be really competitive, except for the Leukemia case. I was wondering whether you used cross-validation for GLMNET? The paper just says “default” settings. … you could just have picked lambda-min from those estimates. Then the results would be even closer.

Thanks!

The older, shorter version of the paper, “On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior”, has a figure with uncertainty intervals showing that the difference is significant (see also Fig. 5 for the regression results). Since the figure in this version has more lines, Juho removed the intervals, although I suspected that someone might then infer that glmnet is competitive… I think we need to consider how to add the uncertainties back to the figure.

Page 23: “To get a baseline for the comparisons, we also computed the prediction accuracies to Lasso with the regularization parameter tuned by 10-fold cross-validation.”

And naturally the benefit of Stan is that the HS and RHS priors can be used as part of more complex models and with any likelihood (while glmnet seems to support only linear, logistic, multinomial, and Poisson regression and the Cox model).
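For example, in the sketch above only the likelihood needs to change to get logistic regression with the same prior (with y declared as an integer array instead of a vector):

  y ~ bernoulli_logit(beta0 + x * beta);

and the same goes for other observation models.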

Aki

Page 23: “To get a baseline for the comparisons, we also computed the prediction accuracies to Lasso with the regularization parameter tuned by 10-fold cross-validation.”

I have to forgive my brain. I got confused because on pg. 24 there is a constant black dashed line for the Lasso, whereas with cross-validation GLMNET produces a sequence of lambdas, each with a different MLPD.
But then maybe another silly question: in what way does tau0 in the figure on pg. 24 compare to GLMNET? I would understand it if the x axis showed the number of “non-zero” parameters (those outside an eps interval around 0) instead of tau0.

We have those kinds of figures, too, demonstrating that hs+projpred gives better performance with smaller models than lasso+cv.

I’m not sure if I understood the question.
We could have p_0 on the x axis instead. The relation between p_0 and tau0 is given by eq. 3.13.
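Roughly, for the Gaussian case that relation is

  tau0 = p_0 / (D - p_0) * sigma / sqrt(n),

where D is the total number of coefficients and p_0 is the prior guess for the number of relevant ones (for non-Gaussian likelihoods, sigma is replaced by a plug-in pseudo-variance).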

Aki