Help with lasso example in brms


#1

I normally use glmnet for variable selection, following the tutorial here. The brms documentation says that there is a lasso prior, but I am struggling to get a working example. I get the following error:

Error: Defining priors for single population-level parameters is not allowed when using horseshoe or lasso priors (except for the Intercept).

Could someone show a simple working example of variable selection using the lasso with brms?


#2

Please provide the code you want to get working. Also, I suggest using the horseshoe prior rather than the lasso, since the former provides much better shrinkage.


#3

This is a data set from “An Introduction To Statistical Learning”. It should be reproducible and relevant to variable selection via lasso.

library(ISLR)
library(tidyverse)
library(brms)

hitters <- Hitters %>% na.omit()

for_lasso <- brm(Salary ~ ., data = hitters)

summary(for_lasso)

#5

You can set a lasso prior as follows:

for_lasso <- brm(Salary ~ ., data = hitters, prior = prior(lasso(), class = "b"))


#6

How can I be more or less aggressive about setting coefficients equal to 0? I assumed that df was the argument controlling this, but maybe I am wrong. None of the following models have any covariates set to 0.

Here is a more complete example:

library(ISLR)
library(tidyverse)
library(brms)

# define function to scale variables
my_scale <- function(...) as.numeric(scale(...))
hitters <- Hitters %>% 
  na.omit() %>% 
  # remove non-numeric columns before scaling
  select(-NewLeague, -League, -Division) %>%
  # scale
  mutate_all(my_scale)


for_lasso1 <- brm(Salary ~ ., data = hitters, prior = prior(lasso(df = 1), class = "b"),
                  iter = 500, chains = 3)
for_lasso2 <- brm(Salary ~ ., data = hitters, prior = prior(lasso(df = 10), class = "b"),
                  iter = 500, chains = 3)
for_lasso3 <- brm(Salary ~ ., data = hitters, prior = prior(lasso(df = 100), class = "b"),
                  iter = 500, chains = 3)
summary(for_lasso1)
summary(for_lasso2)
summary(for_lasso3)

#7

That’s because you are in a Bayesian framework: there is no absolute shrinkage to zero. See the paper about the Bayesian lasso cited in the documentation of ?lasso.

In fact, the lasso prior is a bad shrinkage prior. I suggest using the horseshoe prior instead.


#8

This is the code with horseshoe priors. After glancing at the paper, it seems that the Bayesian lasso is a compromise between the lasso and ridge regression, but, as you mentioned, the coefficients don’t shrink to 0. The paper also uses the double-exponential prior.

What is the justification of the horseshoe prior?

Also, is it true that the smaller the df the more regularization with df = 1 being the most regularized?

for_lasso1 <- brm(Salary ~ ., data = hitters, prior = prior(horseshoe(df = 1), class = "b"),
                  iter = 500, chains = 3)
for_lasso2 <- brm(Salary ~ ., data = hitters, prior = prior(horseshoe(df = 10), class = "b"),
                  iter = 500, chains = 3)
for_lasso3 <- brm(Salary ~ ., data = hitters, prior = prior(horseshoe(df = 100), class = "b"),
                  iter = 500, chains = 3)
summary(for_lasso1)
summary(for_lasso2)
summary(for_lasso3)

#9

https://projecteuclid.org/euclid.ejs/1513306866

I don’t think so. I would say that the regularization is mostly driven by the expected number of non-zero coefficients. Even so, you will not obtain exact zeros, although you can use the ideas in the projpred package to obtain a model with fewer coefficients that is expected to predict future data about as well.
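As a rough sketch of that projpred workflow (hedged: this assumes the projpred package's cv_varsel/suggest_size/project interface, applied to one of the brms fits from above, e.g. for_lasso1; argument names may differ across projpred versions):

```r
library(projpred)

# Cross-validated projection predictive variable selection on an
# existing brms fit; this is computationally heavy, since submodels
# are refit by projection within the cross-validation folds.
cvs <- cv_varsel(for_lasso1)

# Suggested submodel size: the smallest number of terms whose
# projected submodel is expected to predict about as well as the
# full model.
nsel <- suggest_size(cvs)

# Project the posterior onto that submodel; the projected draws
# give posterior summaries for the retained coefficients only.
proj <- project(cvs, nterms = nsel)
head(as.matrix(proj))
```

The point is that the sparsity comes from the projection step, not from the prior: the horseshoe (or lasso) prior regularizes, and projpred then decides which coefficients can be dropped without hurting predictive performance.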


#10

I’ll add to Ben’s post: for getting coefficients equal to 0, see http://link.springer.com/article/10.1007/s11222-016-9649-y, and for several examples and a video on projpred, see https://github.com/avehtari/modelselection_tutorial


#11

And see also Betancourt’s case study comparing the “lasso” prior and the horseshoe: https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html