I normally use glmnet for variable selection (tutorial here). The brms documentation says that there is a lasso function, but I am struggling to get a working example. I get the error:
Error: Defining priors for single population-level parameters is not allowed when using horseshoe or lasso priors (except for the Intercept).
Could someone show a simple working example of variable selection using lasso with brms?
Please provide the code you want to get working. Also, I suggest using the horseshoe prior rather than the lasso, since the former provides much better shrinkage.
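For example (a minimal sketch, assuming brms is loaded), the horseshoe prior is set on the same population-level class "b" as the lasso prior:
# horseshoe prior on all regression coefficients
prior(horseshoe(df = 1), class = "b")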
This is a data set from “An Introduction To Statistical Learning”. It should be reproducible and relevant to variable selection via lasso.
library(ISLR)
library(tidyverse)
library(brms)
# drop rows with missing values before fitting
hitters <- Hitters %>% na.omit()
for_lasso <- brm(Salary ~ ., data = hitters)
summary(for_lasso)
You can set a lasso prior as follows:
for_lasso <- brm(Salary ~ ., data = hitters, prior = prior(lasso(), class = "b"))
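A small sketch, using the hitters data from above: if you are unsure which classes priors can be set on, get_prior() lists them before fitting, and prior_summary() shows what was actually used afterwards.
# list all parameter classes/coefficients for which priors can be specified
get_prior(Salary ~ ., data = hitters)
# after fitting, confirm the priors that were actually used
prior_summary(for_lasso)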
How can I be more aggressive or less aggressive with setting coefficients equal to 0? I was assuming that df was the argument for this, but maybe I am wrong. I am seeing that none of the following models have covariates getting set to 0.
Here is a more complete example:
library(ISLR)
library(tidyverse)
library(brms)
# define function to scale variables
my_scale <- function(...) as.numeric(scale(...))
hitters <- Hitters %>%
  na.omit() %>%
  # remove non-numerics before scaling
  select(-NewLeague, -League, -Division) %>%
  # scale all remaining columns
  mutate_all(my_scale)
for_lasso1 <- brm(Salary ~ ., data = hitters, prior = prior(lasso(df = 1), class = "b"),
                  iter = 500, chains = 3)
for_lasso2 <- brm(Salary ~ ., data = hitters, prior = prior(lasso(df = 10), class = "b"),
                  iter = 500, chains = 3)
for_lasso3 <- brm(Salary ~ ., data = hitters, prior = prior(lasso(df = 100), class = "b"),
                  iter = 500, chains = 3)
summary(for_lasso1)
summary(for_lasso2)
summary(for_lasso3)
That’s because you are in a Bayesian framework: there is no absolute shrinkage to zero. See the paper about the Bayesian lasso that I cite in the documentation of ?lasso.
In fact, the lasso prior is a poor shrinkage prior; I would rather suggest using the horseshoe prior instead.
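As a quick check (a minimal sketch using the models fitted above), the population-level estimates are shrunk toward zero but never exactly zero; at best you can see which credible intervals exclude zero.
# population-level coefficients with 95% credible intervals
est <- fixef(for_lasso1)
est
# covariates whose 95% interval excludes zero
rownames(est)[est[, "Q2.5"] > 0 | est[, "Q97.5"] < 0]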
This is the code with the horseshoe priors. After glancing at the paper, it seems as if the Bayesian lasso is a compromise between the lasso and ridge regression, but as you mentioned the coefficients don’t shrink to 0. In the paper they also used a double-exponential prior.
What is the justification for the horseshoe prior?
Also, is it true that the smaller the df, the more regularization, with df = 1 being the most regularized?
for_lasso1 <- brm(Salary ~ ., data = hitters, prior = prior(horseshoe(df = 1), class = "b"),
                  iter = 500, chains = 3)
for_lasso2 <- brm(Salary ~ ., data = hitters, prior = prior(horseshoe(df = 10), class = "b"),
                  iter = 500, chains = 3)
for_lasso3 <- brm(Salary ~ ., data = hitters, prior = prior(horseshoe(df = 100), class = "b"),
                  iter = 500, chains = 3)
summary(for_lasso1)
summary(for_lasso2)
summary(for_lasso3)
https://projecteuclid.org/euclid.ejs/1513306866
I don’t think so. I would say that the regularization is mostly due to the expected number of non-zero coefficients. Even so, you are not going to obtain exact zeros, although you can use the ideas in the projpred package to obtain a model with fewer coefficients that is expected to predict future data about as well.
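One way to encode that guess in brms is the par_ratio argument of horseshoe(), the expected ratio of non-zero to zero coefficients (a sketch, assuming roughly 3 of the 16 scaled covariates are relevant):
# horseshoe with a prior guess of about 3 non-zero out of 16 coefficients
fit_hs <- brm(Salary ~ ., data = hitters,
              prior = prior(horseshoe(df = 1, par_ratio = 3/13), class = "b"),
              iter = 500, chains = 3)
summary(fit_hs)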
I’ll add to Ben’s post: for getting coefficients equal to 0, see http://link.springer.com/article/10.1007/s11222-016-9649-y, and several examples and a video of projpred in https://github.com/avehtari/modelselection_tutorial
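A minimal sketch of that workflow, assuming a recent projpred version (function names have changed across releases), starting from one of the horseshoe fits above:
library(projpred)
# cross-validated search over submodels of increasing size
vs <- cv_varsel(for_lasso1)
# suggested number of covariates to keep
suggest_size(vs)
# covariates ranked by relevance
solution_terms(vs)
# project the posterior onto the suggested submodel
proj <- project(vs, nterms = suggest_size(vs))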
And see also Betancourt’s case study comparing the “lasso” prior and the horseshoe: https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html