Different results using classical LASSO and Bayesian LASSO

Hi everyone,

This question is more or less a theoretical question. So I am not sure whether it is suitable to post this here. Anyway it is great if I can get the opinion of the subject experts regarding this matter.

I have fitted a LASSO logistic regression model using classical approach and Bayesian approach and the results are different. I am curious about this because, the Bayesian estimates using double exponential priors similar to the classical LASSO estimates.

Here are the results based on caret package (which internally use glmnet package) in R. I used 10 fold cross validation to find the optimal lambda.
image

Here are the results based on Bayesian lasso by taking double exponential priors using my own code.

In fact, there seems to be no bug in my code as I obtained similar results based on brms package.

I also tried the horseshoe prior in brms package. But I got divergent transitions.

So is it possible to explain the difference in results ?

Thank you in advance.

I don’t know the details, but here’s a blog post describing how the Bayesian Lasso doesn’t really work: https://statmodeling.stat.columbia.edu/2017/11/02/king-must-die/

It might give you some leads on where to look. I’m not sure how the argument goes myself.

I think give this another swing. There are a few hyperparameters on the horseshoe and you might have luck adjusting them for your application (paper here: https://arxiv.org/abs/1707.01694).

It’s also common to have to turn up adapt_delta for horseshoes, cause the posteriors can be a bit tricky.

1 Like

Did you specify the parameter alpha = 1 within caret? From what I can see caret sets its own default, alpha = 0.5, different than what glmnet sets as a default, alpha = 1.0. So caret is fitting a half lasso and half ridge penalty by default, without specifying alpha.

What sort of prior are you putting on lambda in your Stan code? How does the posterior mean of lambda compare with the 10-fold cross validation estimate of lambda you found with caret?

1 Like

@roualdes Thank you for the reply. Yeah I specify alpha=1 in caret. This my prior specification in the stan code.

model {

beta_j ~ double_exponential(0,1/lambda);
lambda ~ cauchy(0,1);
alpha ~ normal(0, 5);
y1 ~ bernoulli_logit_glm(x1, alpha, beta_j);

}

The lambda based on 10-fold cross validation using caret package is around 0.01 where as the posterior mean for the lambda is around 20. 87. I think we can compare the lambda after taking the reciprocal(1/20.87 = 0.0479).

@roualdes
I ran the analsyis again. Now the results from the caret package looks like this:

image
and These are from the Bayesian Lasso :

Now results are more comparable in terms of the coefficients. lambda from caret is 0.023. Based on the confidence intervals from Bayesian model, more coefficients seems to be insignificant compared to the results from the caret package.

2 Likes

Not quite sure about this one, but isn’t the posterior mode supposed to correspond to the penalized maximum likelihood estimate and here you are looking at the mean?

2 Likes

@daniel_h Yeah. Do you know how to get the mode estimators from the stan output ? I could extract only the mean , standard deviation and the quantiles.

You get penalized MLE with optimize method.

As @daniel_h suggested, posterior mode will be comparable with the classical LASSO estimates.
I plotted the densities of the posteriors and then the mode would be the maximum/peak of the posterior density.
Here are the plots,


The peaks of the posterior densities are comparable with the LASSO estimates from the caret package, except for the seventh beta coefficient.

These are the numerical values for the mode:

1 Like