Different results using classical LASSO and Bayesian LASSO

dhanur88 · October 14, 2020, 3:27am

Hi everyone,

This question is more or less a theoretical question. So I am not sure whether it is suitable to post this here. Anyway it is great if I can get the opinion of the subject experts regarding this matter.

I have fitted a LASSO logistic regression model using classical approach and Bayesian approach and the results are different. I am curious about this because, the Bayesian estimates using double exponential priors similar to the classical LASSO estimates.

Here are the results based on caret package (which internally use glmnet package) in R. I used 10 fold cross validation to find the optimal lambda.

Here are the results based on Bayesian lasso by taking double exponential priors using my own code.

In fact, there seems to be no bug in my code as I obtained similar results based on brms package.

I also tried the horseshoe prior in brms package. But I got divergent transitions.

So is it possible to explain the difference in results ?

Thank you in advance.

bbbales2 · October 15, 2020, 12:17pm

I don’t know the details, but here’s a blog post describing how the Bayesian Lasso doesn’t really work: https://statmodeling.stat.columbia.edu/2017/11/02/king-must-die/

It might give you some leads on where to look. I’m not sure how the argument goes myself.

I think give this another swing. There are a few hyperparameters on the horseshoe and you might have luck adjusting them for your application (paper here: [1707.01694] Sparsity information and regularization in the horseshoe and other shrinkage priors).

It’s also common to have to turn up adapt_delta for horseshoes, cause the posteriors can be a bit tricky.

roualdes · October 15, 2020, 5:35pm

Did you specify the parameter alpha = 1 within caret? From what I can see caret sets its own default, alpha = 0.5, different than what glmnet sets as a default, alpha = 1.0. So caret is fitting a half lasso and half ridge penalty by default, without specifying alpha.

What sort of prior are you putting on lambda in your Stan code? How does the posterior mean of lambda compare with the 10-fold cross validation estimate of lambda you found with caret?

dhanur88 · October 15, 2020, 6:33pm

@roualdes Thank you for the reply. Yeah I specify alpha=1 in caret. This my prior specification in the stan code.

model {

beta_j ~ double_exponential(0,1/lambda);
lambda ~ cauchy(0,1);
alpha ~ normal(0, 5);
y1 ~ bernoulli_logit_glm(x1, alpha, beta_j);

}

The lambda based on 10-fold cross validation using caret package is around 0.01 where as the posterior mean for the lambda is around 20. 87. I think we can compare the lambda after taking the reciprocal(1/20.87 = 0.0479).

dhanur88 · October 15, 2020, 6:45pm

@roualdes
I ran the analsyis again. Now the results from the caret package looks like this:

and These are from the Bayesian Lasso :

Now results are more comparable in terms of the coefficients. lambda from caret is 0.023. Based on the confidence intervals from Bayesian model, more coefficients seems to be insignificant compared to the results from the caret package.

daniel_h · October 16, 2020, 1:37am

Not quite sure about this one, but isn’t the posterior mode supposed to correspond to the penalized maximum likelihood estimate and here you are looking at the mean?

dhanur88 · October 16, 2020, 1:54am

@daniel_h Yeah. Do you know how to get the mode estimators from the stan output ? I could extract only the mean , standard deviation and the quantiles.

ahartikainen · October 16, 2020, 4:17am

You get penalized MLE with optimize method.

dhanur88 · October 16, 2020, 5:34am

As @daniel_h suggested, posterior mode will be comparable with the classical LASSO estimates.
I plotted the densities of the posteriors and then the mode would be the maximum/peak of the posterior density.
Here are the plots,

The peaks of the posterior densities are comparable with the LASSO estimates from the caret package, except for the seventh beta coefficient.

These are the numerical values for the mode:

Topic		Replies	Views
"LASSO" trace plots in Bayesian framework General	3	1469	March 1, 2022
Horseshoe prior with diabetes dataset Modeling fitting-issues	4	3357	May 12, 2017
bayes_R2, loo_R2 and hs() priors General	8	745	October 31, 2020
Getting different results for logistic regression model Modeling rstan , techniques , specification , loo	6	836	November 13, 2020
Brms for ordinal outcome (Ridge, LASSO, Elastic Net) Modeling techniques	2	1630	February 25, 2023

Different results using classical LASSO and Bayesian LASSO

Related topics