How to sample with the 1-norm?

FelixP · March 14, 2024, 9:30pm

Dear All,

I am currently working on ridge regression, which can be interpreted using Bayesian statistics (DOI: 10.1016/j.electacta.2015.03.123). In particular, I know that the maximum-a-posteriori (MAP) estimate can be computed with its uncertainty. However, is it possible to evaluate the MAP with Stan when the 1-norm is used instead of the 2-norm, i.e., for lasso regression? Moreover, how do you sample with the 1-norm?

Cheers!

Garren_Hermanus · March 14, 2024, 9:44pm

If you use a Laplace (double exponential) prior you can obtain a Bayesian equivalent framework to Lasso regression. The one dimensional Laplace distribution is specified by a mean, \mu, and a scale, \lambda such that the probability density function is given by:

Laplace(x|\mu, \lambda) = 1/(2\lambda) exp(-|x-\mu|/\lambda)

In stan you can use the sampling statement for the laplace prior:
x~double_exponential(mu, lambda)

FelixP · March 15, 2024, 8:08am

Thank you so much for your reply @Garren_Hermanus! Could you please also provide more details about the code, especially the selection of the parameters of the Laplace distribution?
Cheers

Garren_Hermanus · March 15, 2024, 12:19pm

It seems like you just need more background regarding Bayesian modelling. I would recommend you read through “Pattern Recognition and Machine Learning” by Bishop. To induce sparsity we want to regularize the parameters towards zero, thus we chose \mu=0. For \lambda you can follow 1 of 2 approaches.

Chose fixed values for \lambda and train your model based on these fixed values. To evaluate which of these is best suited you would need to evaluate the evidence term and use Bayes’ factors to determine which setting of \lambda is best suited for the model.
Place an additional prior on \lambda. The choice of this depends on the users. You may chose to use a uninformative prior in terms of the uniform distribution P(\lambda)=1/\lambda_{max} where \lambda \in [0, \lambda_{max}] or one the exponential distribution such as the Gamma or inverse gamma distributions. As a starting point I would use the uniform prior on \lambda and train the model based on this. That is we will have the full bayesian model:

P(x,\lambda | \mathcal{D}) \propto P(\mathcal{D}|x)P(x|\lambda)P(\lambda)

Where \mathcal{D} is the data. The above model will give you a distribution across \lambda and x.

Suppose we have the model y=f(x)+\epsilon where f(x)=X\beta. We would have the model:

P(y|\beta)=\mathcal{N}(y|X\beta,\sigma)

P(\beta|\lambda) = Laplace(\beta|0,\lambda)

P(\lambda)=1/\lambda_{max}

Sample stan code for model is

model {
   lambda ~ uniform(0,lambda_max); // not really needed if lambda defined as real<lower=0, upper=lambda_max>
   beta ~ double_exponential(0,lambda);
   y ~ normal(X*beta, sigma); 
}

Consult the stan documentation for further details regarding writing stan code

Topic		Replies	Views
Model Validation - Linear Regression with X and Y Uncertainty Modeling specification	4	1255	March 5, 2024
How many warmup, iter, and thin are recommended to use in high-dimensional problem? Modeling	5	1352	April 27, 2020
Ineffective sampling in linear regression when errors approach zero Modeling fitting-issues , specification	11	1034	June 5, 2020
"LASSO" trace plots in Bayesian framework General	3	1435	March 1, 2022
Beta regression using LASSO prior (Park, Casella) - Initial values rejected Modeling fitting-issues , specification	6	1358	July 18, 2019

How to sample with the 1-norm?

Related topics