Hierarchical Linear Models - Bayes vs. Frequentist

There's a lot going on in a comparison like this.

If you put a zero-centered normal prior with a fixed scale \sigma on the coefficients of a regression, you get ridge regression. Alternatively, you can set up a hierarchical model that estimates \sigma jointly with the regression coefficients by giving \sigma a hyperprior.
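To make that correspondence concrete (a sketch in my own notation, with residual scale \sigma_y): with likelihood y \sim \text{Normal}(X\beta, \sigma_y^2) and prior \beta_j \sim \text{Normal}(0, \sigma^2), the negative log posterior is

$$
-\log p(\beta \mid y) = \frac{1}{2\sigma_y^2}\lVert y - X\beta \rVert^2 + \frac{1}{2\sigma^2}\lVert \beta \rVert^2 + \text{const},
$$

so the posterior mode is exactly the ridge estimate with penalty \lambda = \sigma_y^2 / \sigma^2.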

Either way, you can take maximum likelihood or max marginal likelihood estimates, or you can compute Bayesian posteriors. With a Bayesian posterior, you can extract point estimates or carry out full Bayesian inference.
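Here's a minimal sketch of the fully Bayesian route (PyMC is my choice here purely for illustration; nothing in the discussion prescribes a tool). \sigma gets a hyperprior and is estimated jointly with the coefficients, and you can take either the full posterior or a point estimate from it:

```python
import numpy as np
import pymc as pm

# Simulated data, just for the sketch.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(scale=0.5, size=p) + rng.normal(size=n)

with pm.Model():
    sigma = pm.HalfNormal("sigma", sigma=1.0)      # hyperprior on the coefficient scale
    beta = pm.Normal("beta", mu=0.0, sigma=sigma, shape=p)
    sigma_y = pm.HalfNormal("sigma_y", sigma=1.0)  # residual scale
    pm.Normal("y_obs", mu=pm.math.dot(X, beta), sigma=sigma_y, observed=y)

    idata = pm.sample()          # full Bayesian inference over beta and sigma
    # point_est = pm.find_MAP()  # or a joint posterior-mode point estimate instead
```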

Practitioners of ridge regression in machine learning typically cross-validate on some metric of interest in order to choose the prior scale (i.e., the penalty). If you're going to use point estimates in the end, this can produce results very similar to fitting by max marginal likelihood (one of the procedures that gets called “empirical Bayes”).
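As an illustration of that similarity (scikit-learn is my choice of example here, not something from the original discussion): `RidgeCV` chooses the penalty by cross-validation, while `BayesianRidge` maximizes the marginal likelihood over the prior scales (type-II maximum likelihood). On well-conditioned simulated data the two coefficient vectors usually come out close.

```python
import numpy as np
from sklearn.linear_model import RidgeCV, BayesianRidge

# Simulated data, just for the sketch.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(scale=0.5, size=p) + rng.normal(size=n)

# Cross-validated penalty choice (the ML-practitioner workflow).
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)

# Max marginal likelihood over the prior scales (empirical Bayes).
eb = BayesianRidge().fit(X, y)

print("CV-chosen penalty:", ridge.alpha_)
print("max coefficient gap:", np.abs(ridge.coef_ - eb.coef_).max())
```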

The same goes for elastic nets (mixed L1 [Laplace] and L2 [normal] penalties) or other priors.
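Same kind of sketch for the elastic net: `ElasticNetCV` cross-validates both the overall penalty and the L1/L2 mix, which amounts to choosing the scales of a combined Laplace/normal prior by CV rather than by marginal likelihood.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(scale=0.5, size=10) + rng.normal(size=200)

enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
print("chosen l1_ratio:", enet.l1_ratio_, "chosen alpha:", enet.alpha_)
```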

I’m not sure how you’re assigning cost, but running a single hierarchical regression is usually faster than cross-validation, which has to refit the model for every fold and every candidate value of the penalty.
