One of my long-term projects is to improve on a regression implementation based on support vector regression (the R `svm` function, which uses ridge + ε-insensitive regression).

The last scenario where I cannot outperform it is a large (500 × 25) design matrix with multicollinearity.

The stan_lm function implements ridge-like regression by placing a prior on R2. As I understand it, this should penalize collinear predictors.

Now I can either:

1. add this ridge regression to my model, or

2. start from the fast stan_lm model (which I assume is much more complicated than I need) and add the following features:

– a simplex for the beta vector
– a Dirichlet hyperprior on the parameters of the beta vectors from many regressions
– possibly a mixture regression with a background component for biased, poor predictors, so that they don't affect the inference (they form a non-symmetric random cloud around the regression line)

I just want to figure out whether some of the features I need can be omitted in favor of ridge regression.

For example:

this Stan/BUGS example is about ridge regression, but it does not include the beta prior line from stan_lm; furthermore, I don't see any QR decomposition, although it seems the priors should be applied to the R component. (I'm a bit confused: I actually see just a student_t regression, nothing about ridge.)

And this independent ridge regression implementation is another example.

It's a topic I don't know much about, but it might be what I have needed for a long time.

The lm.stan file in rstanarm is very specific to a Gaussian model with no link function, no hierarchical structure, etc. If you want to do 2), then you need to start from the BUGS example.

Can I ask: is the beta prior there because of the specifics of that model, or does it make sense for all ridge models? (It does not appear in the other ridge regressions.)

The stan_lm model performs quite well "out of the box" compared with the gold standard (ridge + ε-insensitive regression).

This is how badly collinear some of my predictors are (they are part of a "gold standard" matrix, so I have to outperform the competitor on their own matrix first).

Yes, it comes from assuming the correlation matrix of [X, y] has an LKJ distribution, in which case the square of the bottom-right corner of its Cholesky factor is the proportion of error and has a beta distribution.
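To make that identity concrete (my own numpy illustration, not rstanarm code): for the sample correlation matrix of [X, y], the squared bottom-right entry of its Cholesky factor equals 1 − R², the proportion of variance left unexplained by regressing standardized y on standardized X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

# Sample correlation matrix of [X, y]; y is the last column.
Z = np.column_stack([X, y])
R = np.corrcoef(Z, rowvar=False)
L = np.linalg.cholesky(R)          # lower-triangular Cholesky factor

# R^2 from least squares on standardized data (no intercept needed).
Xc = (X - X.mean(0)) / X.std(0)
yc = (y - y.mean()) / y.std()
beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
r2 = 1 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)

# The squared bottom-right Cholesky entry is the unexplained proportion.
print(L[-1, -1] ** 2, 1 - r2)      # these agree up to floating point
```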

The stan_lm function is the gold standard for what we are trying to accomplish with the rstanarm package. So, lasso probably won’t work as well, but you can do lasso and a bunch of other priors using stan_glm with family = gaussian() and typically QR = TRUE.
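For intuition on what QR = TRUE buys you (a sketch of the idea, not rstanarm's internal code): the model is fit on the orthonormal Q factor, where the posterior geometry is much better behaved under collinearity, and the coefficients on the original scale are recovered by solving against R.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2 * x1 - x2 + rng.normal(size=n)

# Thin QR decomposition: X = Q R, columns of Q are orthonormal.
Q, R = np.linalg.qr(X)

# Fit on the well-conditioned Q, then map back: beta = R^{-1} theta.
theta, *_ = np.linalg.lstsq(Q, y, rcond=None)
beta_qr = np.linalg.solve(R, theta)

# Same point estimate as fitting on X directly, but the intermediate
# problem on Q has no collinearity at all.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_qr, beta_ols))  # True
```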

I'm missing the link between your correlation prior and the ridge models I posted above; they don't seem to have much in common.

Is it perhaps only this line

```stan
beta ~ normal(0, sqrt(phi));
```

that penalizes collinear predictors? I would also have expected a correlation matrix on which to place a prior.
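One way to see why a single normal prior on the coefficients already acts like ridge: the posterior mode under beta ~ normal(0, sqrt(phi)) is the ridge estimator with penalty lam = sigma² / phi. A toy numpy sketch of that correspondence (my own illustration, with an arbitrary fixed penalty, whereas stan_lm infers the amount of shrinkage):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)     # highly collinear pair
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

# OLS: unstable in the near-null direction x1 - x2.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge = posterior mode under beta ~ normal(0, sqrt(phi)),
# with lam = sigma^2 / phi (lam chosen arbitrarily here).
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Ridge shrinks the coefficient vector, most strongly along the
# poorly identified collinear direction.
print(beta_ols, beta_ridge)
```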

Otherwise, would you mind pointing out the connection to me, so that I can start upgrading a simple Gaussian regression model with ridge and see how it performs compared to stan_lm?

Also, could you point me to what is, in your opinion, the best implementation of ridge regression in Stan, one that is less locked into a specific situation and more adaptable for extension?

I obtained the above result by mistakenly inputting the data on the log scale (and then normalizing). The result on the normalized natural scale is identical to R's lsfit function.

The strange thing to me is that the ridge formulation doesn’t seem to have any effect even though there is really high correlation among predictors (see post above).

Thank you for that link! I am new to using Stan. The quote

However, most researchers have little inclination to specify all these prior distributions thoughtfully and take a short-cut by specifying one prior distribution that is taken to apply to all the regression coefficients as if they were independent of each other (and the intercept and error variance).

describes me well. Are there any examples of implementing this prior directly in Stan, for use with PyStan?
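A minimal sketch of that short-cut, written as Stan model code the way PyStan consumes it (this is my own toy program, not stan_lm): one shared scale tau applies to all slopes and plays the role of the ridge penalty, with its own prior so the data can inform the amount of shrinkage. It assumes standardized predictors and outcome.

```python
# Ridge-style Stan program as a model-code string (a sketch, assuming
# standardized X and y; variable names are my own choices).
ridge_model = """
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;       // standardized predictors
  vector[N] y;          // standardized outcome
}
parameters {
  vector[K] beta;
  real<lower=0> tau;    // shared prior scale = amount of ridge shrinkage
  real<lower=0> sigma;
}
model {
  tau ~ cauchy(0, 1);
  sigma ~ cauchy(0, 1);
  beta ~ normal(0, tau);        // one prior for all coefficients
  y ~ normal(X * beta, sigma);
}
"""

# With PyStan 2.x this would be compiled and sampled roughly as:
# import pystan
# sm = pystan.StanModel(model_code=ridge_model)
# fit = sm.sampling(data=dict(N=N, K=K, X=X, y=y))
```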