One of my long-term projects is to improve on a regression implementation based on support vector regression (the svm R function, which uses ridge + ε-insensitive regression).
The last scenario where I cannot outperform it involves a large (500 x 25) design matrix with multicollinearity.
The stan_lm function implements ridge-like regression by placing a prior on R2. As I understand it, this should penalize collinear predictors.
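To see why a ridge penalty helps with collinearity, here is a minimal numpy sketch (my own toy example, not from the thread): two nearly duplicate columns make the ordinary least-squares coefficients unstable, while the closed-form ridge estimate shrinks them toward each other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design with two nearly collinear columns (assumption: a small-scale
# stand-in for the multicollinear 500 x 25 case described above).
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^(-1) X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)     # ordinary least squares: unstable split
beta_ridge = ridge(X, y, 10.0)  # penalized: coefficients pulled together
```

With near-duplicate columns, OLS can split the total effect arbitrarily between the two coefficients; the penalty damps exactly that ill-determined direction, which is the behavior a prior on R2 is meant to induce.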
– add this ridge regression to my model, or
– start from the fast stan_lm model (which I assume is more complicated than I need) and add the following features:
– a simplex for the beta vector
– a Dirichlet hyperprior on the beta-vector parameters from many regressions
– possibly a mixture regression with a background of biased, poor predictors, so that they do not affect the inference (they form a non-symmetric random cloud around the regression line)
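The first two listed features could look roughly like this as a pystan-style model string. This is a hypothetical sketch, not code from the thread: the names (J, K, N, alpha, sigma) and the gamma/exponential hyperpriors are my own illustrative choices, and it uses older Stan array syntax.

```python
# Hypothetical Stan program sketching a simplex-constrained coefficient
# vector with a shared Dirichlet hyperprior across J related regressions.
simplex_model = """
data {
  int<lower=1> N;            // observations per regression
  int<lower=1> K;            // predictors
  int<lower=1> J;            // number of related regressions
  matrix[N, K] X[J];
  vector[N] y[J];
}
parameters {
  simplex[K] beta[J];        // one simplex coefficient vector per regression
  vector<lower=0>[K] alpha;  // shared Dirichlet concentration
  real<lower=0> sigma;
}
model {
  alpha ~ gamma(2, 1);       // weakly informative hyperprior (assumption)
  sigma ~ exponential(1);
  for (j in 1:J) {
    beta[j] ~ dirichlet(alpha);
    y[j] ~ normal(X[j] * beta[j], sigma);
  }
}
"""
```

The mixture component for the background of poor predictors would need an additional likelihood term (e.g. a wider or skewed component), which I have left out since its shape is not pinned down in the thread.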
I just want to work out whether some of the features I need can be omitted in favor of ridge regression.
This Stan-BUGS example is about ridge regression, but it does not include the beta prior line from stan_lm; furthermore, I don't see any QR decomposition, while it seems the priors should be applied to the R component. (I'm a bit confused; actually I see just a student_t regression, nothing about ridge.)
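For what the QR trick buys you, here is a small numpy illustration (my own sketch, under the usual setup: decompose X = QR, fit coefficients theta on the orthonormal columns of Q, then map back with beta = R⁻¹ theta). Because Q's columns are uncorrelated, the fit on theta is well conditioned even when X's columns are not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated predictors (assumption: stand-in for the collinear design).
n, p = 200, 3
Z = rng.normal(size=(n, p))
mix = np.array([[1.0, 0.9, 0.8],
                [0.0, 0.1, 0.1],
                [0.0, 0.0, 0.1]])
X = Z @ mix                      # columns of X are highly correlated
y = X @ np.array([0.5, -0.2, 0.3]) + 0.1 * rng.normal(size=n)

# Thin QR decomposition: X = Q R with orthonormal columns in Q.
Q, R = np.linalg.qr(X)

# Regress on Q instead of X; with orthonormal columns this is a projection.
theta = Q.T @ y
beta = np.linalg.solve(R, theta)   # map back to the original coefficients

# Same fit as least squares on X directly.
beta_direct, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In the Bayesian versions (e.g. QR = TRUE in rstanarm), the priors and sampling happen on the theta scale, which is why you would expect priors "applied on the R component" rather than on beta directly.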
And this independent ridge regression implementation is another example.
It is a topic I don't know much about, but it might be what I have needed for a long time.
Yes, it comes from assuming the correlation matrix of (X, y) is distributed LKJ, in which case the bottom-right corner of its Cholesky factor, squared, is the proportion of error and is beta-distributed.
The stan_lm function is the gold standard for what we are trying to accomplish with the rstanarm package. So, lasso probably won’t work as well, but you can do lasso and a bunch of other priors using stan_glm with family = gaussian() and typically QR = TRUE.
Thank you for that link! I am new to using stan. The quote
However, most researchers have little inclination to specify all these prior distributions thoughtfully and take a short-cut by specifying one prior distribution that is taken to apply to all the regression coefficients as if they were independent of each other (and the intercept and error variance).
describes me well. Is there information or are there examples of implementing this prior directly in Stan, for use in pystan?
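The "one prior for all coefficients" short-cut can be written directly as a Stan model string for pystan. A minimal sketch, assuming a Gaussian likelihood; the shared scale tau and its cauchy hyperprior are my own illustrative choices, and a common normal(0, tau) prior on all coefficients is what makes this ridge-like.

```python
# Hypothetical pystan-style model string: one prior shared by all K
# coefficients, as described in the quoted passage.
ridge_model = """
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
  real<lower=0> tau;      // common prior scale for every coefficient
}
model {
  tau ~ cauchy(0, 1);     // hyperprior on the shared scale (assumption)
  beta ~ normal(0, tau);  // one prior applied to all K coefficients
  alpha ~ normal(0, 10);
  sigma ~ exponential(1);
  y ~ normal(alpha + X * beta, sigma);
}
"""

# With pystan 3 this would be compiled and sampled along the lines of
# stan.build(ridge_model, data=...) followed by .sample(); not run here.
```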