stan_lm: is it possible to use it as a base for expansion?

Hello,

one of my long-term projects is to improve on a regression implementation based on support vector regression (R's svm function, which uses ridge + epsilon-insensitive regression).

The last scenario where I cannot outperform it is a large (500 x 25) design matrix with multicollinearity.

The function stan_lm implements ridge-like regression by putting a prior on R2, which (as I understand it) should penalize collinear predictors.

Now I could either:

  1. add this ridge regression to my model, or
  2. start from the fast stan_lm model (which I assume is much more complicated than I need) and add the following features:

– a simplex constraint on the beta vector
– a Dirichlet hyperprior on the beta vector parameters across many regressions (see the sketch below)
– possibly a mixture regression with a background component for biased, poor predictors, so that they do not affect the inference (they form a non-symmetric random cloud around the regression line)
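For concreteness, the structure I have in mind for the first two features is roughly this (just a sketch; all names and hyperpriors are mine, and I am not sure this is the right parameterization):

```stan
data {
  int<lower=1> N;              // observations per regression
  int<lower=1> K;              // predictors
  int<lower=1> J;              // number of regressions
  matrix[N, K] X[J];           // one design matrix per regression
  vector[N] y[J];              // one outcome vector per regression
}
parameters {
  simplex[K] beta[J];          // simplex-constrained coefficient vectors
  vector<lower=0>[K] a;        // Dirichlet hyperprior concentrations
  real<lower=0> sigma;         // shared residual scale
}
model {
  a ~ exponential(0.1);        // assumed hyperprior on the concentrations
  for (j in 1:J) {
    beta[j] ~ dirichlet(a);    // shared hyperprior ties the beta vectors together
    y[j] ~ normal(X[j] * beta[j], sigma);
  }
}
```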

What would be the right approach?

Thanks a lot.

The second option would not mesh with the parameterization tricks in lm.stan.

Thanks.

Could you please be more specific?

I just want to figure out whether some of the features I need would have to be given up in exchange for the benefits of ridge regression.

For example:

this Stan-BUGS example is about ridge regression, but it does not involve the beta prior line from stan_lm; furthermore, I don't see any QR decomposition, although it seems the priors should be applied to the R component. (I'm a bit confused; actually I see just a student_t regression, nothing about ridge.)

And this independent ridge regression implementation is another example.

It is a topic I don't know much about, but it might be what I have needed for a long time.

Thanks a lot

The lm.stan file in rstanarm is very specific to a Gaussian model with no link function, no hierarchical structure, etc. If you want to do 2), then you need to start from the BUGS example.
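That is, start from something along these lines (a minimal sketch with my own variable names and hyperpriors, not the exact BUGS code):

```stan
data {
  int<lower=1> N;              // observations
  int<lower=1> K;              // predictors
  matrix[N, K] X;              // design matrix (ideally centered and scaled)
  vector[N] y;                 // outcome
}
parameters {
  real alpha;                  // intercept
  vector[K] beta;              // coefficients
  real<lower=0> sigma;         // residual scale
  real<lower=0> phi;           // common prior variance on the coefficients
}
model {
  phi ~ cauchy(0, 1);          // assumed half-Cauchy hyperprior
  sigma ~ exponential(1);      // assumed weakly informative prior
  beta ~ normal(0, sqrt(phi)); // ridge: shared Gaussian shrinkage
  y ~ normal(alpha + X * beta, sigma);
}
```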

I see,

can I ask: is the beta prior there because of the specifics of that model, or does it make sense for all ridge models (it does not appear in the other ridge regressions)?

The stan_lm model performs quite well "out-of-the-box" compared with the gold standard (ridge + epsilon-insensitive regression).

This is how badly collinear some of my predictors are (they are part of a "gold standard" matrix, so I have to outperform the competitor with their own matrix first):

(I wonder if a lasso regression would work even better)

Yes, it comes from assuming the correlation matrix of (X, y) is distributed LKJ, in which case the squared bottom-right corner of its Cholesky factor is the proportion of error variance and is beta-distributed.
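In symbols, and roughly as derived in the rstanarm vignette (I am writing this from memory, so check the vignette for the exact statement): if the $(K+1) \times (K+1)$ correlation matrix of $(X, y)$ has an $\mathrm{LKJ}(\eta)$ prior and $L$ is its Cholesky factor, then

$$
L_{K+1,\,K+1}^2 = 1 - R^2 \sim \mathrm{Beta}(\eta, K/2),
\qquad \text{equivalently} \qquad
R^2 \sim \mathrm{Beta}(K/2, \eta),
$$

which is where the beta prior line in lm.stan comes from.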

The stan_lm function is the gold standard for what we are trying to accomplish with the rstanarm package. So, lasso probably won’t work as well, but you can do lasso and a bunch of other priors using stan_glm with family = gaussian() and typically QR = TRUE.
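In raw Stan, the lasso analogue is essentially the same model with a Laplace prior in place of the Gaussian one (again just a sketch; the hyperprior on the penalty is my assumption, and rstanarm's actual lasso prior may differ in the details):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
  real<lower=0> lambda;        // lasso penalty
}
model {
  lambda ~ cauchy(0, 1);                        // assumed hyperprior
  sigma ~ exponential(1);
  beta ~ double_exponential(0, sigma / lambda); // Laplace prior = Bayesian lasso
  y ~ normal(alpha + X * beta, sigma);
}
```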

Thanks,

I'm missing the link between your correlation prior and the ridge models I posted above. They don't seem to have much in common.

Is it perhaps only this line

beta ~ normal(0, sqrt(phi));

that penalises collinear predictors? I would also have expected a prior on a correlation matrix somewhere.
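If I understand correctly, the only ridge connection I can see is the classical equivalence: the ridge estimate

$$
\hat\beta = \arg\min_\beta \; \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2
$$

is the posterior mode under $\beta \sim \mathcal{N}(0, (\sigma^2/\lambda)\, I)$, so a shared Gaussian prior with variance $\phi$ acts as a ridge penalty with $\lambda = \sigma^2 / \phi$.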

Otherwise, would you mind pointing out the connection, so that I can start upgrading a simple Gaussian regression model with ridge and see how it performs compared to stan_lm?

Also, could you point me to what you consider the best implementation of ridge regression in Stan, one that is less locked into a specific situation and more adaptable to expansion?

Thanks a lot

Actually, @bgoodri,

I obtained the above result by mistakenly inputting data on the log scale (and then normalizing). The result on the normalized natural scale is identical to R's lsfit function:

[Figure: three fits side by side — stan_lm (ridge), epsilon-insensitive + ridge (svm, linear), and lsfit (R function)]

The strange thing to me is that the ridge formulation doesn’t seem to have any effect even though there is really high correlation among predictors (see post above).

This is mostly explained in the vignette.

Thanks I will read it more carefully.

Thank you for that link! I am new to using Stan. The quote

However, most researchers have little inclination to specify all these prior distributions thoughtfully and take a short-cut by specifying one prior distribution that is taken to apply to all the regression coefficients as if they were independent of each other (and the intercept and error variance).

describes me well. Are there any resources or examples of implementing this prior directly in Stan, for use in PyStan?

You would pretty much have to copy