I have just started exploring Stan and it's a really cool tool to work with. Now I'm trying to perform ridge regression in Stan. There's no problem doing so until I have to work with constraints on my regression coefficients. The glmnet function in R can find both lambda and the regression coefficients for a model with constraints on the range the betas can take (via its lower.limits and upper.limits arguments). Can declaring the parameters with <lower=a, upper=b> in Stan mimic this feature of glmnet? Is that enough?
Welcome to the Stan forum! I see this is your first post.
If you need to force the regression coefficients to be between a and b, then declaring them with <lower=a, upper=b> will do that. So in that sense it should do what you're looking for.
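Concretely, the declaration would look something like this (a minimal sketch, where the limits a and b and the number of coefficients K are assumed to be supplied in the data block):

data {
  int<lower=1> K;   // number of coefficients (assumed)
  real a;           // lower limit (assumed)
  real<lower=a> b;  // upper limit (assumed)
}
parameters {
  vector<lower=a, upper=b>[K] beta;  // every coefficient constrained to [a, b]
}

Stan transforms each constrained parameter to an unconstrained scale internally, so the sampler never proposes values outside [a, b].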
That said, we often recommend using hard bounds only when they're strictly necessary (e.g., <lower=0> for a standard deviation, <lower=-1, upper=1> for a correlation, etc.). Using hard constraints when they're not required can lead to sampling difficulties: if values outside the boundaries are consistent with the data and model, posterior mass piles up at the boundaries. If your goal is merely to discourage large coefficients, it is usually better to use a regularizing prior rather than hard constraints. If the posterior draws don't end up concentrating right at the boundaries, then this is less of an issue.
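For example, a ridge penalty corresponds to independent normal priors on the coefficients, with no interval constraints at all. A minimal sketch (the data names y, X, N, K and the prior scale of 1 are illustrative, not from your model):

data {
  int<lower=0> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  vector[K] beta;       // unconstrained coefficients
  real<lower=0> sigma;  // a hard bound is necessary here: a scale must be positive
}
model {
  beta ~ normal(0, 1);          // ridge-like prior shrinks the betas toward 0
  y ~ normal(X * beta, sigma);
}

Here the prior does the regularizing, so extreme coefficient values are discouraged but not forbidden.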
Thanks for your reply.
The image I have attached shows the head of the data set I am working with. Here, we know that variable X1 has a negative effect on Y while X3 has a positive effect. We cannot accept any opposite results, so we are putting constraints on the coefficients. If I give strict constraints to the regression coefficients individually, will that be okay? I have attached my Stan code for your reference.
data {
  int<lower=0> N;
  vector[N] std_y;     // standardized response
  vector[N] std_X1;    // standardized X1
  vector[N] std_X3;    // standardized X3
  matrix[N, 3] std_X;  // remaining standardized predictors
}
parameters {
  real<lower=0> sigma;
  real alpha_std;
  real<lower=0> beta_X3;  // constrained to be positive
  real<upper=0> beta_X1;  // constrained to be negative
  vector[3] beta;
  real<lower=0> beta_sd;  // shared prior scale for the coefficients
}
model {
  beta_sd ~ cauchy(0, 1);
  sigma ~ cauchy(0, 2);
  alpha_std ~ normal(0, 5);
  beta_X3 ~ normal(0, beta_sd);
  beta_X1 ~ normal(0, beta_sd);
  beta ~ normal(0, beta_sd);
  std_y ~ normal(alpha_std + std_X1 * beta_X1 + std_X3 * beta_X3 + std_X * beta, sigma);
}
When you say “we cannot accept any opposite results” do you mean that it would be inconvenient or that you know these constraints should definitely be true? If you really know that one is positive and one is negative then, yes, these constraints make sense:
real<lower = 0> beta_X3;
real<upper = 0> beta_X1;
If you don't know for sure but just very strongly believe that one is positive and the other is negative, then it could make sense to use a strong prior: one that almost certainly gives you what you expect but still puts a non-trivial amount of prior probability mass on results that disagree with your expectations. That way, if the data and model really indicate the opposite of what you expect, you can find that out, and that may lead you to question your assumptions and modeling choices in a good way.
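For instance (a sketch only; the prior location and scale here are hypothetical and should reflect your actual domain knowledge), you could replace the hard bound on beta_X1 with an informative prior centered on negative values:

parameters {
  real beta_X1;  // unconstrained, unlike real<upper=0> beta_X1
}
model {
  beta_X1 ~ normal(-1, 0.5);  // roughly 97.7% prior mass below 0, but positive values remain possible
}

If the posterior for beta_X1 then concentrates on positive values despite this prior, that's a signal worth investigating rather than a result the model was forbidden from producing.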