I would like to use a horseshoe prior in a context where the regression coefficients have a natural lower bound. To begin with, I ran a prior predictive check and, already there, I saw many divergent transitions. This got me thinking that the constraint and the prior might not be compatible. Is it legitimate to have constrained regression coefficients with a horseshoe prior? Thank you!
Yes. Given the constraint, do you still assume most coefficients are close to 0?
The horseshoe and the regularized horseshoe usually have difficult prior geometry, which will carry over to the posterior when the likelihood is weak, but the posterior can also be better behaved.
It’s pretty neat: you can still combine the constraint with a non-centered parameterization, but the bound is a little different. Bob covered how to work out the new bound in this thread: Non-centered parameterisation with boundaries
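In short, for a lower bound of 0 (using the declarations in the example below, where \tau > 0):
\theta = \mu + \tau \, \theta_{\text{raw}} \ge 0 \iff \theta_{\text{raw}} \ge -\mu / \tau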
And a practical example for lower=0:
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector<lower=-mu/tau>[J] theta_raw; // implies lower=0 on theta
}
transformed parameters {
  vector<lower=0>[J] theta = mu + theta_raw * tau;
}
model {
  theta_raw ~ std_normal();
  y ~ normal(theta, sigma);
}
Given the constraint, do you still assume most coefficients are close to 0?
Thank you for the answer. Yes, this was the intention, and now I realize that I should probably compensate for the chopped-off tail, since the truncation shifts the mean to the right. Then the location \mu of the underlying Gaussian should be set to the solution of \mu + \frac{\phi(-1 - \mu)}{1 - \Phi(-1 - \mu)} = 0, with \phi the PDF of the standard Gaussian and \Phi its CDF (if I interpret Wikipedia correctly).
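(For reference, the general formula for the mean of a normal truncated from below at a is
E[X \mid X > a] = \mu + \sigma \, \frac{\phi(\alpha)}{1 - \Phi(\alpha)}, \qquad \alpha = \frac{a - \mu}{\sigma},
which, with a = -1 and \sigma = 1, gives the equation above when set to zero.)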
Or what is your spontaneous reaction? What were you trying to suggest with your question?
I would like to shrink toward 0. Coefficients are allowed to be negative but not below -1, and sparsity is assumed. Would you use a horseshoe prior in this setting? If so, is the compensation described above, to get a mean of 0, sensible? (Numerical optimization says that \mu = -0.52 in this case.)
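As a quick sanity check of that number (values rounded): with \mu = -0.52 we have \phi(-0.48) \approx 0.356 and 1 - \Phi(-0.48) \approx 0.684, so \mu + 0.356 / 0.684 \approx 0.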
Thank you for the answer! Let me check whether I understood correctly. Here the regression coefficient is theta, and it is implicitly constrained to be nonnegative. theta_raw is also constrained, but with a bound that depends on mu and tau. Horseshoe sparsity can then be obtained by including the local scale lambda in the varying bound and in the expression for theta, and giving lambda and tau half-Cauchy priors. The benefit of doing all this is that there is no prior entangling mu, lambda, and tau; instead, they are combined via a transformation of independent variables, which should simplify sampling. Is this a fair summary?
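In Stan, I imagine it would look roughly like this (only a sketch: I dropped mu since the coefficients shrink toward 0, used a plain rather than regularized horseshoe with a lower bound of -1, and the variable names and Cauchy scales are placeholders; the vector-valued bound needs a reasonably recent Stan version):
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  real<lower=0> sigma;
  real<lower=0> tau;                            // global scale
  vector<lower=0>[K] lambda;                    // local scales
  vector<lower=-inv(tau * lambda)>[K] beta_raw; // implies beta >= -1
}
transformed parameters {
  vector[K] beta = (tau * lambda) .* beta_raw;  // each element >= -1 by construction
}
model {
  beta_raw ~ std_normal();
  lambda ~ cauchy(0, 1);  // half-Cauchy because of lower=0
  tau ~ cauchy(0, 1);     // half-Cauchy; the scale is a placeholder
  sigma ~ normal(0, 1);
  y ~ normal(X * beta, sigma);
}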
I would consider using the regularized horseshoe, but it’s not the only sparsity prior, so additional information might change my recommendation to something else.
You still want to shrink towards 0, so I don’t see a need for the compensation.