I would like to use a horseshoe prior in a context where the regression coefficients have a natural lower bound. To begin with, I ran a prior predictive check and, already there, I saw many divergent transitions. This got me thinking that the constraint and the prior might not be compatible. Is it legitimate to have constrained regression coefficients with a horseshoe prior? Thank you!
Yes. Given the constraint, do you still assume most coefficients are close to 0?
The horseshoe and the regularized horseshoe usually have difficult prior geometry, which will carry over to the posterior when the likelihood is weak, but the posterior can also be better behaved.
It’s pretty neat: you can still combine the constraint with a non-centered parameterization, but the bound is a little different. Bob covered how to work out the new bound in this thread: Non-centered parameterisation with boundaries
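In short, for a lower bound of 0 (using the declarations in the example below, where \tau > 0):
\theta = \mu + \tau \, \theta_{\text{raw}} \ge 0 \iff \theta_{\text{raw}} \ge -\mu / \tau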
And a practical example for lower=0:
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector<lower=-mu/tau>[J] theta_raw; // implies lower=0 on theta
}
transformed parameters {
  vector<lower=0>[J] theta = mu + theta_raw * tau;
}
model {
  theta_raw ~ std_normal();
  y ~ normal(theta, sigma);
}
Given the constraint, do you still assume most coefficients are close to 0?
Thank you for the answer. Yes, this was the intention, and now I realize that I should probably compensate for the chopped-off tail, since the truncation shifts the mean to the right. Then the location \mu of the underlying Gaussian should be set to the solution of \mu + \frac{\phi(-1 - \mu)}{1 - \Phi(-1 - \mu)} = 0, with \phi the PDF of the standard Gaussian and \Phi its CDF (if I interpret Wikipedia correctly).
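(For reference, the general formula for the mean of a normal truncated from below at a is
E[X \mid X > a] = \mu + \sigma \, \frac{\phi(\alpha)}{1 - \Phi(\alpha)}, \qquad \alpha = \frac{a - \mu}{\sigma},
which, with a = -1 and \sigma = 1, gives the equation above when set to zero.)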
Or what is your spontaneous reaction? What were you trying to suggest with your question?
I would like to shrink toward 0. Coefficients are allowed to be negative but not below -1, and sparsity is assumed. Would you use a horseshoe prior in this setting? If so, is the compensation described above, to get a mean of 0, sensible? (Numerical optimization says that \mu = -0.52 in this case.)
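As a quick sanity check of that number (values rounded): with \mu = -0.52 we have \phi(-0.48) \approx 0.356 and 1 - \Phi(-0.48) \approx 0.684, so \mu + 0.356 / 0.684 \approx 0.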
Thank you for the answer! Let me check whether I understood correctly. Here the regression coefficient is theta, and it is implicitly constrained to be nonnegative. theta_raw is also constrained, but with a bound that depends on mu and tau. Horseshoe sparsity can then be obtained by including the local scale lambda in the varying bound and in the expression for theta, and giving lambda and tau half-Cauchy priors. The benefit of doing all this is that there is no prior entangling mu, lambda, and tau; instead, they are combined via a transformation of independent variables, which should simplify sampling. Is this a fair summary?
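In Stan, I imagine it would look roughly like this (only a sketch: I dropped mu since the coefficients shrink toward 0, used a plain rather than regularized horseshoe with a lower bound of -1, and the variable names and Cauchy scales are placeholders; the vector-valued bound needs a reasonably recent Stan version):
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  real<lower=0> sigma;
  real<lower=0> tau;                            // global scale
  vector<lower=0>[K] lambda;                    // local scales
  vector<lower=-inv(tau * lambda)>[K] beta_raw; // implies beta >= -1
}
transformed parameters {
  vector[K] beta = (tau * lambda) .* beta_raw;  // each element >= -1 by construction
}
model {
  beta_raw ~ std_normal();
  lambda ~ cauchy(0, 1);  // half-Cauchy because of lower=0
  tau ~ cauchy(0, 1);     // half-Cauchy; the scale is a placeholder
  sigma ~ normal(0, 1);
  y ~ normal(X * beta, sigma);
}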
I would consider using the regularized horseshoe, but it’s not the only sparsity prior, so additional information might change my recommendation to something else.
You still want to shrink towards 0, so I don’t see a need for the compensation.