Horseshoe prior with diabetes dataset

I am trying to reproduce the example of using the horseshoe prior described in but am running into some problems. For example, in the code below (rstan + Stan code) I get thousands of divergent transitions, very low n_eff, and hence poor results.

  1. How can I improve the behavior of this model?
  2. Are there variations on this model I should consider? (For example, Vehatari suggests some.)

I am using rstan_2.14.1 and R version 3.3.1.

R code

# See
# Here I follow which is linked to above
# and is in Python.



# Divide diabetes dataset into a training set and a disjoint test
# set of szie 100
test.ixs <- sample(nrow(diabetes), 100) <- diabetes[test.ixs,] <- diabetes[-test.ixs,]

# OLS ### <- lm(y ~ x2, data =
mean((predict(, newdata = -$y)^2)

# LASSO #### <- with(, glmnet::cv.glmnet(x2, y))
coef( # Same as s = "lambda.1se"
mean((predict(,$x2) -$y)^2)

# Horeshoe (Stan) ####

# This seems to produce poor results; for example, low n_eff (sometimes below 5)
# and (not surprisingly) significant variation in the mean beta from run to run.
# seed=123 => beta[3]: mean=0.96, n_eff=21
# seed=124 => beta[3]: mean=202.17, n_eff=2 <- stan(file = 'horseshoe.stan', 
                 data = list(n = length($y), 
                             y =$y,
                             p = ncol($x2), 
                             X =$x2),
                 seed = 123)

print(, pars = c('tau', 'sigma', 'beta'))

Stan code

data {
  int<lower=0> n;
  int<lower=0> p;
  matrix[n,p] X;
  vector[n] y;
parameters {
  vector[p] beta;
  vector<lower=0>[p] lambda;
  real<lower=0> tau;
  real<lower=0> sigma;
model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  for (i in 1:p)
    beta[i] ~ normal(0, lambda[i] * tau);
  y ~ normal(X * beta, sigma);
1 Like

It is pretty much impossible to get hierarchical priors to work well unless you use reparameterizations and increase adapt_delta to be close to 1. Use this Stan code as a starting point

1 Like

Easier link [1610.05559] On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior


Thank you – reparameterizing (along the lines of code at the end of the linked paper) allowed the model to converge well.

I also experimented with setting a tau_0 (as described in the paper) and using a normal (as opposed to Cauchy) prior on tau, but those had relatively little effect on this model and data.

I wonder whether it might be worth including this in the set of Stan examples?

Yes, we are going to add this to Stan examples.