Horseshoe prior with diabetes dataset

I am trying to reproduce the example of using the horseshoe prior described in https://www.ariddell.org/horseshoe-prior-with-stan.html but am running into some problems. For example, in the code below (rstan + Stan code) I get thousands of divergent transitions, very low n_eff, and hence poor results.

  1. How can I improve the behavior of this model?
  2. Are there variations on this model I should consider? (For example, Vehatari suggests some.)

I am using rstan_2.14.1 and R version 3.3.1.

R code

# See http://andrewgelman.com/2015/02/17/bayesian-survival-analysis-horseshoe-priors/
# Here I follow https://www.ariddell.org/horseshoe-prior-with-stan.html which is linked to above
# and is in Python.

library(lars)

data('diabetes')

# Divide diabetes dataset into a training set and a disjoint test
# set of szie 100
set.seed(1234)
test.ixs <- sample(nrow(diabetes), 100)
test.data <- diabetes[test.ixs,]
train.data <- diabetes[-test.ixs,]

# OLS ###
ols.fit <- lm(y ~ x2, data = train.data)
coef(ols.fit)
mean((predict(ols.fit, newdata = test.data) - test.data$y)^2)

# LASSO ####
lasso.fit <- with(train.data, glmnet::cv.glmnet(x2, y))
coef(lasso.fit) # Same as s = "lambda.1se"
mean((predict(lasso.fit, test.data$x2) - test.data$y)^2)

# Horeshoe (Stan) ####
library(rstan)

# This seems to produce poor results; for example, low n_eff (sometimes below 5)
# and (not surprisingly) significant variation in the mean beta from run to run.
# seed=123 => beta[3]: mean=0.96, n_eff=21
# seed=124 => beta[3]: mean=202.17, n_eff=2
stan.fit <- stan(file = 'horseshoe.stan', 
                 data = list(n = length(train.data$y), 
                             y = train.data$y,
                             p = ncol(train.data$x2), 
                             X = train.data$x2),
                 seed = 123)

print(stan.fit, pars = c('tau', 'sigma', 'beta'))

Stan code

data {
  int<lower=0> n;
  int<lower=0> p;
  matrix[n,p] X;
  vector[n] y;
}
parameters {
  vector[p] beta;
  vector<lower=0>[p] lambda;
  real<lower=0> tau;
  real<lower=0> sigma;
}
model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  for (i in 1:p)
    beta[i] ~ normal(0, lambda[i] * tau);
  y ~ normal(X * beta, sigma);
}
1 Like

It is pretty much impossible to get hierarchical priors to work well unless you use reparameterizations and increase adapt_delta to be close to 1. Use this Stan code as a starting point
https://arxiv.org/format/1610.05559

1 Like

Easier link [1610.05559] On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior

Aki

Thank you – reparameterizing (along the lines of code at the end of the linked paper) allowed the model to converge well.

I also experimented with setting a tau_0 (as described in the paper) and using a normal (as opposed to Cauchy) prior on tau, but those had relatively little effect on this model and data.

I wonder whether it might be worth including this in the set of Stan examples?

Yes, we are going to add this to Stan examples.

Aki