I am trying to reproduce the example of using the horseshoe prior described in https://www.ariddell.org/horseshoe-prior-with-stan.html
but am running into some problems. For example, in the code below (rstan + Stan code) I get thousands of divergent transitions, very low n_eff, and hence poor results.
- How can I improve the behavior of this model?
- Are there variations on this model I should consider? (For example, Vehtari suggests some; one reparameterization I am considering is sketched after the Stan code below.)
I am using rstan_2.14.1 and R version 3.3.1.
R code
# See http://andrewgelman.com/2015/02/17/bayesian-survival-analysis-horseshoe-priors/
# Here I follow https://www.ariddell.org/horseshoe-prior-with-stan.html which is linked to above
# and is in Python.
library(lars)
data('diabetes')
# Divide diabetes dataset into a training set and a disjoint test
# set of size 100
set.seed(1234)
test.ixs <- sample(nrow(diabetes), 100)
test.data <- diabetes[test.ixs,]
train.data <- diabetes[-test.ixs,]
# OLS ###
ols.fit <- lm(y ~ x2, data = train.data)
coef(ols.fit)
mean((predict(ols.fit, newdata = test.data) - test.data$y)^2)
# LASSO ####
lasso.fit <- with(train.data, glmnet::cv.glmnet(x2, y))
coef(lasso.fit) # Same as s = "lambda.1se"
mean((predict(lasso.fit, test.data$x2) - test.data$y)^2)
# Horseshoe (Stan) ####
library(rstan)
# This seems to produce poor results; for example, low n_eff (sometimes below 5)
# and (not surprisingly) significant variation in the mean beta from run to run.
# seed=123 => beta[3]: mean=0.96, n_eff=21
# seed=124 => beta[3]: mean=202.17, n_eff=2
stan.fit <- stan(file = 'horseshoe.stan',
                 data = list(n = length(train.data$y),
                             y = train.data$y,
                             p = ncol(train.data$x2),
                             X = train.data$x2),
                 seed = 123)
print(stan.fit, pars = c('tau', 'sigma', 'beta'))
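For reference, the divergence counts I mention come from the sampler parameters, roughly like this (one post-warmup count per chain):

n.divergent <- sapply(get_sampler_params(stan.fit, inc_warmup = FALSE),
                      function(x) sum(x[, 'divergent__']))
n.divergent

I know stan() accepts control = list(adapt_delta = 0.99) to force a smaller step size, but I am not sure whether tuning adapt_delta alone can fix a model like this.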
Stan code
data {
  int<lower=0> n;
  int<lower=0> p;
  matrix[n, p] X;
  vector[n] y;
}
parameters {
  vector[p] beta;               // regression coefficients
  vector<lower=0>[p] lambda;    // local shrinkage scales
  real<lower=0> tau;            // global shrinkage scale
  real<lower=0> sigma;          // residual standard deviation
}
model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  for (i in 1:p)
    beta[i] ~ normal(0, lambda[i] * tau);   // horseshoe prior on each coefficient
  y ~ normal(X * beta, sigma);
}
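One variation I am considering (my sketch of the non-centered parameterization often recommended for hierarchical scale mixtures; I am not sure whether this is one of the variants Vehtari suggests) keeps the same priors but samples the coefficients indirectly:

parameters {
  vector[p] z;                  // auxiliary standard-normal variables
  vector<lower=0>[p] lambda;
  real<lower=0> tau;
  real<lower=0> sigma;
}
transformed parameters {
  vector[p] beta;
  beta = tau * (lambda .* z);   // implies beta[i] ~ normal(0, lambda[i] * tau)
}
model {
  z ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  y ~ normal(X * beta, sigma);
}

(The data block is unchanged.) Is this the right direction, or is something like the regularized horseshoe needed instead?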