Divergent transitions with the regularized horseshoe

Hi all,

I’m working with the regularized horseshoe prior, and my model is throwing about 30 divergent transitions. The effective sample sizes and Rhat values for all parameters are fine. The regression has 10 predictors and n = 4000. I realize this is not the ideal setting for demonstrating the regularized horseshoe, but I am still getting a small amount of additional shrinkage in the estimates compared to the plain horseshoe. The code is as follows.

modelString = "
data {
  int<lower=1> n;           // number of observations
  int<lower=1> p;           // number of predictors
  real readscore[n];        // outcome
  matrix[n, p] X;           // inputs
}

transformed data {
  real p0 = 5;
  // real slab_df = 4;
  // real half_slab_df = 0.5 * slab_df;
}

parameters {
  vector[p] beta;
  vector<lower=0>[p] lambda;
  real<lower=0> c2;
  real<lower=0> tau;
  real alpha;
  real<lower=0> sigma;
}

transformed parameters {
  real tau0 = (p0 / (p - p0)) * (sigma / sqrt(1.0 * n));
  vector[p] lambda_tilde =
    sqrt(c2) * lambda ./ sqrt(c2 + square(tau) * square(lambda));
}

model {
  beta ~ normal(0, tau * lambda_tilde);
  lambda ~ cauchy(0, 1);
  c2 ~ inv_gamma(2, 8);
  tau ~ cauchy(0, tau0);
  alpha ~ normal(0, 2);
  sigma ~ cauchy(0, 1);
  readscore ~ normal(X * beta + alpha, sigma);
}

// For posterior predictive checking and loo cross-validation
generated quantities {
  vector[n] readscore_rep;
  vector[n] log_lik;
  for (i in 1:n) {
    readscore_rep[i] = normal_rng(alpha + X[i, :] * beta, sigma);
    log_lik[i] = normal_lpdf(readscore[i] | alpha + X[i, :] * beta, sigma);
  }
}
"

The code is a slight modification of that given in Betancourt (2018): instead of specifying the slab scale and slab df directly, I’m giving c^2 an inverse-gamma(2, 8) prior, following a suggestion in Piironen and Vehtari (2017). I have also tried the Betancourt code directly, and again the Rhat and n_eff values look fine, but it throws around the same number of divergent transition warnings. Any thoughts?

Thanks in advance,



It’s pretty hard not to get a terrible posterior geometry with the horseshoe model. For an extensive discussion of why, including experiments and possible alternatives, see Sparsity Blues.
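One concrete thing to try in the meantime: in the posted model, `beta` is sampled directly with a scale that depends on `tau` and `lambda`, which produces funnel-shaped geometry that HMC handles poorly. Betancourt (2018) uses a non-centered parameterization, where `beta` is built deterministically from standard-normal auxiliary variables; this often reduces (though rarely eliminates) divergences with horseshoe-type priors. A sketch of the affected blocks, keeping the inverse-gamma prior on `c2` from the post (untested, and assuming the same `data` and `transformed data` blocks as above):

```stan
parameters {
  vector[p] z;               // auxiliary variables; beta is derived from these
  vector<lower=0>[p] lambda;
  real<lower=0> c2;
  real<lower=0> tau;
  real alpha;
  real<lower=0> sigma;
}

transformed parameters {
  real tau0 = (p0 / (p - p0)) * (sigma / sqrt(1.0 * n));
  vector[p] lambda_tilde =
    sqrt(c2) * lambda ./ sqrt(c2 + square(tau) * square(lambda));
  // non-centered: beta is a deterministic transform of z
  vector[p] beta = z .* lambda_tilde * tau;
}

model {
  z ~ normal(0, 1);          // implies beta ~ normal(0, tau * lambda_tilde)
  lambda ~ cauchy(0, 1);
  c2 ~ inv_gamma(2, 8);
  tau ~ cauchy(0, tau0);
  alpha ~ normal(0, 2);
  sigma ~ cauchy(0, 1);
  readscore ~ normal(X * beta + alpha, sigma);
}
```

Raising `adapt_delta` (e.g. to 0.99 or higher) can also suppress some divergences, at the cost of a smaller step size, but with this family of priors the reparameterization usually matters more.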