Divergent transitions with the horseshoe prior

I am trying to use a horseshoe prior using the parametrization from
Peltola et al. (2014) and Piironen and Vehtari (2017). The situation I am looking at is a randomized clinical trial comparing a treatment with a placebo. There is a pre-treatment (baseline) measurement of the continuous outcome variable and the outcome is the change from baseline in this variable. There are 4 subgroups of patients that might potentially differ in how well they respond to treatment.

The model I want to fit has the following features:

  1. there may be an overall effect of the treatment; I want to be only very mildly skeptical about it (weakly informative prior centered on no effect)
  2. the big question is primarily whether the treatment works a lot better in one subgroup; the treatment-by-subgroup interaction is where I want to use a horseshoe prior to reflect this
  3. a random effect on the intercept (change from baseline might differ a bit between subgroups, but they are probably somewhat similar - helps with the fact that we have very few patients in each subgroup)
  4. the baseline value has the same effect on the change from baseline in all patients

I have tried to simulate some data that causes the same problems I face with the real data. I keep running into divergent transitions (so I suspect I have trouble sampling the whole posterior). Even when I go to things like adapt_delta = 0.99999999 and stepsize=0.00001 (I even tried more extreme values), I do not seem to be able to get rid of the divergent transitions.

I seem to have gone through the standard proposals: picking a good parameterization (it certainly helped compared to the “literal” one), increasing the target acceptance probability, and forcing NUTS to initially use really small stepsizes.

I have written some R code (R and Stan code are attached) to generate some similar(ish) fake data, and have attached some of the bivariate plots that may indicate what is going on with the divergent transitions.


I am wondering what else I could try.

example_with_simulated_data.R (1001 Bytes)
hs_model3.stan (1.8 KB)

The horseshoe family of priors is a bit fragile in that you have to carefully choose the right parameterization to avoid divergences, and there are a lot of possible parameterizations. Each Student-t can, for example, be reparameterized as a normal-gamma mixture (see the “Optimizing Stan Code for Efficiency” section of the manual for more information). The suggested Stan programs in the appendix of the 2017 paper on the two-scale horseshoe use one and two reparameterizations, respectively, but you can sneak in one more that might help. It might also be useful to reduce the “slab” scale to something more reasonable.
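To illustrate the kind of reparameterization being suggested (this is a sketch, not the attached hs_model3.stan; the variable names are illustrative), a half-Cauchy scale can be written as a half-normal times the square root of an inverse-gamma draw, so no half-Cauchy is ever sampled directly:

```stan
// Sketch: fully non-centered horseshoe for K interaction coefficients.
// Uses the identity half-Cauchy(0, 1) = |N(0, 1)| * sqrt(InvGamma(0.5, 0.5)),
// i.e. a Student-t written as a normal-gamma mixture.
parameters {
  vector[K] z;                       // standard-normal raw coefficients
  real<lower=0> aux1_global;         // half-normal piece of the global scale
  real<lower=0> aux2_global;         // inverse-gamma piece of the global scale
  vector<lower=0>[K] aux1_local;     // half-normal pieces of the local scales
  vector<lower=0>[K] aux2_local;     // inverse-gamma pieces of the local scales
}
transformed parameters {
  real<lower=0> tau = aux1_global * sqrt(aux2_global);          // tau ~ half-Cauchy(0, 1)
  vector<lower=0>[K] lambda = aux1_local .* sqrt(aux2_local);   // lambda[k] ~ half-Cauchy(0, 1)
  vector[K] beta = z .* lambda * tau;  // implies beta[k] ~ normal(0, lambda[k] * tau)
}
model {
  z ~ std_normal();
  aux1_global ~ std_normal();              // half-normal given the lower bound
  aux2_global ~ inv_gamma(0.5, 0.5);
  aux1_local ~ std_normal();
  aux2_local ~ inv_gamma(0.5, 0.5);
}
```

Because every sampled quantity is (half-)normal or inverse-gamma, the funnel-shaped geometry of the raw half-Cauchy scales is largely removed, which is what tends to cure the divergences.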

Long story short: the horseshoe is a relatively new and complex distribution and we’re still understanding how best to parameterize it in practice.

There are two Piironen & Vehtari (2017) papers, and Mike is referring to the second one, with the regularized horseshoe. Here’s the link to the paper and code examples: https://arxiv.org/abs/1707.01694


@avehtari writes too many papers.


Thanks, the regularized horseshoe seems like a good choice for me. With reasonable priors it does not seem to suffer from the same problems, and I do have some prior belief about how large an effect could plausibly be.
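For reference, the prior block of the regularized horseshoe (following the appendix of the Piironen & Vehtari paper linked above, adapted here as a sketch for the interaction coefficients; the variable names and data inputs are illustrative) looks like this. The slab scale is where the prior belief about the largest plausible effect enters:

```stan
// Sketch of the regularized horseshoe prior for K interaction coefficients.
// slab_scale encodes the largest plausible effect size; for a large local
// scale lambda[k], beta[k] is shrunk toward normal(0, c) instead of being
// left essentially unregularized as in the plain horseshoe.
data {
  int<lower=1> K;
  real<lower=0> scale_global;   // e.g. derived from the expected number of non-zero effects
  real<lower=0> slab_scale;     // scale of the slab
  real<lower=0> slab_df;        // degrees of freedom of the slab
}
parameters {
  vector[K] z;
  real<lower=0> tau;            // global shrinkage
  vector<lower=0>[K] lambda;    // local shrinkage
  real<lower=0> caux;           // auxiliary variable for the slab width
}
transformed parameters {
  real<lower=0> c = slab_scale * sqrt(caux);
  vector<lower=0>[K] lambda_tilde
    = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
  vector[K] beta = z .* lambda_tilde * tau;
}
model {
  z ~ std_normal();
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, scale_global);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
}
```

This fragment shows only the prior; the likelihood (change from baseline regressed on treatment, subgroup intercepts, and the baseline value) would use `beta` for the treatment-by-subgroup interaction terms.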