Large number of divergencies but Rhat, Neff, and result look OK

Slawek_Smyl · November 24, 2018, 7:36pm

Hi,
I and my friends are about to publish to CRAN an rstan-based time series forecasting package. The models perform well in accuracy, generally beating standard algorithms like ARIMA, ETS etc. But occasionally, especially in case of seasonal models, I am getting warnings of large number of divergencies (e.g. 14K on 4*10K chains). Our adaptive rate is by default 0.9, increasing it further does not help. But Rhat and Neff are not necessarily bad, e.g.

coefTrend       0.82    0.02 0.58   0.05   0.35   0.71   1.17   2.17   927 1.00
powTrend       -0.29    0.01 0.17  -0.49  -0.42  -0.34  -0.20   0.16   993 1.00
locTrendFract   0.69    0.00 0.11   0.52   0.61   0.68   0.76   0.92   909 1.00
sigma           0.11    0.00 0.10   0.00   0.03   0.09   0.17   0.34  1026 1.00
offsetSigma     0.23    0.00 0.12   0.02   0.12   0.22   0.33   0.44  1037 1.00
levSm           0.93    0.00 0.05   0.80   0.90   0.94   0.97   1.00   824 1.01
bSm             0.93    0.00 0.06   0.79   0.89   0.94   0.97   1.00  1998 1.00
nu              2.46    0.01 0.38   2.02   2.17   2.36   2.64   3.44  2478 1.00
powx            0.16    0.00 0.15   0.00   0.05   0.11   0.23   0.59  1161 1.00

Average Rhat over all (including transformed, not listed above) parameters is e.g. 1.005 (actually, by default the algorithm would automatically double number of iterations and repeat if avg(Rhat)>1.006)
And the forecast is not bad either, as measured by backtesting.
I tried mcmc_parcoord() and pairs() - it does not tell me anything, the divergencies are all over the place - I see just sea of red :-)
So, what do you think?

Regards,
Slawek

bbbales2 · November 24, 2018, 10:49pm

Groovy!

It’s still bad and the results still shouldn’t be trusted. The divergences indicate that the sampler is trying to go somewhere but it can’t get there.

Make sure and forward these errors to your users so they can understand something is going wrong. Maybe the guidelines say something about this but I imagine that’s the desirable behavior (Guidelines for developers of R packages interfacing with Stan • rstantools).

I haven’t tried this before, but at worst maybe you use the grad_log_prob function to write a leapfrog integrator that runs around the Hamiltonian yourself? That might give you more insight into what is causing things to blow up. You’d need to get the mass matrix from Stan’s adaptation. You can get that from the cmdstan output (and you can dump your R inputs to the cmdstan format with stan_rdump). The “Using multiple step sizes” section of Radford’s HMC shows how to do this: https://arxiv.org/pdf/1206.1901.pdf . That make sense at all?

The usefulness of this has been debated: Getting the location + gradients of divergences (not of iteration starting points) .

Slawek_Smyl · November 25, 2018, 3:10am

Hi Ben,
Thank you for your thoughts and the links.
I am afraid, what you are suggesting (“to write a leapfrog integrator that runs around the Hamiltonian”) is way above my skill and understanding levels, so I am unlikely to do it.
But I take heart from the fact that my divergencies are very well scattered, visibly covering the whole distribution of parameters, they are not clustered, so hopefully not that deadly.

Regards,
Slawek

avehtari · November 25, 2018, 11:30am

If the model has several hierarchical components each producing a funnel shape in the posterior, then it is possible that you see divergences all over. Try looking at the prior scale parameters and if at divergences at least one of them is always small. I have right now that kind of model and even after I know the problem it’s not easy to fix (non-centered parameterization is not enough to fix it)

Slawek_Smyl · November 25, 2018, 3:18pm

Hi,
No, there is no hierarchy in the model, the model is fitted per time series. It has several components, like trend, seasonalities - they are either additive or multiplicative, so possibly there are problems with identifiability.

avehtari · November 25, 2018, 6:28pm

If you want more help it would be easier if you show the model code and all Rhats and neffs.

Slawek_Smyl · November 26, 2018, 1:41am

Thank you. The current model is quite complicated, as it offers many options. I will create a simple version and a small script to run it.

Topic		Replies	Views
Divergent Transitions with R_hat < 1.1 Modeling fitting-issues , performance , shinystan	14	2448	June 13, 2019
Divergent transitions after warm up & Rhat General	5	4091	July 28, 2017
Divergence or Convergence based on rhat using different datasets Modeling	16	1942	May 1, 2018
Help with understanding my results Modeling fitting-issues , specification , performance	5	1220	February 8, 2019
Rhat < 1 (as low as 9.94e-01). Why? General	6	4017	June 19, 2019

Large number of divergencies but Rhat, Neff, and result look OK

Related topics