I and my friends are about to publish to CRAN an rstan-based time series forecasting package. The models perform well in accuracy, generally beating standard algorithms like ARIMA, ETS etc. But occasionally, especially in case of seasonal models, I am getting warnings of large number of divergencies (e.g. 14K on 4*10K chains). Our adaptive rate is by default 0.9, increasing it further does not help. But Rhat and Neff are not necessarily bad, e.g.
coefTrend 0.82 0.02 0.58 0.05 0.35 0.71 1.17 2.17 927 1.00
powTrend -0.29 0.01 0.17 -0.49 -0.42 -0.34 -0.20 0.16 993 1.00
locTrendFract 0.69 0.00 0.11 0.52 0.61 0.68 0.76 0.92 909 1.00
sigma 0.11 0.00 0.10 0.00 0.03 0.09 0.17 0.34 1026 1.00
offsetSigma 0.23 0.00 0.12 0.02 0.12 0.22 0.33 0.44 1037 1.00
levSm 0.93 0.00 0.05 0.80 0.90 0.94 0.97 1.00 824 1.01
bSm 0.93 0.00 0.06 0.79 0.89 0.94 0.97 1.00 1998 1.00
nu 2.46 0.01 0.38 2.02 2.17 2.36 2.64 3.44 2478 1.00
powx 0.16 0.00 0.15 0.00 0.05 0.11 0.23 0.59 1161 1.00
Average Rhat over all (including transformed, not listed above) parameters is e.g. 1.005 (actually, by default the algorithm would automatically double number of iterations and repeat if avg(Rhat)>1.006)
And the forecast is not bad either, as measured by backtesting.
I tried mcmc_parcoord() and pairs() - it does not tell me anything, the divergencies are all over the place - I see just sea of red :-)
So, what do you think?
It’s still bad and the results still shouldn’t be trusted. The divergences indicate that the sampler is trying to go somewhere but it can’t get there.
Make sure and forward these errors to your users so they can understand something is going wrong. Maybe the guidelines say something about this but I imagine that’s the desirable behavior (http://mc-stan.org/rstantools/articles/developer-guidelines.html).
I haven’t tried this before, but at worst maybe you use the grad_log_prob function to write a leapfrog integrator that runs around the Hamiltonian yourself? That might give you more insight into what is causing things to blow up. You’d need to get the mass matrix from Stan’s adaptation. You can get that from the cmdstan output (and you can dump your R inputs to the cmdstan format with stan_rdump). The “Using multiple step sizes” section of Radford’s HMC shows how to do this: https://arxiv.org/pdf/1206.1901.pdf . That make sense at all?
The usefulness of this has been debated: Getting the location + gradients of divergences (not of iteration starting points) .
Thank you for your thoughts and the links.
I am afraid, what you are suggesting (“to write a leapfrog integrator that runs around the Hamiltonian”) is way above my skill and understanding levels, so I am unlikely to do it.
But I take heart from the fact that my divergencies are very well scattered, visibly covering the whole distribution of parameters, they are not clustered, so hopefully not that deadly.
If the model has several hierarchical components each producing a funnel shape in the posterior, then it is possible that you see divergences all over. Try looking at the prior scale parameters and if at divergences at least one of them is always small. I have right now that kind of model and even after I know the problem it’s not easy to fix (non-centered parameterization is not enough to fix it)
No, there is no hierarchy in the model, the model is fitted per time series. It has several components, like trend, seasonalities - they are either additive or multiplicative, so possibly there are problems with identifiability.
If you want more help it would be easier if you show the model code and all Rhats and neffs.
Thank you. The current model is quite complicated, as it offers many options. I will create a simple version and a small script to run it.