Question regarding the samplers in use for time-series models

Hello, all. I have long been a theoretical fan of Bayesian statistics, especially since I read McElreath’s Statistical Rethinking, and I finally want to get my feet wet with some actual modelling.

What has drawn me to Stan most of all is that it seems to handle time-series analysis well, unlike every other Bayesian library out there (WinBUGS, PyMC3, JAGS, etc.). One well-developed time-series example can be found here.

What would be the reason for Stan’s apparent ease with time-series estimation? The only information on Stan’s samplers that I have been able to find is that it uses both HMC and NUTS, but other libraries (such as PyMC3) use those too, yet they grind to a halt even with basic AR(1) models, or fail to converge.

I would have guessed that Stan uses a special sampler for time series, such as Sequential Monte Carlo (SMC), but that does not seem to be the case(?).

Any statistical information on why Stan is so good in this area would be very helpful for me.

The reason time series are hard is that they induce correlation among parameters in the posterior if you’re not very careful with parameterization. Hamiltonian Monte Carlo (of which the no-U-turn sampler is an adaptive variant) is relatively effective at dealing with high-dimensional, correlated posteriors compared to Gibbs or Metropolis or ensemble methods (see, e.g., the original NUTS paper).

PyMC3 is using (roughly) the same NUTS algorithm as Stan. If you find a model won’t fit in PyMC3 and will fit in Stan, it’s probably due to parameterization.

We only use NUTS for MCMC. We have static HMC, but there’s not much point in doing that when you have NUTS. We’ve evaluated a lot of other alternatives, but haven’t found anything that’s better on a large subset of problems that we can express in Stan.

The best references are Michael Betancourt’s arXiv papers—I’d start with the conceptual introduction to HMC and move on to the exhaustive paper if you want to understand why algorithms like NUTS work and why algorithms like Metropolis and Gibbs and even basic HMC don’t.


Thanks for the reply, Bob_Carpenter! I appreciate the thorough and direct answers to my few questions. So it sounds like reparameterization is what makes time series estimate well in Stan? That certainly makes sense to me, so I will be looking into that now.

Thanks again!

That’s true of a lot of things, not just time series. But whether you want to reparameterize will depend on how much data you have. With a lot of data, the natural parameterizations work well, but with not much data, the non-centered versions work better.

So imagine you have a very simple latent time-series model with observations:

\alpha_{n+1} \sim \textrm{normal}(\lambda \cdot \alpha_n, \tau) \\ y_n \sim \textrm{normal}(\alpha_n, \sigma)
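As a minimal Stan sketch of this centered (natural) parameterization—the priors on \lambda, \tau, and \sigma here are illustrative assumptions, not part of the model above:

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real<lower=-1, upper=1> lambda;  // autoregressive coefficient
  real<lower=0> tau;               // innovation scale
  real<lower=0> sigma;             // observation noise scale
  vector[N] alpha;                 // latent states, sampled directly
}
model {
  // illustrative priors (assumed for the sketch)
  tau ~ normal(0, 1);
  sigma ~ normal(0, 1);
  alpha[1] ~ normal(0, tau);
  for (n in 2:N)
    alpha[n] ~ normal(lambda * alpha[n - 1], tau);
  y ~ normal(alpha, sigma);
}
```

Here each alpha[n] depends on alpha[n - 1] in the prior, which is exactly the posterior correlation being discussed.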

You can use the natural parameterization above, or you can standardize the latent time series as in other non-centered parameterizations,

\alpha_n^{\textrm{raw}} \sim \textrm{normal}(0, 1) \\ \alpha_{n + 1} = \lambda \cdot \alpha_{n} + \tau \cdot \alpha_{n+1}^{\textrm{raw}}
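A minimal Stan sketch of this non-centered version, again with illustrative assumed priors on \lambda, \tau, and \sigma:

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real<lower=-1, upper=1> lambda;  // autoregressive coefficient
  real<lower=0> tau;               // innovation scale
  real<lower=0> sigma;             // observation noise scale
  vector[N] alpha_raw;             // standardized innovations
}
transformed parameters {
  vector[N] alpha;                 // latent states, built deterministically
  alpha[1] = tau * alpha_raw[1];
  for (n in 2:N)
    alpha[n] = lambda * alpha[n - 1] + tau * alpha_raw[n];
}
model {
  // illustrative priors (assumed for the sketch)
  tau ~ normal(0, 1);
  sigma ~ normal(0, 1);
  alpha_raw ~ normal(0, 1);        // independent in the prior
  y ~ normal(alpha, sigma);
}
```

Only the standardized alpha_raw parameters are sampled; the correlated states alpha are reconstructed deterministically from them.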

The beauty of this second parameterization is that the parameters \alpha_n are no longer correlated in the prior, so when there’s not much data, we can still get off the ground fitting.

I don’t know if we have a good study of fitting time series models anywhere. I didn’t actually try them when I wrote the manual chapters. There are a lot of tricks that can be used to replace some of this sampling with analytic marginalizations and also tricks to ensure you get stationary solutions.