Time Series Model Quality, Out of Sample Likelihood

How to assess the quality of a time series model? And how to compare two models A and B to tell which one is better?

An example: Stochastic Volatility with a Student-t distribution, fitted to historical daily log returns \{r_t\} for a stock.

\begin{align} r_t &= \mu + \exp\!\left(0.5\,h_t\right)\varepsilon_t, \qquad \varepsilon_t \sim T_{\nu}(0,1),\\ h_t &= \omega + \phi (h_{t-1} - \omega) + \sigma \eta_t, \qquad \eta_t \sim \mathcal{N}(0,1). \end{align}
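
For concreteness, here is a minimal numpy sketch of that data-generating process. The parameter values are made up purely for illustration, and \varepsilon_t is drawn as a plain Student-t (a standardized T_\nu(0,1) would need an extra \sqrt{(\nu-2)/\nu} scaling):

```python
import numpy as np

# Illustrative parameter values, not estimates from any real fit
mu, omega, phi, sigma, nu = 0.0005, -9.0, 0.97, 0.25, 5.0
T = 5000
rng = np.random.default_rng(42)

h = np.empty(T)
r = np.empty(T)
h[0] = omega  # start the log-variance at its long-run mean
for t in range(T):
    if t > 0:
        h[t] = omega + phi * (h[t - 1] - omega) + sigma * rng.normal()
    eps = rng.standard_t(nu)                 # plain (non-standardized) t draw
    r[t] = mu + np.exp(0.5 * h[t]) * eps
```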

If I understand correctly, Stan MCMC can’t estimate the out-of-sample likelihood efficiently, right? It can’t do an efficient incremental state update with the next data point and requires a full refit, so a time series of 5k points would need 5k MCMC refits, which is impractical.

\sum_{t=2}^{T} \log p\left(r_t \mid \Theta, r_{1:t-1}\right)

I assume the proper way to estimate it would be: a) fit the model with MCMC and get the parameter posterior \Theta; b) use these parameters \Theta with a particle filter to estimate the latent state without look-ahead and compute a proper out-of-sample likelihood.
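
Step b) can be sketched as a plain bootstrap particle filter for the SV-t model above. This assumes fixed parameters (the arguments here are placeholders for posterior means or a single posterior draw), and it accumulates the one-step-ahead predictive log density with no look-ahead in the state:

```python
import numpy as np
from scipy.stats import t as student_t

def pf_log_predictive(r, mu, omega, phi, sigma, nu, n_particles=2000, seed=0):
    """Bootstrap particle filter for the SV-t model; returns
    sum_t log p(r_t | Theta, r_{1:t-1}), state estimated with no look-ahead."""
    rng = np.random.default_rng(seed)
    # initialise particles from the stationary distribution of h
    h = rng.normal(omega, sigma / np.sqrt(1 - phi**2), size=n_particles)
    logpred = 0.0
    for t_idx, rt in enumerate(r):
        if t_idx > 0:
            # propagate the state one step (no data used yet)
            h = omega + phi * (h - omega) + sigma * rng.normal(size=n_particles)
        # one-step-ahead predictive weight of each particle
        w = student_t.pdf(rt, df=nu, loc=mu, scale=np.exp(0.5 * h))
        logpred += np.log(w.mean() + 1e-300)
        # multinomial resampling every step (simplest possible scheme)
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        h = h[idx]
    return logpred
```

To average over parameter uncertainty, this would be run once per posterior draw of \Theta and the per-point predictive densities averaged across draws.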

Strictly speaking this still looks ahead a little, because the parameters \Theta are estimated on the full data, but not for the state, and that’s OK.

I wonder if Stan has some extension that can run a Stan model with a particle filter? Or is nothing like that available, so a separate particle filter library is required?

There’s the PSIS-LOO approximation, but it produces terrible results here. Models A and B have very different moments, and one model is clearly better, yet their PSIS-LOO estimates are almost identical.

Maybe PSIS-LOO can’t be used in this specific case because the data and the model are heavy-tailed and have jumps. The maximum Pareto shape parameter is k_\text{max}=2.4, and 1.9% of the points have k > 0.7.

Another approach is to use moments estimated on real and simulated data: a) static moments, i.e. the parameters of a fitted T_\nu(\mu, \sigma); b) dynamic moments, i.e. GARCH parameters.
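
The static-moment check in a) can be sketched by fitting T_\nu(\mu, \sigma) to the real returns and to returns simulated from each candidate model, then comparing the fitted parameters. Here `real_returns` and `simulated_returns` are placeholders generated on the spot for illustration:

```python
import numpy as np
from scipy.stats import t as student_t, kurtosis

# Placeholders: real data and posterior-predictive draws from a fitted model
rng = np.random.default_rng(1)
real_returns = rng.standard_t(4, size=5000) * 0.01
simulated_returns = rng.standard_t(10, size=5000) * 0.01

for name, x in [("real", real_returns), ("model", simulated_returns)]:
    nu, loc, scale = student_t.fit(x)     # MLE of (nu, mu, sigma)
    print(f"{name}: nu={nu:.2f} mu={loc:.5f} sigma={scale:.5f} "
          f"excess_kurtosis={kurtosis(x):.2f}")
```

A model whose simulated returns give a tail index \nu and kurtosis far from the real data’s is missing the tail behaviour, even if its PSIS-LOO score looks similar.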

In my specific case the moments show a big difference between models A and B, while PSIS-LOO shows an almost identical Expected Log Predictive Density.

I wonder how people assess time series model quality? Do they use additional particle filter libraries? I’m new to MCMC and may be missing something obvious.

This question is also answered in the Cross-Validation FAQ.

Thanks. As far as I understand, LFO may require a couple of refits of the model, which could be slow. In my case, 5k points with 1.9% of them having k > 0.7, it may require quite a few refits.

I wonder why a particle filter is rarely used? It also approximates the predictive likelihood, and additionally it has a very desirable feature: it updates the model state incrementally. So you get a couple of benefits at once.

It could work well as a combo: Stan MCMC solves the hard task of getting the posterior for the parameters, then a simple particle filter reuses it to estimate the leave-future-out state approximately, and could also be used in production as a fast online incremental state update.
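
That combo could be sketched as a small online filter object: \Theta comes from the MCMC fit (the constructor arguments below are placeholders standing in for posterior estimates), and each new return updates the state and returns the one-step predictive log density:

```python
import numpy as np
from scipy.stats import t as student_t

class SVParticleFilter:
    """Online filter for the SV-t model with fixed parameters Theta
    (placeholder values standing in for MCMC posterior estimates)."""

    def __init__(self, mu, omega, phi, sigma, nu, n_particles=2000, seed=0):
        self.mu, self.omega, self.phi = mu, omega, phi
        self.sigma, self.nu = sigma, nu
        self.rng = np.random.default_rng(seed)
        # start from the stationary distribution of h
        self.h = self.rng.normal(omega, sigma / np.sqrt(1 - phi**2), n_particles)

    def update(self, r_t):
        """Incremental state update with one new observation;
        returns log p(r_t | Theta, past data)."""
        self.h = (self.omega + self.phi * (self.h - self.omega)
                  + self.sigma * self.rng.normal(size=self.h.size))
        w = student_t.pdf(r_t, df=self.nu, loc=self.mu,
                          scale=np.exp(0.5 * self.h))
        logp = np.log(w.mean() + 1e-300)
        idx = self.rng.choice(self.h.size, size=self.h.size, p=w / w.sum())
        self.h = self.h[idx]
        return logp
```

In production, calling `update` on each new return gives the fast incremental state update; the accumulated `logp` values are the out-of-sample predictive log density.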

The importance-sampling-based LFO-CV is equivalent to a particle filter. A particle filter uses importance weighting, and the rejuvenation step is the same as the MCMC refits in LFO-CV.

It’s the importance resampling. In general, it gets harder and harder the more steps you want to take with particle filters (or more generally, sequential Monte Carlo). But if you have a large number of data points N and add another iid draw to bring the total to N + 1, the posteriors are usually close enough that importance (re)sampling works. It is harder when N is small or the data has very high variance, because then the posteriors vary more between N and N + 1 data points.
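
The N → N+1 importance-(re)sampling step can be illustrated on a deliberately simple iid model (an assumed normal-mean model with known unit variance and a N(0,1) prior, nothing from the thread): posterior draws for N points get reweighted by the likelihood of the new point, and when N is large the weights are nearly uniform, so the effective sample size stays high:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
data = rng.normal(0.3, 1.0, size=N)

# Conjugate posterior of the mean given N points (unit prior, unit noise)
post_var = 1.0 / (1.0 + N)
post_mean = data.sum() * post_var
theta = rng.normal(post_mean, np.sqrt(post_var), size=4000)

# A new (N+1)-th observation arrives
x_new = rng.normal(0.3, 1.0)

# Importance weights: likelihood of the new point under each draw
logw = -0.5 * (x_new - theta) ** 2
w = np.exp(logw - logw.max())
w /= w.sum()

ess = 1.0 / np.sum(w ** 2)          # effective sample size: close to 4000
                                    # when the two posteriors nearly coincide
idx = rng.choice(theta.size, size=theta.size, p=w)   # the resampling step
theta_new = theta[idx]              # approximate draws from the N+1 posterior
```

With small N or high-variance data the posteriors at N and N+1 differ more, the weights concentrate on few draws, and the ESS collapses, which is exactly the “harder” regime described above.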

Compare using a particle filter to taking M draws from your fit to N data, then warm-starting HMC to take a few more iterations to get draws from N + 1 data points.

Isn’t this using importance sampling rather than sampling importance resampling like a particle filter would use? For those following along, importance sampling can be viewed as the Rao-Blackwellized form of sampling importance resampling, which in practical cases means you will get more accurate, lower-variance estimates.

Thanks, can you please specify what “harder” means exactly?

Is it the loss of latent path diversity, where most particles at a later t share the same ancestor at an earlier t? So for early stages, instead of a cloud of many latent paths, it collapses to just one realisation?

But isn’t that important only for smoothing (where estimating past states matters), and not for filtering or prediction (where only a correct estimate of the current state matters)?

For time series where the goal is filtering (correctly estimating the state at the current time t and predicting the state at t+1), correct estimation of states far in the past seems unimportant. Am I missing something?

A particle filter doesn’t need to resample every step and can carry weights until the rejuvenation step (at least when I was working on them 20 years ago, carrying weights was the default), which in the simplest form is just resampling to drop particles with near-zero weights.
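
Carrying weights between steps and resampling only when needed is usually done with an effective-sample-size trigger. A minimal sketch of the weight bookkeeping, with stand-in dynamics and a stand-in log-likelihood (both invented here just to exercise the mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 1000
particles = rng.normal(size=M)
logw = np.zeros(M)               # log-weights carried between steps
n_resamples = 0

for step in range(100):
    # propagate particles (stand-in AR(1) dynamics)
    particles = 0.9 * particles + 0.1 * rng.normal(size=M)
    # accumulate incremental log-weights (stand-in log-likelihood of a datum)
    logw += -0.5 * (0.5 - particles) ** 2

    w = np.exp(logw - logw.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)
    if ess < M / 2:              # resample only when ESS drops below half
        idx = rng.choice(M, size=M, p=w)
        particles = particles[idx]
        logw = np.zeros(M)       # after resampling, weights are uniform again
        n_resamples += 1
```

The ESS-threshold trigger (here M/2, a common default) is what makes resampling an occasional cleanup rather than an every-step operation.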