Do Stan time series models need to be in temporal order?

I have been building some time series panel models in Stan, similar in style to the one detailed here:

In short, I structured my moving average model in panel and then temporal order. I then iterated back along each panel to gather the residuals from the previous observations.

I now have multiple panels (or groups) over which I wish to gather residuals. To use the classic classroom example: I already incorporate historic residuals from each student’s performance, but I now want to incorporate the residuals from the class’s performance as well. Simply ‘iterating back’ through the previous observations in the group won’t work, because I can’t order the data sequentially for both the student and the class.

Instead, I’ve tried providing the index of the previous observation for the class, and gathering the class residuals in a for loop within the model. I kept the data in panel-then-temporal order for each student. The unintended consequence is that, when the data are viewed by class, the observations are not in temporal order.

When running the model, I get ‘log probability evaluates to negative infinity’ errors. I suspect this is because Stan can’t gather residuals from observations that appear after the observation currently being sampled in the data structure.

I suspect the solution is therefore to order the whole dataset temporally (rather than temporally within panels). I would then use the index/for-loop approach to gather every previous residual for each panel. This will require quite an overhaul of the current code, however, so I would like confirmation that my current understanding and proposed solution are valid before proceeding.

Is my understanding correct?
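The proposed fix sorts the whole dataset by time, then looks up each panel’s previous observation by index. The data-preparation half of that can be sketched in Python. The function names and the record layout (dicts with "time", "student" and "class" keys) are hypothetical. Indices are 0-based with -1 meaning "no previous observation", plus a helper that shifts them to Stan’s 1-based convention with 0 for "none".

```python
# Hypothetical data-preparation step: sort all rows by time, then, for each
# grouping column, record where the previous observation of the same group is.

def sort_and_index(records, group_keys):
    """Return the records sorted by time and, for each key in group_keys,
    a list giving the 0-based row of the previous observation in the same
    group (or -1 if there is none)."""
    # sorted() is stable, so rows with equal times keep their input order.
    rows = sorted(records, key=lambda rec: rec["time"])
    prev = {}
    for key in group_keys:
        last_seen = {}  # group value -> latest row index seen so far
        idx = []
        for row, rec in enumerate(rows):
            idx.append(last_seen.get(rec[key], -1))
            last_seen[rec[key]] = row
        prev[key] = idx
    return rows, prev


def to_stan_index(idx):
    """Shift to Stan's 1-based indexing, with 0 meaning 'no previous observation'."""
    return [i + 1 for i in idx]


records = [
    {"time": 2, "student": "a", "class": "x"},
    {"time": 1, "student": "a", "class": "x"},
    {"time": 1, "student": "b", "class": "x"},
    {"time": 2, "student": "b", "class": "x"},
]
rows, prev = sort_and_index(records, ["student", "class"])
print(prev["student"])  # [-1, -1, 0, 1]
print(prev["class"])    # [-1, 0, 1, 2]
```

In the sorted order every previous index is smaller than the current row, so a single forward loop reaches each referenced residual after it has been computed. In this sketch, another student’s observation at the same time counts as the "previous" class observation. Whether that is right is a modelling decision, and excluding it would require comparing time stamps.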

Code in the model block can access any variables in the data, transformed data, parameters, or transformed parameters blocks. There isn’t any limitation there. Also, it might seem a bit strange, but sampling statements aren’t actually sampling anything – they’re just incrementing the log density.
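The idea of "incrementing the log density" can be made concrete with a Python analogue of what a model block computes. This is not Stan code, and the function names are illustrative. Each sampling statement adds a term to an accumulator, which Stan calls target. In Stan, `y ~ normal(mu, sigma)` behaves like `target += normal_lpdf(y | mu, sigma)`, except that the `~` form may drop constant terms. Because the result is a sum, the order of the statements doesn’t change it beyond floating-point rounding.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Full normal log density, including the constant -0.5 * log(2 * pi)."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def model_block(ys, mu, sigma):
    """Mimic a Stan model block: target starts at zero, and each
    'y[n] ~ normal(mu, sigma)' statement only adds a term to it."""
    target = 0.0
    for y in ys:
        target += normal_lpdf(y, mu, sigma)  # nothing is drawn at random here
    return target


ys = [0.3, -1.2, 2.5, 0.8]
forward = model_block(ys, 0.0, 1.0)
backward = model_block(list(reversed(ys)), 0.0, 1.0)
print(math.isclose(forward, backward))  # True: order only affects rounding
```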

This sounds more like a programming/indexing error. I’m not familiar enough with the terminology (or the orderings) to give a clearer answer; I just didn’t want to leave this question hanging.

Stan supports print statements, so at worst, just start printing things in inner loops and double-check that they’re sane.
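In the same spirit as printing from inner loops, a check like the following could be run on the index arrays before they are passed to Stan. It assumes the residuals are filled in by a single forward loop over the rows, in which case every non-zero index must point to an earlier row. The function and array names are hypothetical, and the indices follow a Stan-style convention: rows numbered from 1, with 0 meaning "no previous observation".

```python
def check_prev_index(prev, name):
    """Flag rows whose 'previous observation' index is not strictly earlier.

    Uses Stan-style conventions: rows are numbered from 1 and an index of 0
    means 'no previous observation'. Offending rows are printed and returned.
    """
    bad = []
    for n, p in enumerate(prev, start=1):
        if p == 0:
            continue  # first observation in its group
        if not 1 <= p < n:
            print(f"{name}[{n}] = {p} is out of range or not earlier than row {n}")
            bad.append(n)
    return bad


check_prev_index([0, 1, 3, 2], "prev_class")
# prints: prev_class[3] = 3 is out of range or not earlier than row 3
```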


Thanks for your response, Ben.

Yeah, I reorganised the data and it seemed to work. Basically, in the typical Stan moving average model [example seen here: ], Stan users just seem to ‘go up 1’ in the order of the data to get the relevant epsilon; see epsilon[t-1] in the example. In my data, which is made up of lots of groups, this wasn’t convenient, so instead I had to provide an index of where to find the relevant epsilon. But if I provided an index greater than the index I was currently sampling from, e.g. epsilon[t+1], I would get the aforementioned error.
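One plausible mechanism for that error assumes epsilon is filled in recursively inside a loop, as in the standard MA example. Stan initializes real-valued variables declared in a block to NaN. So when row t reads epsilon[t+1], that element hasn’t been assigned yet, and the NaN spreads into every residual that depends on it and then into the log density. The Python sketch below mimics this with a NaN-initialized list. The names ma1_residuals and prev are hypothetical; indices are 0-based, with -1 meaning "no previous observation".

```python
import math


def ma1_residuals(y, prev, mu, theta):
    """Compute eps[n] = y[n] - mu - theta * eps[prev[n]] in one forward loop,
    the way a recursive residual loop in a Stan program would.

    Like Stan's block-level real variables, eps starts out as NaN.
    """
    eps = [math.nan] * len(y)
    for n in range(len(y)):
        lagged = eps[prev[n]] if prev[n] >= 0 else 0.0
        eps[n] = y[n] - mu - theta * lagged
    return eps


y = [1.0, 2.0, 0.5]
print(ma1_residuals(y, [-1, 0, 1], mu=0.0, theta=0.5))  # [1.0, 1.5, -0.25]
print(ma1_residuals(y, [-1, 2, 1], mu=0.0, theta=0.5))  # [1.0, nan, nan]
```

Under this reading, the data rows don’t need a particular order in themselves. What matters is that the loop computing epsilon visits each observation after the observations it refers to, and sorting everything by time is the simplest way to guarantee that.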

It seemed epsilons could only come from indexes less than t, but I’m not sure if this is actually true, or if there was some other error in my code. The “incrementing the log probability” part I still can’t get my head around. But then again, there was lots of Stan stuff I didn’t understand 6 months ago that now seems to click. Just got to keep iterating!

Figure 2 in https://arxiv.org/pdf/1206.1901.pdf might be helpful. The input to HMC is the log density, \log(P(X | \theta) P(\theta)), and the gradients of that with respect to \theta. Once you have those two things, then given an old MCMC draw \theta_k, you can generate a new one, \theta_{k + 1}, using the code in Figure 2.
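Figure 2 of that paper is a short R function that performs one HMC iteration. Below is a Python sketch with the same structure. It assumes a fixed step size and a fixed number of leapfrog steps; Stan’s sampler adapts its tuning and uses NUTS rather than a fixed trajectory length. U is the negative log density, -\log(P(X | \theta) P(\theta)), and grad_U is its gradient. The other names are illustrative.

```python
import math
import random


def hmc_step(U, grad_U, step_size, n_leapfrog, current_q, rng):
    """One HMC transition from current_q (a list of floats).

    U(q) is the negative log density and grad_U(q) its gradient.
    """
    q = list(current_q)
    p = [rng.gauss(0.0, 1.0) for _ in q]  # fresh Gaussian momentum
    current_p = list(p)

    # Leapfrog: half step for momentum, alternating full steps, final half step.
    p = [pi - step_size * gi / 2 for pi, gi in zip(p, grad_U(q))]
    for i in range(n_leapfrog):
        q = [qi + step_size * pi for qi, pi in zip(q, p)]
        if i != n_leapfrog - 1:
            p = [pi - step_size * gi for pi, gi in zip(p, grad_U(q))]
    p = [pi - step_size * gi / 2 for pi, gi in zip(p, grad_U(q))]

    # Metropolis accept/reject on total energy (U plus kinetic energy).
    # The paper also negates p here; that doesn't change the kinetic energy.
    current_h = U(current_q) + sum(pi * pi for pi in current_p) / 2
    proposed_h = U(q) + sum(pi * pi for pi in p) / 2
    if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
        return q  # accept
    return list(current_q)  # reject


# Usage: sample a 2-D standard normal, where U(q) = |q|^2 / 2.
def U(q):
    return sum(x * x for x in q) / 2


def grad_U(q):
    return list(q)


rng = random.Random(1)
q = [3.0, -3.0]
draws = []
for _ in range(4000):
    q = hmc_step(U, grad_U, 0.1, 15, q, rng)
    draws.append(q[0])
kept = draws[500:]
print(sum(kept) / len(kept))  # should be close to 0
```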

Stan is doing an elaborate version of that.

Yeah, so that’s probably an indexing error. If something like this is giving you too much trouble, it might be worth coding up the log density evaluation in R so it’s easier to debug, and then slowly converting bits to Stan code.
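The suggestion is to write the log density in R; the same idea is sketched here in Python. The model is hypothetical: each residual depends on the previous residual of the same student and of the same class, y[n] = mu + theta_s * eps[prev_student[n]] + theta_c * eps[prev_class[n]] + eps[n], with eps[n] ~ Normal(0, sigma). Priors are left out, so this is only the log-likelihood. Comparing its value with the Stan model’s value for the same parameters is one way to narrow down an indexing bug. Indices are 0-based, with -1 meaning "no previous observation".

```python
import math


def ma_log_lik(y, prev_student, prev_class, mu, theta_s, theta_c, sigma, verbose=False):
    """Log-likelihood of a hypothetical two-level MA(1) model.

    prev_student[n] / prev_class[n] are 0-based rows of the previous
    observation of the same student / class, or -1 if there is none.
    Returns (log_lik, residuals).
    """
    eps = [math.nan] * len(y)
    log_lik = 0.0
    for n in range(len(y)):
        for name, prev in (("student", prev_student), ("class", prev_class)):
            if prev[n] >= n:
                raise ValueError(f"row {n}: previous {name} row {prev[n]} is not earlier")
        lag_s = eps[prev_student[n]] if prev_student[n] >= 0 else 0.0
        lag_c = eps[prev_class[n]] if prev_class[n] >= 0 else 0.0
        eps[n] = y[n] - mu - theta_s * lag_s - theta_c * lag_c
        if verbose:
            print(f"row {n}: eps = {eps[n]}")  # the 'print in the inner loop' check
        z = eps[n] / sigma
        log_lik += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return log_lik, eps


ll, eps = ma_log_lik([1.0, 2.0, 0.5, 1.0], [-1, -1, 0, 1], [-1, 0, 1, 2],
                     mu=0.0, theta_s=0.5, theta_c=0.0, sigma=1.0)
print(eps)  # [1.0, 2.0, 0.0, 0.0]
```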
