I am currently trying to do some time-series analysis using a state-space model.

I would like to link time series T_1,…,T_n together with a shared trend LL, so that T_1,…,T_n are observations from local trends L1,…,Ln added onto LL, i.e. T_i = L_i + LL + sigma_i.

The time series are a little bit tricky, with scale dependent noise, mean reverting second differences, etc. and I am wondering if this is maybe messing up my initial (and so far failed) attempts.

Anybody have pointers to work on this sort of thing?

Are you talking about something like how dynamic factor models run with state space models connected together? So I have three time series and I run them together to find a common trend.

But some more details might be handy. Are you saying that T_i is a separate time-series of length m_i? So you have N time-series each with a different but otherwise constant observation error \sigma_i? Or is N the length of a single time-series with varying observation error but a stable mean?

If LL is a global mean trend then you might be able to use this as the c_t in the linear SSM and supply it to each function call of a function similar to @jrnold’s ssm_lpdf.

Alas, the error is not constant even inside a single time-series (N equal-length time series)! I’ll check out the link though - it looks interesting. Thanks for the pointer.

The big problem with this type of model is that changes to the global trend change the fit of ALL the timeseries observations, because of this the global trend is tightly constrained in the posterior and there is a dramatic difference in the scales of any global parameters vs the local parameters. Timesteps will need to be very small.

One thing you might try is first doing an optimization, then take the values for the global trend and plug them into a second model as fixed constants, then fit the second model… This approximates the global trend as Delta function around a fixed value which might be an ok approximation for the global function.

If you want to relax the global function, try a parameterization in terms of fixed values you find in your optimization, plus normal(0,1)/2^k and keep increasing k until you’ve found the right scale to make it work.

This sounds a bit like one of the alternatives I’ve been thinking of. I’ll probably end up doing it since it looks like the best I can do: there is a fundamental co-linearity problem that I can’t see any way to get around. I spent a while experimenting with adding an extra reversion to global trend feature to the various observable processes. I was vaguely hoping that somebody might point me to work arguing that this was a principled idea, then it would not be my responsibility, but that idea is if I am honest (a) not principled, (b) anyway is hard to get to work. So I might as well use a different unprincipled method that is easier to get to work.

Well, I think there are two issues, one is identifiability. What’s the difference between the “global trend” and any one of the individual trends? If there’s no real difference in specification, then they’re interchangeable potentially, and you can’t identify the global trend. Stuff will tend to wander off in opposite directions, like y = a + b, if you a goes to infinity and b goes to negative infinity, y stays constant…

To solve this, for example you might impose some greater smoothness on the global trend, and then if the individual trends are necessarily wigglier this gives identifiability because they can’t exactly interchange their roles in the calculation. It will help identifiability to be much smoother not just a little smoother (like you might consider doing a fourier series, or a polynomial or a radial basis function, and keep the number of coefficients small relative to the number of observations in the series)

The second issue though is the dependency structure. The fit of all the individual time points changes with an epsilon change in the global trend. Think of the global trend as a mass with a LOT of small springs holding it in place, one for each anchored data point. It won’t want to move very far from its equilibrium position with all those springs pushing it back, so if it’s identifiable, it will ultimately be tightly identified around some particular value.

Replacing a tightly identified set of parameters with their modal value is a principled simplification… like y = normal(1,.000002) could be replaced with y=1 with only an error on the order of .000002.

Think about your model to see if there is anything about the way the global trend is specified which distinguishes it from the local trends, like smoothness / degrees of freedom etc. If not, perhaps impose a simpler structure on the global trend so it becomes identified. Then, do the optimization procedure I mentioned, and replace the global trend with a fixed value, to get started at least, and see if you can get the local trends to fit.

I got curious about this thinking about the identification issue. It seems like no problem to me – the common process is identified by the covariance between the individual processes. ctsem (R package) allows this sort of model to be specified and estimated using stan (or an optimizer / importance sampler) via a continuous time extended/unscented kalman filter.
See here for code: https://gist.github.com/cdriveraus/27fa9048f6f683f7548929d6dfe38975
Estimation of the initial state might show some of the identification problem, but is not so critical, could be constrained in some way.

Also very interesting looking. It occurs to me, nostalgically, that once upon a time - in the dim and distant past - we might have done this sort of thing in Saarbrücken (at MPII), not Berlin! Shall check it out. Thanks.

You mean like if we shift the common process up, to retain good fit all the individual processes must simultaneously go down… whereas that’s not true for all the individual processes… Yes I think this is correct.

However it still is the case that we expect the common process to be tightly constrained precisely because of this covariance, so the timestep and difference in scale of the common process still comes into play when trying to HMC fit.

OK - report back. Looks, unfortunately, like the idea won’t work in this case. Not because it isn’t a good idea, but because one of the series (for the US) is so much bigger than the others (e.g. Portugal) that any reasonable assignment of weights ends up simply with the US as the reference process, and Portugal as the divergence, which is not really useful to me. I shall store the insight away for future use though.

@Charles - if you are still around in Berlin, would you be interested in getting together for coffee sometime? - I’m there a couple of Fridays a month, and always interested in meeting up with people who do this sort of thing - would be interested to find out what is going on at Max-Planck.

Yes am in Berlin, and sure, happy to chat, mail me some time. If you want to try it out with the code I gave I don’t see why different obs counts would be a problem, as the trend process only exists in so far as there is covariance.

As I said, perhaps make the reference process much smoother than any individual country timeseries. Like supposing you had 100 years of monthly results for the US, you’d have 1200 observations. If the reference process is a 10 term fourier series it will always have additional divergence to be modeled by the US specific series, and also for the Portugal specific series and etc etc.

If you have some other model for the reference series, make sure it is constrained to have smooth changes, either through strong priors on parameters that define the process, or through parameterizing in terms of an inherently smooth process (fourier, polynomial, etc) or whatever mechanism seems best.

The commonality of the common process inherently means averaging away the fluctuations that aren’t strongly correlated between countries, and this generally produces a slower changing function.

Alas, I don’t see how that would work here - I’m trying to extrapolate past partially-shared secular shocks and I haven’t been able to find a completely convincing model for this (I’m using a process with mean reverting second differences - with non-gaussian - either cauchy or laplace - innovations - which works pretty well, but the extrapolation right where I need it does not look right). I was thinking that I could use a refererence trend to regularise/damp the effect of the shocks, because the downturn I’m getting looks too extreme. But making the reference process much smoother would just mean that I would lose any regularisation effect, because then it would not show the shock at all.

Unfortunately, to be honest, what I’m looking at is Knightian uncertainty, so there isn’t a principled extrapolation in the first place (!)