AR/Kalman-like models where the k-th estimate of a variable depends only on obs 1..k-1

As explained in Time series modelling: intermediate values, I have a model with a latent variable y and with N indirect, ordered observations. This y is assumed to evolve in time, which is modelled by a noise term (informally, y[k] = y[k-1] + e_k where e_k ~ N(0, sigma) for all k).

It is essential to my application that y[k] is estimated only from observations 1…k-1 rather than the full set 1…N. I am looking for Stan examples/encodings that achieve this “no-lookahead” property (just the core idea is enough), or for an argument why this is impossible or a bad idea. Posting as a separate thread on @sakrejda’s advice.

We have a gaussian_dlm_obs function that does Kalman filtering. @James_Savage does models like this a lot.

I understand you want a filter rather than a smoother, but that is easy enough to accomplish. Is your concern that the posterior distribution of common parameters like sigma depends on all N observations and you want to know what your posterior distribution for those common parameters would have been at a particular point in time when you had fewer observations?

Hi @huffyhenry - is $y$ a vector?

I typically write up a Kalman filter from scratch. But do note that if you perform inference on the parameters of the process, these will be influenced by the future observations. It might be worth performing some checking to make sure that including future observations doesn’t much change the parameters of the process.

At work now, but I’ll try to find an example later on.

Thanks for the reply, Ben. If I understand the manual for gaussian_dlm_obs correctly, it doesn’t directly do what I need, but good to be aware of it all the same.

My concern is not so much about the “explicitly” common parameters like sigma (which I could live with being estimated from the whole dataset), but about y, the latent variable itself. It is y that I need estimated at intermediate points in time – specifically I want y[k] to depend just on observations 1…k-1. I don’t have a good grasp of nomenclature here, is this what distinguishes a “filter” from a “smoother”?

Hi @James_Savage, yes, y is a vector. And including future observations definitely changes the estimates significantly.

Looking forward to seeing your example(s).

Yes; see, for example, https://stats.stackexchange.com/questions/289188/kalman-filter-vs-kalman-smoother-for-beta-calculations .

But with Bayesian estimation methods (most of the Kalman filtering literature is non-Bayesian), you are drawing from the joint distribution of all the unknowns, in which case the posterior distribution of y[k] is not going to be independent of the posterior distribution for y[k + 1] even if you use a construction like y[k + 1] = y[k] + sigma * y_noise[k + 1]. And, of course, the posterior distribution of sigma depends on all the observations as well.

This isn’t usually a problem unless your question is what would the posterior distribution be on X date where X date is some time between k = 1 and k = K rather than sometime after K observations have been collected.

When you say that future values affect the estimates, do you mean that they affect the estimates of y, or sigma? With a Kalman filter (not smoother) future observations will only affect present estimates of y through their impact on sigma (in this simple example). Would that be OK?

Both, but it’s only the direct retroactive effect on y that I care about. The underlying task is prediction of sports results: I can’t use next week’s results to predict the team strength (which is what y is) tonight. Strictly speaking, I shouldn’t allow the indirect effect through sigma either, but in my experience this parameter stabilizes quite quickly and future observations don’t move the needle much.

This sounds like what I am looking for.
Additional question now that I’m aware of the filter/smoother distinction: is the final estimate from a filter the same as the final estimate from a smoother?

Thanks for the clarification, Ben. This is indeed my question in a generalised form (I only care what the posterior of y[k] is at time k-1).

As an aside (not a criticism or a complaint), a lot of my confusion seems to stem from the fact that many intro Bayesian texts emphasise sequential processing of observations (start with the prior, analyse the first observation, now use the posterior as the new prior and analyse next observation, etc.), but this is not at all what the inference engines seem to be doing, at least not at a level that is exposed to the user.

That only works analytically for very simple problems with conjugate relationships. If you want the posterior distribution as of X date, then you need to estimate the model (possibly with a Kalman filter or something equivalent, but not a smoother) using only the data up until X date. If there are multiple dates you are interested in, you have to run the model multiple times with multiple subsets of the data.

“start with the prior, analyse the first observation, now use the posterior as the new prior and analyse next observation, etc.” - this is exactly what the kalman filter is doing, but at the level of inference about the unobserved states - the parameters governing the processes are treated as given…

Thanks everyone. I’d still like to see a canonical Kalman filter in Stan if possible. Googling throws some results, but I am not sure if I can trust those implementations, and in any case they are not very enlightening. @James_Savage? Thanks in advance.

the R package ctsem that I’ve worked on will fit in stan and give you predicted, updated, and smoothed plots. Is for multivariate hierarchical continuous time formulations, so more complex than you want… but here’s an example with data generation so you can decide for yourself whether to trust it ;)

require(ctsem)
set.seed(1)
Tpoints=100
n.manifest=1
n.latent=1
testm<-ctModel(Tpoints=Tpoints,n.latent=n.latent,n.manifest=n.manifest,
  LAMBDA=diag(1), DRIFT=matrix(-.3),MANIFESTVAR=diag(.4,1),
  DIFFUSION=matrix(0.8),
  T0MEANS=matrix(-2),
  T0VAR=diag(1))
testd<-ctGenerate(testm,n.subjects=1,burnin=0,wide=FALSE)

checkm<-ctModel(Tpoints=Tpoints,type='stanct',
  n.latent=n.latent,
  n.manifest=n.manifest,
  LAMBDA=diag(1))
checkm$pars$indvarying<-FALSE #set all FALSE because only one subject

checkf<-ctStanFit(testd,checkm,iter=500,cores=2,chains=2)
ctStanKalman(checkf,plot=TRUE,kalmanvec=c('y','yprior'),timestep=.01)
ctStanKalman(checkf,plot=TRUE,kalmanvec=c('y','ysmooth'),timestep=.01,plotcontrol=list(xlim=c(0,20)))
ctStanDiscretePars(checkf,plot=TRUE)
1 Like