AR/Kalman-like models where the k-th estimate of a variable depends only on obs 1..k-1

huffyhenry · August 28, 2017, 3:19pm

As explained in Time series modelling: intermediate values, I have a model with a latent variable y and with N indirect, ordered observations. This y is assumed to evolve in time, which is modelled by a noise term (informally, y[k] = y[k-1] + e_k where e_k ~ N(0, sigma) for all k).

It is essential to my application that y[k] is estimated only from observations 1…k-1 rather than the full set 1…N. I am looking for Stan examples/encodings that achieve this “no-lookahead” property (just the core idea is enough), or for an argument why this is impossible or a bad idea. Posting as a separate thread on @sakrejda’s advice.

bgoodri · August 28, 2017, 3:31pm

We have a gaussian_dlm_obs function that does Kalman filtering. @James_Savage does models like this a lot.

I understand you want a filter rather than a smoother, but that is easy enough to accomplish. Is your concern that the posterior distribution of common parameters like sigma depends on all N observations and you want to know what your posterior distribution for those common parameters would have been at a particular point in time when you had fewer observations?

James_Savage · August 28, 2017, 3:46pm

Hi @huffyhenry - is $y$ a vector?

I typically write up a Kalman filter from scratch. But do note that if you perform inference on the parameters of the process, these will be influenced by the future observations. It might be worth performing some checking to make sure that including future observations doesn’t much change the parameters of the process.

At work now, but I’ll try to find an example later on.

huffyhenry · August 28, 2017, 3:46pm

Thanks for the reply, Ben. If I understand the manual for gaussian_dlm_obs correctly, it doesn’t directly do what I need, but good to be aware of it all the same.

My concern is not so much about the “explicitly” common parameters like sigma (which I could live with being estimated from the whole dataset), but about y, the latent variable itself. It is y that I need estimated at intermediate points in time – specifically I want y[k] to depend just on observations 1…k-1. I don’t have a good grasp of nomenclature here, is this what distinguishes a “filter” from a “smoother”?

huffyhenry · August 28, 2017, 3:52pm

Hi @James_Savage, yes, y is a vector. And including future observations definitely changes the estimates significantly.

Looking forward to seeing your example(s).

bgoodri · August 28, 2017, 3:54pm

Yes; see, for example, time series - Kalman filter vs Kalman Smoother for beta calculations - Cross Validated .

But with Bayesian estimation methods (most of the Kalman filtering literature is non-Bayesian), you are drawing from the joint distribution of all the unknowns, in which case the posterior distribution of y[k] is not going to be independent of the posterior distribution for y[k + 1] even if you use a construction like y[k + 1] = y[k] + sigma * y_noise[k + 1]. And, of course, the posterior distribution of sigma depends on all the observations as well.

This isn’t usually a problem unless your question is what would the posterior distribution be on X date where X date is some time between k = 1 and k = K rather than sometime after K observations have been collected.

James_Savage · August 28, 2017, 4:00pm

When you say that future values affect the estimates, do you mean that they affect the estimates of y, or sigma? With a Kalman filter (not smoother) future observations will only affect present estimates of y through their impact on sigma (in this simple example). Would that be OK?

huffyhenry · August 28, 2017, 4:23pm

Both, but it’s only the direct retroactive effect on y that I care about. The underlying task is prediction of sports results: I can’t use next week’s results to predict the team strength (which is what y is) tonight. Strictly speaking, I shouldn’t allow the indirect effect through sigma either, but in my experience this parameter stabilizes quite quickly and future observations don’t move the needle much.

This sounds like what I am looking for.
Additional question now that I’m aware of the filter/smoother distinction: is the final estimate from a filter the same as the final estimate from a smoother?

Thanks for the clarification, Ben. This is indeed my question in a generalised form (I only care what the posterior of y[k] is at time k-1).

As an aside (not a criticism or a complaint), a lot of my confusion seems to stem from the fact that many intro Bayesian texts emphasise sequential processing of observations (start with the prior, analyse the first observation, now use the posterior as the new prior and analyse next observation, etc.), but this is not at all what the inference engines seem to be doing, at least not at a level that is exposed to the user.

bgoodri · August 28, 2017, 4:30pm

That only works analytically for very simple problems with conjugate relationships. If you want the posterior distribution as of X date, then you need to estimate the model (possibly with a Kalman filter or something equivalent, but not a smoother) using only the data up until X date. If there are multiple dates you are interested in, you have to run the model multiple times with multiple subsets of the data.

Charles_Driver · August 28, 2017, 9:15pm

“start with the prior, analyse the first observation, now use the posterior as the new prior and analyse next observation, etc.” - this is exactly what the kalman filter is doing, but at the level of inference about the unobserved states - the parameters governing the processes are treated as given…

huffyhenry · August 29, 2017, 8:40am

Thanks everyone. I’d still like to see a canonical Kalman filter in Stan if possible. Googling throws some results, but I am not sure if I can trust those implementations, and in any case they are not very enlightening. @James_Savage? Thanks in advance.

Charles_Driver · August 29, 2017, 9:40am

the R package ctsem that I’ve worked on will fit in stan and give you predicted, updated, and smoothed plots. Is for multivariate hierarchical continuous time formulations, so more complex than you want… but here’s an example with data generation so you can decide for yourself whether to trust it ;)

require(ctsem)
set.seed(1)
Tpoints=100
n.manifest=1
n.latent=1
testm<-ctModel(Tpoints=Tpoints,n.latent=n.latent,n.manifest=n.manifest,
  LAMBDA=diag(1), DRIFT=matrix(-.3),MANIFESTVAR=diag(.4,1),
  DIFFUSION=matrix(0.8),
  T0MEANS=matrix(-2),
  T0VAR=diag(1))
testd<-ctGenerate(testm,n.subjects=1,burnin=0,wide=FALSE)

checkm<-ctModel(Tpoints=Tpoints,type='stanct',
  n.latent=n.latent,
  n.manifest=n.manifest,
  LAMBDA=diag(1))
checkm$pars$indvarying<-FALSE #set all FALSE because only one subject

checkf<-ctStanFit(testd,checkm,iter=500,cores=2,chains=2)
ctStanKalman(checkf,plot=TRUE,kalmanvec=c('y','yprior'),timestep=.01)
ctStanKalman(checkf,plot=TRUE,kalmanvec=c('y','ysmooth'),timestep=.01,plotcontrol=list(xlim=c(0,20)))
ctStanDiscretePars(checkf,plot=TRUE)

Topic		Replies	Views
Time series modelling: intermediate values Modeling	9	1375	August 30, 2017
State space / kalman filtering advice for inverse model Modeling	13	3698	May 1, 2019
Filtering when only the conditional expectation is observed Modeling	3	76	July 19, 2024
More than one step lag between observation and prediction Modeling	1	343	November 4, 2020
Marginalization of latent states dynamic linear models General	8	1375	February 17, 2018

AR/Kalman-like models where the k-th estimate of a variable depends only on obs 1..k-1

Related topics