Filtering when only the conditional expectation is observed

I’m trying to work through the following model:
X_t = X_{t-1} + \varepsilon_t
Y_t = Y_{t-1} + \nu_t
Z_t = E[X_t \mid Y_t,Y_{t-1},...,Y_0]

where only Z_t is observable. Additionally, and this is central to the application, \varepsilon_t is non-Gaussian (though \nu_t can be Gaussian), which means that there isn’t a closed-form solution for the filtering problem.

Is there any hope for handling this with Stan?

Hi, @Ian_db, and thanks for asking.

I can’t quite wrap my head around how to do this with only an expectation for Z_t rather than a distribution. It’s easy enough to code the X_t, Y_t series assuming you can write down a density foo for x, and you can provide x[1] and y[1] distributions if you want a complete likelihood.

data {
  int<lower=0> T;
  vector[T] z;
}
parameters {
  vector[T] y;
  vector[T] x;
  real<lower=0> sigma;
}
model {
  x[1] ~ ...?...;                  // initial distribution for x
  for (t in 2:T) {
    x[t] ~ foo(x[t-1], ...);       // non-Gaussian innovation density
  }
  y[1] ~ ...?...;                  // initial distribution for y
  for (t in 2:T) {
    y[t] ~ normal(y[t-1], sigma);  // Gaussian random walk for y
  }
}

But then I don’t see how Z_t is supposed to come into the model. It’s giving you an expectation of X_t, but X_t already has a generative model from the time series.

One way to think about it is that Z_t is a deterministic function of the history up to t, {Y_0, Y_1, ..., Y_t}. That function is usually the output of a filtering method of some sort (e.g., a Kalman or particle filter).
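
In symbols (f_t is just my name for that map, and \theta stands for all the model parameters):

Z_t = f_t(Y_0, Y_1, ..., Y_t; \theta)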

For example, think of t=0. There’s just an initial distribution for X_0. Knowing the distribution of \nu_0, there is (if the model is well-behaved) a one-to-one mapping between Y_0 and Z_0. If you observed Y_0, you could calculate E[X_0 \mid Y_0], and you can just as well imagine reversing that calculation to get Y_0 having observed Z_0.
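
As a toy illustration of that one-to-one mapping (assuming, purely for illustration, a direct observation equation Y_0 = X_0 + \nu_0 and Gaussian X_0, which is not quite the model above): if X_0 \sim N(0, \sigma_x^2) and \nu_0 \sim N(0, \sigma_\nu^2) are independent, then standard Gaussian conditioning gives

Z_0 = E[X_0 \mid Y_0] = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_\nu^2} Y_0

which is linear in Y_0 and therefore trivially invertible.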

Another way to say it: think of a function Z_0 = f(Y_0). If f is known (and invertible), then we can get Y_0 from Y_0=f^{-1}(Z_0) and then it’s just a standard filtering problem. What makes this tricky is that the function f may be very complicated, and it also depends on all the parameters of the model.

That logic then extends to any other t>0 (and the function f then depends on the entire history up to time t).
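
To make the change-of-variables idea concrete, here is a minimal Stan sketch for the toy linear case above, where z = k * y for an unknown gain k. The gain k, the linear form of f, and the priors are all assumptions for illustration only; in the real problem f^{-1} and its derivative would have to be computed numerically inside the model:

data {
  int<lower=0> T;
  vector[T] z;
}
parameters {
  real<lower=0> sigma;
  real<lower=0, upper=1> k;      // hypothetical gain: assumes z = k * y (toy linear f)
}
model {
  vector[T] y = z / k;           // y = f^{-1}(z) in the linear toy case
  target += -T * log(k);         // Jacobian adjustment: |dy/dz| = 1/k per observation
  y[1] ~ normal(0, 10);          // some proper prior on the initial level
  for (t in 2:T) {
    y[t] ~ normal(y[t-1], sigma);
  }
}

The target += line carries the change of variables: we observe z but put densities on y = f^{-1}(z), so the log Jacobian has to be added. For the real f there is no closed form for either f^{-1} or its derivative, which is exactly where this approach gets stuck.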

Right now I have it coded up with a Gaussian mixture filter and I’m just maximizing the likelihood. I was just curious whether Stan might be able to handle it, since it’s often much more effective than code I write myself.

Is this a situation where the expectation is invertible? I can see how things would work out if that’s the case—then it’s just a change-of-variables problem.
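
Concretely, with your f, observing z_t while modeling the density of y_t would mean

p(z_t) = p_y(f^{-1}(z_t)) \left| \frac{d f^{-1}(z_t)}{d z_t} \right|

so the formula itself is standard; the hard part is evaluating f^{-1} and its derivative when f depends on all the parameters.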

But no, Stan doesn’t have anything built in for this.