Estimation of latent variables in real time

I would like to estimate the latent variables in real time, out of sample, for a model with fixed parameters.

As an example, I will use the following case from the Stan User's Guide: a stochastic volatility model.


data {
  int<lower=0> T;   // # time points (equally spaced)
  vector[T] y;      // mean corrected return at time t
}
parameters {
  real mu;                     // mean log volatility
  real<lower=-1,upper=1> phi;  // persistence of volatility
  real<lower=0> sigma;         // white noise shock scale
  vector[T] h;                 // log volatility at time t
}
model {
  phi ~ uniform(-1, 1);
  sigma ~ cauchy(0, 5);
  mu ~ cauchy(0, 10);
  h[1] ~ normal(mu, sigma / sqrt(1 - phi * phi));
  for (t in 2:T)
    h[t] ~ normal(mu + phi * (h[t - 1] - mu), sigma);
  for (t in 1:T)
    y[t] ~ normal(0, exp(h[t] / 2));
}
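
For reference, fitting this model and pulling out point estimates might look like the sketch below (assuming CmdStanPy as the interface; the file names sv.stan and returns.txt are placeholders, and any interface would do):

import numpy as np
from cmdstanpy import CmdStanModel

# y: mean-corrected returns for the in-sample period, length T
y = np.loadtxt("returns.txt")                  # placeholder data file

sv_model = CmdStanModel(stan_file="sv.stan")   # the model above
fit = sv_model.sample(data={"T": len(y), "y": y})

# fix the parameters at, e.g., their posterior means
mu_hat = fit.stan_variable("mu").mean()
phi_hat = fit.stan_variable("phi").mean()
sigma_hat = fit.stan_variable("sigma").mean()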

After estimating the parameters phi, sigma, and mu, I want to fix them and then estimate h[T+1], h[T+2], …, h[T+N] using y[T+1], y[T+2], …, y[T+N].


data {
  int<lower=0> N;   // # new time points after T (equally spaced)
  vector[N] y;      // new returns after time T
  real mu;                     // estimated mean log volatility
  real<lower=-1,upper=1> phi;  // estimated persistence of volatility
  real<lower=0> sigma;         // estimated white noise shock scale
}
parameters {
  vector[N] h;                 // log volatility at time T + t
}
}
model {
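  // note: h[1] here is h[T+1], but it is initialized from the stationary
  // distribution rather than conditioned on the last in-sample state h[T]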
  h[1] ~ normal(mu, sigma / sqrt(1 - phi * phi));
  for (t in 2:N)
    h[t] ~ normal(mu + phi * (h[t - 1] - mu), sigma);
  for (t in 1:N)
    y[t] ~ normal(0, exp(h[t] / 2));
}
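
Continuing the sketch above, this second model would be run with the estimates passed in as data (sv_filter.stan and new_returns.txt are again placeholders):

filter_model = CmdStanModel(stan_file="sv_filter.stan")
y_new = np.loadtxt("new_returns.txt")  # y[T+1], ..., y[T+N]

fit_h = filter_model.sample(data={
    "N": len(y_new), "y": y_new,
    "mu": mu_hat, "phi": phi_hat, "sigma": sigma_hat,
})
h_draws = fit_h.stan_variable("h")     # draws x N matrix of log volatilities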

However, here I want to estimate h as if it were done in real time. In other words, y[T+2] should not be used when estimating h[T+1]. With the model above, though, all of y[T+1], …, y[T+N] is used to estimate every past h, so I get a smoothing distribution rather than the filtering distribution I am after.

Looping over expanding windows, refitting with one more time point each time and collecting only the last value of h, seems too inefficient.
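
Concretely, the brute-force loop I have in mind is something like this (continuing the sketch above; each pass refits the small model on an expanded window and keeps only the final state):

# expanding-window refits: h at time T+t uses only y[T+1], ..., y[T+t]
h_filtered = []
for t in range(1, len(y_new) + 1):
    fit_t = filter_model.sample(data={
        "N": t, "y": y_new[:t],
        "mu": mu_hat, "phi": phi_hat, "sigma": sigma_hat,
    })
    # keep only the last h, the one conditioned on data up to time T+t
    h_filtered.append(fit_t.stan_variable("h")[:, -1])

That is N separate MCMC runs just to get N filtered values of h, which is what feels too wasteful.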

Is there any way to solve this kind of problem?

Thanks.



Sorry, I can't really delve deeply into the problem. There seem to be at least two interpretations of your question:

  1. You are trying to use “posterior as next prior”, i.e. avoid fully refitting the model as new data becomes available. This is theoretically appealing, but it is generally discouraged as there is no good way to accomplish that in practice (see e.g. Using posteriors as new priors - #4 by mike-lawrence). You are usually better served by refitting the model with the full data available.

  2. You are trying to find a more efficient way to compute some conditional distributions given your samples without recomputing a bunch of other stuff. I would suspect that in this case your main bottleneck is model fitting, so I would first check whether the simple way of computing the distributions in a loop isn't already fast enough, and only attempt optimization once you are sure it is necessary.

Best of luck with your model!
