Is there a problem with looping the estimation?

I am trying to estimate the following stochastic volatility model on stock price time series data.

y[t] = mu2 + exp(h[t]/2) * ε1
h[t] = mu1 + φ * (h[t-1] - mu1) + σ * ρ * (y[t-1] - mu2) / exp(h[t-1]/2) + σ * sqrt(1 - ρ^2) * ε2
ε1, ε2 ~ N(0, 1)
y is the vector of stock returns and h is the vector of latent log-volatilities.
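For reference, h[1] in the code below is initialized at the stationary distribution of this recursion: since (y[t-1] - mu2)/exp(h[t-1]/2) is standard normal and independent of ε2, the total shock variance is σ^2 ρ^2 + σ^2 (1 - ρ^2) = σ^2, so

Var(h[t]) = φ^2 * Var(h[t-1]) + σ^2  ⇒  Var(h) = σ^2 / (1 - φ^2),

which gives h[1] ~ normal(mu1, σ / sqrt(1 - φ^2)).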

data {
  int<lower=0> T;   // # time points (equally spaced)
  vector[T] y;      // mean corrected return at time t
  int<lower=0> N;   // # out-of-sample time points (equally spaced)
  vector[N] y_new;  // mean corrected out-of-sample return at time t
}
parameters {
  real mu1;                     // mean log volatility
  real mu2;                     // mean index return
  real<lower=0, upper=1> phi;  // persistence of volatility (bounds match the uniform prior)
  real<lower=0> sigma;         // white noise shock scale
  vector[T] h;                 // log volatility at time t
  real<lower=-1, upper=1> rho;  // leverage (bounds match the uniform prior)
}
model {
  phi ~ uniform(0, 1);
  sigma ~ cauchy(0, 2);
  mu1 ~ cauchy(-9, 5);
  mu2 ~ normal(0, 0.1);
  rho ~ uniform(-1, 1);

  h[1] ~ normal(mu1, sigma / sqrt(1 - phi * phi));
  for (t in 1:T)
    y[t] ~ normal(mu2, exp(h[t] / 2));
  for (t in 2:T)
    h[t] ~ normal(mu1 + phi * (h[t - 1] - mu1) + rho * sigma * exp(-h[t-1] / 2) * (y[t-1] - mu2), sqrt(1 - rho * rho) * sigma);
}

generated quantities {
  vector[N + 1] h_new;
  // one-step-ahead draw from the last in-sample state
  h_new[1] = normal_rng(mu1 + phi * (h[T] - mu1) + rho * sigma * exp(-h[T] / 2) * (y[T] - mu2),
                        sqrt(1 - rho * rho) * sigma);
  // propagate forward through the realized out-of-sample returns
  for (i in 2:(N + 1))
    h_new[i] = normal_rng(mu1 + phi * (h_new[i - 1] - mu1) + rho * sigma * exp(-h_new[i - 1] / 2) * (y_new[i - 1] - mu2),
                          sqrt(1 - rho * rho) * sigma);
}

I first estimate the parameters and h on a window of the past 500 days, then use those estimates to generate h_new from the realized returns y_new of the following 50 days.
By repeating this rolling-window procedure, I try to cover roughly ten years of data.
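In code, one pass of this procedure looks roughly like the following (a simplified sketch: returns is the full return series and results collects the forecasts, both placeholder names; sm is the already-compiled model):

WINDOW = 500   # in-sample days per fit
HORIZON = 50   # out-of-sample days per fit

results = []
for start in range(0, len(returns) - WINDOW - HORIZON + 1, HORIZON):
    y = returns[start : start + WINDOW]                         # estimation window
    y_new = returns[start + WINDOW : start + WINDOW + HORIZON]  # realized out-of-sample returns
    data_dat = {"T": WINDOW, "y": y, "N": HORIZON, "y_new": y_new}
    fit = sm.sampling(data=data_dat, iter=10000, chains=4, thin=2)
    results.append(fit.extract()["h_new"].mean(axis=0))         # posterior mean of h_new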

But the problem is that the time required grows rapidly with each repetition: the first estimation and generation of h_new took 5 minutes, but by the sixth iteration it was taking 15 minutes.

The Stan model is compiled only once, and the sampling call below is the only thing run repeatedly in the Python loop; I then append the posterior mean of h_new to a pre-prepared data frame.

import pandas as pd

for i in range(100):
    fit = sm.sampling(data=data_dat, iter=10000, chains=4, thin=2)
    mean = fit.extract()["h_new"].mean(axis=0)  # posterior mean of h_new across draws
    data = data.append(pd.Series(mean), ignore_index=True)

Is looping Stan’s fit like this, producing predictions while sliding the window, a bad way to do it?
Is there a way to streamline the computation, or at least to keep each iteration at roughly the same running time?

It’s not clear to me what you are trying to do, but it seems that you are estimating the parameters and then using them to produce some kind of forecast as a function of the parameters (h and/or y). If you are using that forecast to generate random numbers, though, you don’t really have new data, just synthetic/simulated data. Looping over it is at best recovering your first estimate, and more likely adding noise that dilutes the signal for the parameter values. That may be the source of the increased time per estimate, but I think the main issue is to make clear what you are trying to achieve with this procedure.
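Independently of the modeling question: if every window has the same size, each fit should take roughly comparable time, so a steadily growing runtime suggests something is accumulating across iterations (memory held by old fit objects is a common culprit). A minimal sketch of instrumenting the loop, assuming PyStan 2:

import gc
import time

for i in range(100):
    t0 = time.time()
    fit = sm.sampling(data=data_dat, iter=10000, chains=4, thin=2)
    mean = fit.extract()["h_new"].mean(axis=0)
    print(i, time.time() - t0)  # per-iteration wall time
    del fit                     # drop the large fit object before the next pass
    gc.collect()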
