Is there a problem with looping the estimation?

I am trying to estimate the following stochastic volatility model on stock price time series data.

y[t] = mu2 + exp(h[t]/2) * ε1
h[t] = mu1 + φ * (h[t-1] - mu1) + σ * ρ * (y[t-1] - mu2) / exp(h[t-1]/2) + σ * sqrt(1 - ρ^2) * ε2
ε1, ε2 ~ N(0, 1)
y is the vector of stock returns and h is the vector of latent log-volatilities.
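For reference, h[1] in the code below is initialized at the stationary distribution of this recursion: since (y[t-1] - mu2)/exp(h[t-1]/2) is standard normal and independent of ε2, the total shock variance is σ^2 ρ^2 + σ^2 (1 - ρ^2) = σ^2, so

Var(h[t]) = φ^2 * Var(h[t-1]) + σ^2  ⇒  Var(h) = σ^2 / (1 - φ^2),

which gives h[1] ~ normal(mu1, σ / sqrt(1 - φ^2)).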

data {
  int<lower=0> T;   // # time points (equally spaced)
  vector[T] y;      // mean corrected return at time t
  int<lower=0> N;   // # out-of-sample time points (equally spaced)
  vector[N] y_new;  // mean corrected out-of-sample return at time t
}
parameters {
  real mu1;                     // mean log volatility
  real mu2;                     // mean index return
  real<lower=0, upper=1> phi;  // persistence of volatility (bounds match the uniform prior)
  real<lower=0> sigma;         // white noise shock scale
  vector[T] h;                 // log volatility at time t
  real<lower=-1, upper=1> rho;  // leverage (bounds match the uniform prior)
}
model {
  phi ~ uniform(0, 1);
  sigma ~ cauchy(0, 2);
  mu1 ~ cauchy(-9, 5);
  mu2 ~ normal(0, 0.1);
  rho ~ uniform(-1, 1);

  h[1] ~ normal(mu1, sigma / sqrt(1 - phi * phi));
  for (t in 1:T)
    y[t] ~ normal(mu2, exp(h[t] / 2));
  for (t in 2:T)
    h[t] ~ normal(mu1 + phi * (h[t - 1] - mu1) + rho * sigma * exp(-h[t-1] / 2) * (y[t-1] - mu2), sqrt(1 - rho * rho) * sigma);
}

generated quantities {
  vector[N + 1] h_new;
  // one-step-ahead draw from the last in-sample state
  h_new[1] = normal_rng(mu1 + phi * (h[T] - mu1) + rho * sigma * exp(-h[T] / 2) * (y[T] - mu2),
                        sqrt(1 - rho * rho) * sigma);
  // propagate forward through the realized out-of-sample returns
  for (i in 2:(N + 1))
    h_new[i] = normal_rng(mu1 + phi * (h_new[i - 1] - mu1) + rho * sigma * exp(-h_new[i - 1] / 2) * (y_new[i - 1] - mu2),
                          sqrt(1 - rho * rho) * sigma);
}

I first estimate the parameters and h on a window of the past 500 days, then use those estimates to generate h_new from the realized returns y_new of the following 50 days.
By repeating this rolling-window procedure, I try to cover roughly ten years of data.
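In code, one pass of this procedure looks roughly like the following (a simplified sketch: returns is the full return series and results collects the forecasts, both placeholder names; sm is the already-compiled model):

WINDOW = 500   # in-sample days per fit
HORIZON = 50   # out-of-sample days per fit

results = []
for start in range(0, len(returns) - WINDOW - HORIZON + 1, HORIZON):
    y = returns[start : start + WINDOW]                         # estimation window
    y_new = returns[start + WINDOW : start + WINDOW + HORIZON]  # realized out-of-sample returns
    data_dat = {"T": WINDOW, "y": y, "N": HORIZON, "y_new": y_new}
    fit = sm.sampling(data=data_dat, iter=10000, chains=4, thin=2)
    results.append(fit.extract()["h_new"].mean(axis=0))         # posterior mean of h_new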

But the problem is that the time required grows rapidly with each repetition: the first estimation and generation of h_new took 5 minutes, but by the sixth iteration it was taking 15 minutes.

The Stan model is compiled only once, and the sampling call below is the only thing run repeatedly in the Python loop; I then append the posterior mean of h_new to a pre-prepared data frame.

import pandas as pd

for i in range(100):
    fit = sm.sampling(data=data_dat, iter=10000, chains=4, thin=2)
    mean = fit.extract()["h_new"].mean(axis=0)  # posterior mean of h_new across draws
    data = data.append(pd.Series(mean), ignore_index=True)

Is looping Stan’s fit like this, producing predictions while sliding the window, a bad way to do it?
Is there a way to streamline the computation, or at least to keep each iteration at roughly the same running time?

It’s not clear to me what you are trying to do, but it seems that you are estimating the parameters and then using them to produce some kind of forecast as a function of the parameters (h and/or y). If you are using that forecast to generate random numbers, though, you don’t really have new data, just synthetic/simulated data. Looping over it is at best recovering your first estimate, and more likely adding noise that dilutes the signal for the parameter values. That may be the source of the increased time per estimate, but I think the main issue is to make clear what you are trying to achieve with this procedure.
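Independently of the modeling question: if every window has the same size, each fit should take roughly comparable time, so a steadily growing runtime suggests something is accumulating across iterations (memory held by old fit objects is a common culprit). A minimal sketch of instrumenting the loop, assuming PyStan 2:

import gc
import time

for i in range(100):
    t0 = time.time()
    fit = sm.sampling(data=data_dat, iter=10000, chains=4, thin=2)
    mean = fit.extract()["h_new"].mean(axis=0)
    print(i, time.time() - t0)  # per-iteration wall time
    del fit                     # drop the large fit object before the next pass
    gc.collect()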
