Map_rect implementation for time-series models

Hi,

I’m having trouble implementing the “mapped” version of time-series models using the map_rect function.
Let me take the stochastic volatility model from the User's Guide as an example:

// Unmapped original version:
data {
  int<lower=0> T;   // # time points (equally spaced)
  vector[T] y;      // mean corrected return at time t
}
parameters {
  real mu;                     // mean log volatility
  real<lower=-1,upper=1> phi;  // persistence of volatility
  real<lower=0> sigma;         // white noise shock scale
  vector[T] h;                 // log volatility at time t
}
model {
  phi ~ uniform(-1, 1);
  sigma ~ cauchy(0, 5);
  mu ~ cauchy(0, 10);
  h[1] ~ normal(mu, sigma / sqrt(1 - phi * phi));
  for (t in 2:T)
    h[t] ~ normal(mu + phi * (h[t - 1] - mu), sigma);
  for (t in 1:T)
    y[t] ~ normal(0, exp(h[t] / 2));
}

In order to convert the above model into the mapped version, let’s say I define

functions {
  vector lp_reduce(vector xi, vector thetas, real[] y_s, int[] x_is) {
...
  }
}

and

data {
...
  int<lower=0> N_shards;
}
transformed data {
  int<lower=0> J = T / N_shards;  // integer division: assumes T % N_shards == 0
  real y_s[N_shards, J];
  int x_is[N_shards, J] = rep_array(0, N_shards, J);  // unused here, but map_rect requires per-shard integer data
  {
    int pos = 1;
    for (n in 1:N_shards) {
      int end = pos + J - 1;
      y_s[n] = to_array_1d(y[pos:end]);
      pos += J;
    }
  }
}
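
(Side note: J = T / N_shards is integer division, so I am assuming T is divisible by N_shards; otherwise the last T % N_shards observations would be silently dropped. A guard like the one below, which is not part of the code above, would catch that:)

transformed data {
  // hypothetical guard: fail fast instead of silently dropping the
  // last T % N_shards observations when T is not a multiple of N_shards
  if (T % N_shards != 0)
    reject("T = ", T, " is not divisible by N_shards = ", N_shards);
  ...
}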

What would be the appropriate way to pass the time-varying h[1:T] into lp_reduce for map_rect (through "xi"?) and to implement the AR(1) process for h[t] inside lp_reduce?
Or is map_rect simply not a good fit for time-series models?
I appreciate any hints or examples.

Thanks,

Basically, map_rect is geared toward conditionally independent shards. You could possibly get this SV model to work if you had something like

model {
  vector[0] xi;
  vector[J] thetas[N_shards];
  int pos = 1;
  for (n in 1:N_shards) {
    int end = pos + J - 1;
    thetas[n] = h[pos:end];
    pos += J;
  }

  target += sum(map_rect(lp_reduce, xi, thetas, y_s, x_is));
}

Also, the stochastic volatility model given in the Stan User's Guide is a computational catastrophe (the centered parameterization gives the posterior a funnel-like geometry that the sampler handles poorly), and you should work from the improved, non-centered version described at the end of that subsection.

Thank you very much for the reply.

So I gather that, for time-series models, I can basically only parallelize the parts that involve no intertemporal dependencies (such as the one between h[t] and h[t-1]).
I have rewritten the code as below, including the improvement described in the manual.

functions {
  // per-shard log density: sums the observation likelihood for one
  // shard's returns given its slice of the log volatilities (thetas)
  vector lp_reduce(vector xi, vector thetas, real[] y_s, int[] x_is) {
    real lp = normal_lpdf(y_s | 0, exp(thetas / 2));
    return [lp]';
  }
}
data {
  int<lower=0> T;   // # time points (equally spaced)
  vector[T] y;      // mean corrected return at time t
  int<lower=0> N_shards; // # shards
}
transformed data {
  int<lower=0> J = T / N_shards;  // integer division: assumes T % N_shards == 0
  real y_s[N_shards, J];
  int x_is[N_shards, J] = rep_array(0, N_shards, J);  // unused here, but map_rect requires per-shard integer data
  {
    int pos = 1;
    for (n in 1:N_shards) {
      int end = pos + J - 1;
      y_s[n] = to_array_1d(y[pos:end]);
      pos += J;
    }
  }
}
parameters {
  real mu;                     // mean log volatility
  real<lower=-1,upper=1> phi;  // persistence of volatility
  real<lower=0> sigma;         // white noise shock scale
  vector[T] h_std; // std log volatility time t
}
transformed parameters {
  vector[T] h = h_std * sigma;  // now h ~ normal(0, sigma)
  h[1] /= sqrt(1 - phi * phi);  // rescale h[1]
  h += mu;
  for (t in 2:T)
    h[t] += phi * (h[t-1] - mu);
}
model {
  vector[0] xi;
  vector[J] thetas[N_shards];
  int pos = 1;
  phi ~ uniform(-1, 1);
  sigma ~ cauchy(0, 5);
  mu ~ cauchy(0, 10);
  h_std ~ std_normal();
  for (n in 1:N_shards) {
    int end = pos + J - 1;
    thetas[n] = h[pos:end];
    pos += J;
  }
  target += sum(map_rect(lp_reduce, xi, thetas, y_s, x_is));
}

In this case, I suppose there would be no benefit to passing mu, phi, and sigma into map_rect through the "xi" argument, since lp_reduce never uses them.
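
(For reference: if lp_reduce did need mu, phi, and sigma, for example in a centered variant that also evaluates the AR(1) increments inside each shard by handing every shard the last h of its predecessor, they would be passed through xi roughly as in the sketch below. This sketch is not from the thread, and the first shard's stationary initial term would still need separate handling.)

functions {
  // hypothetical centered variant, not from the thread: xi carries
  // (mu, phi, sigma) and thetas carries h[(pos - 1):end], i.e. J + 1
  // values, so each shard can also evaluate its AR(1) increments,
  // including the transition across the shard boundary
  vector lp_reduce(vector xi, vector thetas, real[] y_s, int[] x_is) {
    int J = num_elements(y_s);
    real mu = xi[1];
    real phi = xi[2];
    real sigma = xi[3];
    real h_prev = thetas[1];          // last log volatility of the previous shard
    vector[J] h = thetas[2:(J + 1)];  // this shard's log volatilities
    real lp = normal_lpdf(h[1] | mu + phi * (h_prev - mu), sigma);
    for (j in 2:J)
      lp += normal_lpdf(h[j] | mu + phi * (h[j - 1] - mu), sigma);
    lp += normal_lpdf(y_s | 0, exp(h / 2));
    return [lp]';
  }
}
...
model {
  vector[3] xi = [mu, phi, sigma]';  // scalars shared across all shards
  ...
}

Note also that map_rect only evaluates shards in parallel when the model is compiled with threading (STAN_THREADS) or MPI support; otherwise the shards run serially and the mapping only adds overhead.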