I’m having trouble implementing the “mapped” version of time-series models using the map_rect function.
Let me take the stochastic volatility model from the user’s guide for example:
// Unmapped original version:
data {
int<lower=0> T; // # time points (equally spaced)
vector[T] y; // mean corrected return at time t
}
parameters {
real mu; // mean log volatility
real<lower=-1,upper=1> phi; // persistence of volatility
real<lower=0> sigma; // white noise shock scale
vector[T] h; // log volatility at time t
}
model {
phi ~ uniform(-1, 1);
sigma ~ cauchy(0, 5);
mu ~ cauchy(0, 10);
h[1] ~ normal(mu, sigma / sqrt(1 - phi * phi));
for (t in 2:T)
h[t] ~ normal(mu + phi * (h[t - 1] - mu), sigma);
for (t in 1:T)
y[t] ~ normal(0, exp(h[t] / 2));
}
In order to convert the above model into the mapped version, let’s say I define
data {
...
int<lower=0> N_shards;
}
transformed data {
int<lower=0> J = T / N_shards;
real y_s[N_shards, J];
int x_is[N_shards, J];
{
int pos = 1;
for (n in 1:N_shards) {
int end = pos + J - 1;
y_s[n] = to_array_1d(y[pos:end]);
pos += J;
}
}
}
What would be the appropriate way to pass the time-varying h[1:T] into the “xi” in the lp_reduce function for map_rect and implement the AR(1) h[t] process part in the lp_reduce function?
Or isn’t it a good idea to use map_rect for time-series models?
I appreciate any hints or examples.
Basically, map_rect is geared toward conditionally independent shards. You could possibly get this SV to work if you had something like
model {
vector[0] xi;
vector[J] thetas[N_shards];
int pos = 1;
for (n in 1:N_shards) {
int end = pos + J - 1;
thetas[n] = h[pos:end];
pos += J;
}
target += sum(map_rect(lp_reduce, xi, thetas, y_s, x_is));
}
Also, the stochastic volatility model given in the Stan User Manual is a computational catastrophe and you should go off the improved version described at the end of that subsection.
So, I figure that, for time-series models, basically I can only parallelize the part that does not involve intertemporal dependencies such as between h[t] and h[t-1].
I have rewritten the code as below including the improvement described in the manual.
functions {
vector lp_reduce(vector xi, vector thetas, real[] y_s, int[] x_is) {
real lp = normal_lpdf(y_s | 0, exp(thetas / 2));
return [lp]';
}
}
data {
int<lower=0> T; // # time points (equally spaced)
vector[T] y; // mean corrected return at time t
int<lower=0> N_shards; // # shards
}
transformed data {
int<lower=0> J = T / N_shards;
real y_s[N_shards, J];
int x_is[N_shards, J];
{
int pos = 1;
for (n in 1:N_shards) {
int end = pos + J - 1;
y_s[n] = to_array_1d(y[pos:end]);
pos += J;
}
}
}
parameters {
real mu; // mean log volatility
real<lower=-1,upper=1> phi; // persistence of volatility
real<lower=0> sigma; // white noise shock scale
vector[T] h_std; // std log volatility time t
}
transformed parameters {
vector[T] h = h_std * sigma; // now h ~ normal(0, sigma)
h[1] /= sqrt(1 - phi * phi); // rescale h[1]
h += mu;
for (t in 2:T)
h[t] += phi * (h[t-1] - mu);
}
model {
vector[0] xi;
vector[J] thetas[N_shards];
int pos = 1;
phi ~ uniform(-1, 1);
sigma ~ cauchy(0, 5);
mu ~ cauchy(0, 10);
h_std ~ std_normal();
for (n in 1:N_shards) {
int end = pos + J - 1;
thetas[n] = h[pos:end];
pos += J;
}
target += sum(map_rect(lp_reduce, xi, thetas, y_s, x_is));
}
In this case, I suppose there would be no benefits to pass mu, phi and sigma into map_rect as “xi” parameter.