I am working with Kevin Systrom on his rt.live project. You can see the Jupyter notebook for the current model here.

In Stan, the (simplified) part of the model that I'm having trouble with looks like this:
parameters {
  real<lower=0> step_size;
  matrix[timesteps, states] theta_steps;
}
transformed parameters {
  matrix[timesteps, states] theta;
  for (s in 1:states) {
    theta[, s] = cumulative_sum(theta_steps[, s]);
  }
}
model {
  step_size ~ normal(0, 1) T[0, ];
  theta_steps[1, ] ~ normal(0, 1);
  to_vector(theta_steps[2:timesteps, ]) ~ normal(0, step_size);
  for (s in 1:states) {
    for (t in 2:timesteps)
      cases[t, s] ~ poisson(cases[t - 1, s] * exp(theta[t - 1, s]));
  }
}
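In equation form, the prior on each column of theta is

$$\theta_{1,s} \sim \mathcal{N}(0, 1), \qquad \theta_{t,s} = \theta_{t-1,s} + \epsilon_{t,s}, \quad \epsilon_{t,s} \sim \mathcal{N}(0, \texttt{step\_size}) \ \text{for } t > 1.$$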
As you can see, theta follows a random walk. This seems to work pretty well, except that we are getting low N_{eff}, not-great traceplots, and correlation between step_size and energy__, especially when we model one state at a time. Including many states (especially those with more data), it seems to fit okay. Making the parameter hierarchical did not seem to help much.
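For reference, the standard-normal-plus-exponentiation variant mentioned in the questions below looks roughly like this (a sketch; step_size_raw is a placeholder name of mine, and the likelihood is unchanged):

parameters {
  real step_size_raw;                  // unconstrained; step_size = exp(step_size_raw)
  matrix[timesteps, states] theta_steps;
}
transformed parameters {
  real<lower=0> step_size = exp(step_size_raw);
}
model {
  step_size_raw ~ std_normal();        // implies a lognormal(0, 1) prior on step_size
  theta_steps[1, ] ~ normal(0, 1);
  to_vector(theta_steps[2:timesteps, ]) ~ normal(0, step_size);
  // ... Poisson likelihood as above
}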
We are experimenting with specifying the evolution of theta as a Gaussian process instead of a random walk. You can see how it's done in PyMC3 here, with the relevant code below (cell run 629):
def run_gp(self):
    with pm.Model() as model:
        gp_shape = len(self.onset) - 1
        length_scale = pm.Gamma("length_scale", alpha=3, beta=.4)
        eta = .05
        cov_func = eta**2 * pm.gp.cov.ExpQuad(1, length_scale)
        gp = pm.gp.Latent(mean_func=pm.gp.mean.Constant(c=0),
                          cov_func=cov_func)
        # Place a GP prior over the function f.
        theta = gp.prior("theta", X=np.arange(gp_shape)[:, None])
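For reference, the covariance function this sets up is the exponentiated quadratic with the marginal standard deviation fixed at eta = 0.05 and a shape/rate Gamma prior on the length-scale:

$$k(x, x') = \eta^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right), \qquad \eta = 0.05, \quad \ell \sim \mathrm{Gamma}(3, 0.4).$$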
This specification seems to produce results that overlap nicely with our random-walk estimates, but without the fitting issues (black is random walk, red is GP):
I am not as familiar with Gaussian processes, and have been unable to duplicate this result in Stan. My naive attempt at following the documentation for a latent-variable GP runs with severe complaints (divergences, low E-BFMI, exceeded max treedepth), and also produces totally different results:
The code is here:
// https://mc-stan.org/docs/2_23/stan-users-guide/fit-gp-section.html
data {
  int<lower=1> timesteps;
  int<lower=1> states;
  int cases[timesteps, states];     // diffed & convolved; eventually move this to transformed parameters
  vector[timesteps] cum_p_observed; // make this estimated given timing block
}
transformed data {
  int N = timesteps - 1;
  real x[N];
  real delta = 1e-9;
  for (i in 1:N) x[i] = i;          // time index as the GP input
}
parameters {
  matrix[timesteps - 1, states] theta;
  real<lower=0> serial_interval;
  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
  vector[N] eta;
}
transformed parameters {
  real<lower=0> gam;
  matrix[timesteps - 1, states] rt;
  gam = 1 / serial_interval;
  rt = theta / gam + 1;
}
model {
  vector[N] f;
  {
    matrix[N, N] L_K;
    matrix[N, N] K = cov_exp_quad(x, alpha, rho);
    // add jitter to the diagonal for numerical stability
    for (n in 1:N)
      K[n, n] = K[n, n] + delta;
    L_K = cholesky_decompose(K);
    f = L_K * eta;
  }
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  eta ~ std_normal();
  for (s in 1:states) theta[, s] ~ normal(f, sigma);
  serial_interval ~ gamma(6, 1.5);
  for (t in 2:timesteps) {
    for (s in 1:states) {
      real muhat;
      // scale up yesterday's cases by the probability that we have observed them
      real inferred_cases_yesterday = cases[t - 1, s] / cum_p_observed[t - 1];
      // scale down today's cases because we have not yet observed many of the later cases
      real expected_cases_today = cum_p_observed[t] * inferred_cases_yesterday * exp(theta[t - 1, s]);
      // guard against log(0) errors for the initial cases; this 0.1 can be thought of as a seeding probability
      muhat = fmax(expected_cases_today, 0.1);
      cases[t, s] ~ poisson(muhat);
    }
  }
}
My questions are twofold:
- For the random-walk specification: any tips of the trade to improve the specification of the step_size parameter? Any intuition on why it would be hard to sample? I have tried decreasing the scale of the prior, letting the parameter follow a standard normal and exponentiating it (as sketched above), and making it hierarchical across states, and it still has trouble every time.
- For the GP: I am most interested in simply replicating the PyMC3 approach to start with; I understand this may involve a different specification and/or different priors on the hyperparameters. A sketch of the direction I have in mind is below.
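For concreteness, here is a minimal sketch of that direction, assuming a single state for brevity, the marginal standard deviation fixed at 0.05 as in the PyMC3 code, and a Gamma(3, 0.4) (shape/rate) prior on the length-scale. The names eta_raw and length_scale are mine, and unlike my attempt above, theta here is the latent GP draw itself, with no extra normal(f, sigma) layer:

data {
  int<lower=2> timesteps;
  int cases[timesteps];
  vector[timesteps] cum_p_observed;
}
transformed data {
  int N = timesteps - 1;
  real x[N];
  real delta = 1e-9;
  for (i in 1:N) x[i] = i;                                 // time index as the GP input
}
parameters {
  real<lower=0> length_scale;
  vector[N] eta_raw;                                       // standard-normal innovations (non-centered)
}
transformed parameters {
  vector[N] theta;                                         // theta is the GP draw itself
  {
    matrix[N, N] K = cov_exp_quad(x, 0.05, length_scale);  // marginal sd fixed at 0.05, as in PyMC3
    for (n in 1:N) K[n, n] = K[n, n] + delta;              // jitter for numerical stability
    theta = cholesky_decompose(K) * eta_raw;
  }
}
model {
  length_scale ~ gamma(3, 0.4);                            // matches pm.Gamma(alpha=3, beta=.4), rate parameterization
  eta_raw ~ std_normal();
  for (t in 2:timesteps) {
    real inferred_cases_yesterday = cases[t - 1] / cum_p_observed[t - 1];
    real expected_cases_today = cum_p_observed[t] * inferred_cases_yesterday * exp(theta[t - 1]);
    cases[t] ~ poisson(fmax(expected_cases_today, 0.1));   // same 0.1 seeding guard as above
  }
}

Does this capture the PyMC3 priors faithfully, or would different hyperpriors be recommended in Stan?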
Any help would be highly appreciated!