Translating a random walk to a Gaussian process (and a PyMC3 model to Stan)

I am working with Kevin Systrom on his rt.live project. You can see the Jupyter notebook for the current model here.

In Stan, the (simplified) part of the model that I’m having trouble with looks like this:

parameters {
  real<lower=0> step_size;
  matrix[timesteps, states] theta_steps;
} 

transformed parameters {
  matrix[timesteps, states] theta;

  for(s in 1:states) {
    theta[, s] = cumulative_sum(theta_steps[, s]);
  }
}

model {
  step_size ~ normal(0, 1); // half-normal via the <lower=0> declaration; no T[0,] needed
  theta_steps[1, ] ~ normal(0, 1);
  to_vector(theta_steps[2:timesteps, ]) ~ normal(0, step_size);

  for(s in 1:states) {
    for(t in 2:timesteps) {
      // t starts at 2 so that cases[t-1, s] is in range
      cases[t, s] ~ poisson(cases[t-1, s] * exp(theta[t-1, s]));
    }
  }
}

As you can see, theta follows a random walk. This seems to work pretty well, except that we get low N_eff, not-great traceplots, and correlation between step_size and energy__, especially when we model one state at a time. When we include many states (especially those with more data), it seems to fit okay. Making the parameter hierarchical did not seem to help much.
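One thing we have been considering (a sketch, not code from the notebook) is a non-centered parameterization of the walk: sample unit-scale innovations and multiply by step_size deterministically, which removes the direct coupling between step_size and the steps that tends to produce funnel geometry. Only the prior and transform parts are shown; the likelihood would be unchanged:

parameters {
  real<lower=0> step_size;
  matrix[timesteps, states] theta_steps_raw; // unit-scale innovations
}

transformed parameters {
  matrix[timesteps, states] theta;

  for (s in 1:states) {
    // the first step keeps unit scale, as in the original prior;
    // later steps are scaled by step_size outside the prior
    theta[, s] = cumulative_sum(
      append_row(theta_steps_raw[1, s],
                 step_size * theta_steps_raw[2:timesteps, s]));
  }
}

model {
  step_size ~ normal(0, 1); // half-normal via the <lower=0> constraint
  to_vector(theta_steps_raw) ~ std_normal();
  // likelihood unchanged from the original model
}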

We are experimenting with specifying the evolution of theta as a Gaussian process instead of a random walk. You can see how it’s done in PyMC3 here, with the relevant code below (cell run 629):

def run_gp(self):
    with pm.Model() as model:
        gp_shape = len(self.onset) - 1

        length_scale = pm.Gamma("length_scale", alpha=3, beta=.4)

        eta = .05
        cov_func = eta**2 * pm.gp.cov.ExpQuad(1, length_scale)

        gp = pm.gp.Latent(mean_func=pm.gp.mean.Constant(c=0),
                          cov_func=cov_func)

        # Place a GP prior over the function f.
        theta = gp.prior("theta", X=np.arange(gp_shape)[:, None])

This specification seems to produce results that overlap nicely with our random-walk estimates, but without the fitting issues (in our plots, black is the random walk and red is the GP).

I am not as familiar with Gaussian processes, and I have been unable to duplicate this result in Stan. My naive attempt at following the documentation for a latent-variable GP runs with severe complaints (divergences, low energy, hitting max treedepth), and also produces totally different results:

[figure: the Stan GP attempt, whose estimates diverge sharply from the random-walk results]

The code is here:

// https://mc-stan.org/docs/2_23/stan-users-guide/fit-gp-section.html

data {
  int<lower=1> timesteps;
  int<lower=1> states;
  int cases[timesteps, states]; // diffed & convolved; eventually move this to transformed parameters
  vector[timesteps] cum_p_observed; // make this estimated given a timing block
}

transformed data {
  int N = timesteps - 1;
  real x[N];
  real delta = 1e-9;

  // GP inputs must be distinct time points (cf. np.arange in the PyMC3 code);
  // x[i] = 1 for every i would make all covariance entries identical
  for(i in 1:N) x[i] = i;
}

parameters {
  matrix[timesteps-1, states] theta;
  real<lower=0> serial_interval;

  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
  vector[N] eta;
}

transformed parameters {
  real<lower=0> gam;
  matrix[timesteps-1, states] rt;
  
  gam = 1/serial_interval;
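  // standard linearization: R_t ≈ 1 + r/gamma, with growth rate r = theta
  // and gamma = 1/serial_interval, which gives rt = theta/gam + 1 below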
  rt = theta/gam + 1;
}

model {
  
  vector[N] f;
  {
    matrix[N, N] L_K;
    matrix[N, N] K = cov_exp_quad(x, alpha, rho);

    // diagonal elements
    for (n in 1:N)
      K[n, n] = K[n, n] + delta;

    L_K = cholesky_decompose(K);
    f = L_K * eta;
  }

  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  eta ~ std_normal();
  
  for(s in 1:states) theta[, s] ~ normal(f, sigma);
  
  serial_interval ~ gamma(6, 1.5);
  
  for(t in 2:timesteps) {
    for(s in 1:states) {
      real muhat;
      // we scale up yesterday's cases by the probability that we have observed it
      real inferred_cases_yesterday = (cases[t-1, s]/cum_p_observed[t-1]);
      // we scale down today's cases by the fact that we have not observed many of the later cases
      real expected_cases_today = cum_p_observed[t] * inferred_cases_yesterday * exp(theta[t-1, s]);
      // here guard against log(0) errors with init case
      // this 0.1 can be thought of as a seeding probability
      muhat = fmax(expected_cases_today, 0.1); 
      cases[t, s] ~ poisson(muhat);
    } 
  }
  
}

My questions are two-fold:

  1. For the random-walk specification, any tricks of the trade to improve the specification of the parameter? Any intuition for why it would be hard to sample? I have tried decreasing the scale of the prior, letting the parameter follow a standard normal and exponentiating it, and making it hierarchical across states, and it still has trouble every time.
  2. For the GP, I am most interested in simply replicating the PyMC3 approach to start with; I understand this may involve a different specification and/or different priors on the hyperparameters. A rough sketch of the literal translation I have in mind is below.
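For concreteness, this sketch assumes the same Gamma(3, 0.4) length-scale prior, the marginal scale fixed at eta = 0.05 (as in the PyMC3 cell), and distinct time indices as inputs; it may well be wrong, which is part of what I am asking:

data {
  int<lower=1> timesteps;
}

transformed data {
  int N = timesteps - 1;            // mirrors gp_shape = len(self.onset) - 1
  real eta = 0.05;                  // fixed, as in the PyMC3 model
  real x[N];
  for (i in 1:N) x[i] = i;          // np.arange-style time inputs
}

parameters {
  real<lower=0> length_scale;
  vector[N] z;
}

transformed parameters {
  // zero-mean latent GP, matching pm.gp.Latent with a Constant(0) mean
  vector[N] theta;
  {
    matrix[N, N] K = cov_exp_quad(x, eta, length_scale);
    for (n in 1:N) K[n, n] += 1e-9; // jitter for numerical stability
    theta = cholesky_decompose(K) * z;
  }
}

model {
  length_scale ~ gamma(3, 0.4);     // matches pm.Gamma(alpha=3, beta=.4)
  z ~ std_normal();
  // likelihood over cases would go here, as in the model above
}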

Any help would be highly appreciated!


I’m on it


Hi -

Any simple Gaussian random walk recovers parameters fine. However, I dumbed down the Poisson model you had above, with the model block looking something like this:

model {
  theta ~ cauchy(0, 1);

  for (t in 2:Time) {
    x[t] ~ poisson(x[t-1] * exp(theta));
  }
}

and tried to simulate data just to see if I could recover parameters, and I get numerical underflow/overflow every time for the generated x. Were you ever able to simulate data and recover parameters?
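Here is a minimal forward-simulation sketch that shows the mechanism (the seed value for x[1] is a made-up placeholder, since x[1] has no likelihood term): the Poisson mean is multiplied by exp(theta) at every step, so it grows exponentially whenever theta > 0:

generated quantities {
  int x_sim[Time];
  x_sim[1] = 10; // hypothetical seed; x[1] is unconstrained in the model
  for (t in 2:Time) {
    // E[x_sim[t]] ≈ x_sim[1] * exp(theta * (t - 1)), so poisson_rng
    // overflows its integer range quickly whenever theta > 0
    x_sim[t] = poisson_rng(x_sim[t-1] * exp(theta));
  }
}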

For the GP, it’d take specifying exactly how you want to generate your covariance matrix.

My first guess was to simulate a partial sine wave and include it as a parameter in the random walk, but this is a poorly specified model, in that I’m not able to easily recover f:

data {
  int<lower=0> Time;
  vector[Time] y;
  real x[Time]; // 1,2,3,4,....,Time
}
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
  vector[Time] eta;
}
transformed parameters {
  // make f: latent GP values via the Cholesky factor of the (jittered) covariance
  vector[Time] f = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), 1e-9)) * eta;
}
model {
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  eta ~ std_normal();
  for (t in 2:Time) {
    y[t] ~ normal(f[t-1] * y[t-1], sigma);
  }
}

There are a lot of things that can go wrong. The first is recovering parameters for the Poisson model you have above; then comes figuring out how exactly to specify the GP.

I can’t investigate too much further right now. It would take slowly building up each part of the model.

I’m also not really fluent in PyMC, so I can only guess as to what they’re doing, without really digging in.


Poissons are bad for this model. The variance grows like T (or maybe T^2, you’d have to check), so you rapidly hit overflow issues unless this is scaled. (Usually you make the variance aware of how long the random walk is and scale appropriately so everything stays on unit scale.)
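Concretely, one version of that scaling (a sketch, assuming timesteps is declared in data) divides the step scale by sqrt(timesteps), so the endpoint of the walk stays on the step_size scale regardless of length:

parameters {
  real<lower=0> step_size;
  vector[timesteps] theta_steps_raw; // unit-scale innovations
}

transformed parameters {
  // Var(theta[timesteps]) = timesteps * (step_size / sqrt(timesteps))^2
  //                       = step_size^2, independent of series length
  vector[timesteps] theta
    = cumulative_sum((step_size / sqrt(timesteps)) * theta_steps_raw);
}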


Simulating is also a challenge because there’s no constraint on x[1], so the model isn’t proper. PyMC3 has an undocumented fix (which is bad) of setting x[1] = 0 when simulating.
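In Stan, the analogous move when simulating (again a sketch, assuming a step_size parameter) would be to pin the first value rather than draw it:

generated quantities {
  vector[Time] x_sim;
  x_sim[1] = 0; // pin the start of the walk, mirroring the PyMC3 behavior
  for (t in 2:Time) {
    x_sim[t] = normal_rng(x_sim[t-1], step_size);
  }
}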
