Problem
Hi! I’m new-ish to Stan, and looking for a bit of advice re: some epidemic compartment models I’m working on. There was a recent publication on this topic, but it mainly focused on ODEs. I’m primarily interested in the stochastic formulation of these models (e.g. as continuous time Markov chains). From that publication:
“The flows between compartments can also be simulated stochastically, leading to the same results on average over multiple simulations (ignoring disease extinctions) but with a better handling of uncertainty that comes at a generally higher computational price. This makes stochastic compartmental models more adapted to low-level transmission and small populations.” -Bayesian workflow for disease transmission modeling in Stan
An example of the type of model I’m curious about (SEID) can be found here (reproduced below), with the discrete latent states dealt with by using (roughly) the moment matching approach from this paper to approximate the discrete rvs with continuous distributions (see also the last bit of discussion in this post).
The issue that I’m running into is that this moment-matching approach is tough for me to scale to larger models — even in the simple SEID case, my estimated posterior trajectories for the epidemic look sane, but my model runs into a large number of divergences / (sometimes) tree depth errors. In part, this seems to be because of odd behaviour in the final unobserved latent states, e.g. the exposures / infections values.
An example of this is provided at the bottom of the post.
Ideas?
Ideally I’d like to be able to marginalize out the problematic discrete states, but the dimensionality of the state space is too large for e.g. the forward algorithm to be feasible.
I don’t have any background in filtering etc, but an Idea I’m tossing around in my head is using approximate methods to compute the log-likelihood of the latent states, and then just use HMC over the space of the parameters. Here’s an (unresolved) thread talking about this. As I don’t know anything about the filtering literature, I’m unsure if there’s a reasonable way to do inference on these non-linear non-gaussian models. Particle filters seem like a no-go for reasons discussed in the last link.
As far as reparameterizations go, I tried splitting the \operatorname{Beta}(a,b) random variables into a ratio of two \operatorname{Gamma}(k, 1) distributions — it didn’t seem to help.
Any suggestions — even if they require a fair amount of elbow grease on my end — are super appreciated!
Example model
Here’s an example SEID model:
functions {
real mm_binom_w_beta_lpdf(real y, real n, real p){
real shape = p*n;
real rate = (1-p)*n;
return beta_lpdf(y | shape, rate);
}
}
data {
int<lower=0> numobs;
array[numobs] int obs;
real<lower = 0> N;
}
parameters {
real <lower = 0> dispersion; // Variance
real <lower = 0, upper = 1> sus_frac;
real <lower = 0, upper = 1> inf_frac;
real <lower=0.0001,upper= 3> beta; // Exposure rate
real <lower=0.0001,upper= 100> inc_p; // Infection rate
real <lower=0.0001,upper= 100> inf_p; // Death rate
array[numobs] real<lower=0, upper = 1> exposures;
array[numobs] real<lower=0, upper = 1> infections;
array[numobs] real<lower=0, upper = 1> deaths;
}
model {
vector[numobs + 1] S_full;
vector[numobs + 1] E_full;
vector[numobs + 1] I_full;
vector[numobs + 1] D_full;
real sigma;
real gamma; // can't use name_p because of reserved keywords
real init_sus;
real init_exposures;
real init_infections;
real init_deaths;
// Transforms
sigma = 1/inc_p;
gamma = 1/inf_p;
// Priors on model parameters
beta ~ gamma(2, 0.25);
inc_p ~ gamma(6, 0.5);
inf_p ~ gamma(20, 0.11);
sus_frac ~ beta(2, 2);
inf_frac ~ beta(0.2, 199.8);
dispersion ~ exponential(6);
// Initial Conditions
init_sus = N*sus_frac;
init_exposures = inc_p/inf_p * N * sus_frac * inf_frac;
init_infections = N * sus_frac * inf_frac;
init_deaths = 0;
// Initial Conditions
S_full[1] = init_sus;
E_full[1] = init_exposures;
I_full[1] = init_infections;
D_full[1] = init_deaths;
// Now all observations in the loop
for (t in 1:numobs) {
// keep in mind, the State_Full arrays are shifted by one because of IC.
exposures[t] ~ mm_binom_w_beta(S_full[t], -expm1(-beta*I_full[t] / (S_full[t] + E_full[t] + I_full[t])));
infections[t] ~ mm_binom_w_beta(E_full[t], -expm1(-sigma));
deaths[t] ~ mm_binom_w_beta(I_full[t], -expm1(-gamma));
S_full[t+1] = S_full[t] - S_full[t]*exposures[t];
E_full[t+1] = E_full[t] - E_full[t]*infections[t] + S_full[t]*exposures[t];
I_full[t+1] = I_full[t] + E_full[t]*infections[t] - I_full[t]*deaths[t];
D_full[t+1] = D_full[t] + I_full[t]*deaths[t];
obs[t] ~ neg_binomial_2(I_full[t]*deaths[t], dispersion);
}
}
generated quantities {
// Genererate cumulative states.
vector[numobs] u;
vector[numobs + 1] S;
vector[numobs + 1] E;
vector[numobs + 1] I;
u[1] = deaths[1];
S[1] = N*sus_frac;
E[1] = inc_p/inf_p * N * sus_frac * inf_frac;
I[1] = N * sus_frac * inf_frac;
for(t in 1:numobs){
S[t+1] = S[t] - S[t]*exposures[t];
E[t+1] = E[t] + S[t]*exposures[t] - E[t]*infections[t];
I[t+1] = I[t] + E[t]*infections[t] - I[t]*deaths[t];
u[t] = I[t]*deaths[t];
}
}
With data:
deaths <- c(1, 0, 1, 1, 0, 1, 5, 3, 1, 0, 1, 1, 2, 3, 5, 0, 6, 3, 6, 3, 8, 1, 5, 2,
1, 1, 2, 2, 2, 5, 7, 12, 4, 3, 5, 3, 8, 5, 8, 8, 6, 12, 11, 22, 15, 14,
24, 14, 15, 20, 20, 13, 11, 25, 28, 30, 24, 28, 42, 24, 32, 24, 27, 31,
34, 33, 29, 31, 38, 40, 42, 38, 53, 44, 66, 52, 53, 56, 63, 49, 60, 57,
65, 55, 55, 47, 67, 62, 65, 57, 47, 46, 62, 54, 52, 48, 49, 64, 46, 67,
52, 50, 56, 46, 41, 38, 36, 39, 31, 32, 41, 25, 32, 35, 36, 36, 33, 26,
42, 31, 19, 27, 23, 22, 15, 24, 32, 19, 10, 16, 12, 15, 14, 13, 12, 13,
12, 6, 12, 15, 5, 9, 3, 5, 12, 6, 7, 3, 3, 3, 3, 2, 3, 3, 0, 3, 2, 3,
3, 1, 1, 4, 2, 3, 0, 2, 3, 2, 0, 1, 1, 4, 1, 2, 2, 1, 1, 2, 0, 1, 1 , 2)
data <- list(numobs = 182, obs = deaths, N = 25000)
and cmdstan options:
chains = 6,
parallel_chains = 6,
iter_sampling = 500,
adapt_delta = .995,
max_treedepth = 18,
(This is mortality data from the 1490 plague in Barcelona). We can plot the posterior latent death trajectories:
Exposures and infections:
These look relatively sane. Problems start to show up when we look at the diagnostics:
714/3000 transitions divergent
Pairs plots for model parameters:
Pairs plots for some chosen transitions:
Note:
Running on simulated data seems to indicate that the final latent exposed transitions are always problematic, even if we only observe the first ~half of the epidemic:
A particularly weird example on simulated data of length ~45
.