Issues fitting Markov models for epidemics

Problem

Hi! I’m new-ish to Stan and looking for a bit of advice re: some epidemic compartment models I’m working on. There was a recent publication on this topic, but it mainly focused on ODEs; I’m primarily interested in the stochastic formulation of these models (e.g. as continuous-time Markov chains). From that publication:

“The flows between compartments can also be simulated stochastically, leading to the same results on average over multiple simulations (ignoring disease extinctions) but with a better handling of uncertainty that comes at a generally higher computational price. This makes stochastic compartmental models more adapted to low-level transmission and small populations.” -Bayesian workflow for disease transmission modeling in Stan

An example of the type of model I’m curious about (SEID) can be found here (reproduced below). The discrete latent states are dealt with by using (roughly) the moment-matching approach from this paper, which approximates the discrete random variables with continuous distributions (see also the last bit of discussion in this post).
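Concretely, the approximation (as I understand it) replaces each scaled binomial transition with a beta random variable whose first two moments match:

X/n with X \sim \operatorname{Binomial}(n, p) \quad\longrightarrow\quad Y \sim \operatorname{Beta}\big(np,\ n(1-p)\big),

which gives \mathbb{E}[Y] = p and \operatorname{Var}(Y) = p(1-p)/(n+1), matching the binomial fraction’s mean exactly and its variance p(1-p)/n up to an O(1/n^2) correction.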

The issue that I’m running into is that this moment-matching approach is tough for me to scale to larger models — even in the simple SEID case, my estimated posterior trajectories for the epidemic look sane, but my model runs into a large number of divergences / (sometimes) tree depth errors. In part, this seems to be because of odd behaviour in the final unobserved latent states, e.g. the exposures / infections values.

An example of this is provided at the bottom of the post.


Ideas?

Ideally I’d like to be able to marginalize out the problematic discrete states, but the dimensionality of the state space is too large for e.g. the forward algorithm to be feasible.
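To put a rough number on that (back-of-the-envelope, with the N = 25000 used below): the number of joint (S, E, I, D) configurations summing to N is

K = \binom{N + 3}{3} \approx N^3 / 6 \approx 2.6 \times 10^{12},

and the forward algorithm costs O(K^2 T) in general, so exact marginalization over the counts is hopeless.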

I don’t have any background in filtering etc., but an idea I’m tossing around in my head is to use approximate methods to compute the log-likelihood with the latent states integrated out, and then just run HMC over the space of the parameters. Here’s an (unresolved) thread talking about this. As I don’t know anything about the filtering literature, I’m unsure whether there’s a reasonable way to do inference on these non-linear, non-Gaussian models. Particle filters seem like a no-go for reasons discussed in the last link.
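In other words (sketching the target, with x_{1:T} the latent states and \theta the parameters):

p(y_{1:T} \mid \theta) = \int p(y_{1:T} \mid x_{1:T}, \theta)\, p(x_{1:T} \mid \theta)\, \mathrm{d}x_{1:T} = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, \theta),

where each factor p(y_t \mid y_{1:t-1}, \theta) would come out of a filtering recursion, so HMC would only ever see \theta.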

As far as reparameterizations go, I tried expressing each \operatorname{Beta}(a,b) random variable as X/(X+Y) with X \sim \operatorname{Gamma}(a, 1) and Y \sim \operatorname{Gamma}(b, 1), as sketched below; it didn’t seem to help.
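For concreteness, a minimal sketch of that reparameterization for a single variate (a and b here are placeholder data, standing in for the moment-matched shapes):

data {
    real<lower=0> a;
    real<lower=0> b;
}
parameters {
    real<lower=0> x; // x ~ Gamma(a, 1)
    real<lower=0> y; // y ~ Gamma(b, 1)
}
transformed parameters {
    // x / (x + y) is then distributed Beta(a, b).
    real<lower=0, upper=1> frac = x / (x + y);
}
model {
    x ~ gamma(a, 1);
    y ~ gamma(b, 1);
}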

Any suggestions — even if they require a fair amount of elbow grease on my end — are super appreciated!


Example model

Here’s an example SEID model:

functions {
    // Moment-matched Beta approximation to a Binomial(n, p) fraction:
    // Beta(n*p, n*(1-p)) has mean p and variance p*(1-p)/(n+1).
    real mm_binom_w_beta_lpdf(real y, real n, real p){
        real a = p*n;
        real b = (1-p)*n;

        return beta_lpdf(y | a, b);
    }
}
data {
    int<lower=0> numobs; // Number of observation times
    array[numobs] int obs; // Observed deaths per time step
    real<lower=0> N; // Population size
}
parameters {
    real<lower=0> dispersion; // Negative binomial overdispersion (phi)
    real<lower=0, upper=1> sus_frac; // Initially susceptible fraction of N
    real<lower=0, upper=1> inf_frac; // Initially infected, as a fraction of the susceptibles

    real<lower=0.0001, upper=3> beta; // Transmission rate
    real<lower=0.0001, upper=100> inc_p; // Incubation period (sigma = 1/inc_p)
    real<lower=0.0001, upper=100> inf_p; // Infectious period (gamma = 1/inf_p)

    // Per-step transition fractions (the moment-matched latent states).
    array[numobs] real<lower=0, upper=1> exposures;
    array[numobs] real<lower=0, upper=1> infections;
    array[numobs] real<lower=0, upper=1> deaths;
}
model {
    vector[numobs + 1] S_full; 
    vector[numobs + 1] E_full;
    vector[numobs + 1] I_full; 
    vector[numobs + 1] D_full;
    
    real sigma; 
    real gamma; // can't use name_p because of reserved keywords
    real init_sus;
    real init_exposures;
    real init_infections;
    real init_deaths;
    
    
    // Transforms
    sigma =  1/inc_p;
    gamma =  1/inf_p;
    
    
    // Priors on model parameters 
    beta ~ gamma(2, 0.25);
    inc_p ~ gamma(6, 0.5);
    inf_p ~ gamma(20, 0.11);
    
    sus_frac ~ beta(2, 2);
    inf_frac ~ beta(0.2, 199.8);
    
    dispersion ~ exponential(6);

    
    // Initial Conditions
    init_sus = N*sus_frac;
    init_exposures = inc_p/inf_p * N * sus_frac * inf_frac;
    init_infections = N * sus_frac * inf_frac;
    init_deaths = 0;
    
    // Initial Conditions
    S_full[1] = init_sus;
    E_full[1] = init_exposures;

    I_full[1] = init_infections;
    D_full[1] = init_deaths;

    // Now all observations in the loop
    for (t in 1:numobs) { 
      
        // keep in mind, the State_Full arrays are shifted by one because of IC. 
        exposures[t] ~ mm_binom_w_beta(S_full[t], -expm1(-beta*I_full[t] / (S_full[t] + E_full[t] + I_full[t])));
        infections[t] ~ mm_binom_w_beta(E_full[t], -expm1(-sigma));
        deaths[t] ~ mm_binom_w_beta(I_full[t], -expm1(-gamma));


        S_full[t+1] = S_full[t] - S_full[t]*exposures[t];
        E_full[t+1] = E_full[t] - E_full[t]*infections[t] + S_full[t]*exposures[t];
        I_full[t+1] = I_full[t] + E_full[t]*infections[t] - I_full[t]*deaths[t];
        D_full[t+1] = D_full[t] + I_full[t]*deaths[t];

        obs[t] ~ neg_binomial_2(I_full[t]*deaths[t], dispersion);
    }
}
generated quantities {
   // Regenerate the latent state trajectories and the expected deaths u.

   vector[numobs] u;
   vector[numobs + 1] S;
   vector[numobs + 1] E;
   vector[numobs + 1] I;

   S[1] = N*sus_frac;
   E[1] = inc_p/inf_p * N * sus_frac * inf_frac;
   I[1] = N * sus_frac * inf_frac;

   for(t in 1:numobs){
       S[t+1] = S[t] - S[t]*exposures[t];
       E[t+1] = E[t] + S[t]*exposures[t] - E[t]*infections[t];
       I[t+1] = I[t] + E[t]*infections[t] - I[t]*deaths[t];
       u[t] = I[t]*deaths[t];
   }
}

With data:

deaths <- c(1, 0, 1, 1, 0, 1, 5, 3, 1, 0, 1, 1, 2, 3, 5, 0, 6, 3, 6, 3, 8, 1, 5, 2, 
            1, 1, 2, 2, 2, 5, 7, 12, 4, 3, 5, 3, 8, 5, 8, 8, 6, 12, 11, 22, 15, 14,
            24, 14, 15, 20, 20, 13, 11, 25, 28, 30, 24, 28, 42, 24, 32, 24, 27, 31,
            34, 33, 29, 31, 38, 40, 42, 38, 53, 44, 66, 52, 53, 56, 63, 49, 60, 57,
            65, 55, 55, 47, 67, 62, 65, 57, 47, 46, 62, 54, 52, 48, 49, 64, 46, 67,
            52, 50, 56, 46, 41, 38, 36, 39, 31, 32, 41, 25, 32, 35, 36, 36, 33, 26,
            42, 31, 19, 27, 23, 22, 15, 24, 32, 19, 10, 16, 12, 15, 14, 13, 12, 13,
            12, 6, 12, 15, 5, 9, 3, 5, 12, 6, 7, 3, 3, 3, 3, 2, 3, 3, 0, 3, 2, 3,
            3, 1, 1, 4, 2, 3, 0, 2, 3, 2, 0, 1, 1, 4, 1, 2, 2, 1, 1, 2, 0, 1, 1 , 2)


data <- list(numobs = 182, obs = deaths, N = 25000)

and cmdstan options:

chains = 6,
parallel_chains = 6,
iter_sampling = 500,
adapt_delta = .995,
max_treedepth = 18,

(This is mortality data from the 1490 plague in Barcelona.) We can plot the posterior latent death trajectories:

[figure: posterior latent death trajectories]

Exposures and infections:

[figure: posterior exposure and infection trajectories]

These look relatively sane. Problems start to show up when we look at the diagnostics:

714/3000 transitions divergent

Pairs plots for model parameters:

[figure: pairs plots for model parameters]

Pairs plots for some chosen transitions:

[figure: pairs plots for selected transitions]


Note:

Running on simulated data seems to indicate that the final latent exposed transitions are always problematic, even if we only observe the first ~half of the epidemic:

[figure: a particularly weird example on simulated data of length ~45]


That was me, and so far I never got around to actually implementing a filtering algorithm in Stan for this kind of dynamic model (I did a very crude version in pure Python, but the sampler was MH, so the filter’s randomness wasn’t an issue, and performance was probably very poor).
I believe even a general particle filter is possible in principle with HMC, but it would require being able to generate random numbers within a sampling step, plus some additional tricks, so it would take quite a bit of work from someone who really needs it.

Deterministic filtering algorithms (in the sense that they don’t involve any RNG) are feasible: there are some examples of Kalman filters, and maybe of the extended Kalman filter, and going further to implement an unscented Kalman filter should be a relatively straightforward extension. My hunch is that not only will the method be able to account for stochastic trajectories, but it could improve performance by trading off the filtering computations with the discrete time solution and more efficient sampling overall.

Implementing it, testing it, and making sure it is not only feasible but practical will take some work (the devil is in the details), but I think it could make for considerable progress in fitting this kind of model, so if you are willing to give it a try I definitely encourage you and would be happy to help.


“My hunch is that not only will the method be able to account for stochastic trajectories, but it could improve performance by trading off the filtering computations with the discrete time solution and more efficient sampling overall.”

Yeah, I’ve tried some simple tests for linear Gaussian models (with a classic Kalman filter), and the results are surprisingly good, so I’m pretty optimistic.
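Roughly the kind of thing I mean, as a minimal sketch (a univariate local-level model; the priors and the initialization are placeholder choices of mine): the filter accumulates log p(y_t \mid y_{1:t-1}), so the latent states are marginalized out and HMC only ever sees the two scale parameters.

data {
    int<lower=1> T;
    vector[T] y;
}
parameters {
    real<lower=0> sigma_state; // State innovation sd
    real<lower=0> sigma_obs;   // Observation noise sd
}
model {
    // Local-level model: x[t] = x[t-1] + state noise, y[t] = x[t] + obs noise.
    real m = y[1];              // Filtered mean, initialized at the first observation
    real P = square(sigma_obs); // Filtered variance

    sigma_state ~ normal(0, 1);
    sigma_obs ~ normal(0, 1);

    for (t in 2:T) {
        real P_pred = P + square(sigma_state);    // Predict step
        real S = P_pred + square(sigma_obs);      // Innovation variance
        target += normal_lpdf(y[t] | m, sqrt(S)); // Prediction error decomposition
        real K = P_pred / S;                      // Kalman gain
        m += K * (y[t] - m);                      // Update step
        P = (1 - K) * P_pred;
    }
}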

“going further to implement an unscented Kalman filter should be a relatively straightforward extension”

Is this good enough for models with binomial transitions? It looks like the UKF still relies on propagating (specifically) a Gaussian rv, just with better handling of nonlinearities in the model.

If so, do you know of any examples of an unscented KF for an SIR-style compartment model (just calculating states_i \mid obs_i, \theta)? I think it’d be pretty quick to get a working estimator for states + parameters from there (and then the issue is just validation).

As far as I remember, the difference between the EKF and the UKF is that the former linearizes the model and at that point (being equivalent to the original KF) performs the filtering step with Gaussians, while the UKF propagates a set of “sigma points” through the nonlinearity and only then approximates the result with a Gaussian. So the EKF does not relax the normality assumption, while the UKF does; I’m not sure whether it’s done in practice, but since the propagated distribution is just an empirical one represented by a set of points, it could be any distribution. Whether the actual performance (in terms of results, not speed, of course) is comparable to a general particle filter I’m not sure, but that’s something that can be assessed in a non-HMC setting with simpler SIR models.
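As I remember it, the standard form of the unscented transform: for a state of dimension n with mean m and covariance P, and a scaling parameter \lambda, form 2n + 1 sigma points

\chi_0 = m, \qquad \chi_{\pm i} = m \pm \big(\sqrt{(n+\lambda)\,P}\big)_i, \quad i = 1, \dots, n,

where \big(\sqrt{\cdot}\big)_i is the i-th column of a matrix square root of (n+\lambda)P. Each point is pushed through the nonlinear transition f, and the moments are recomputed as weighted sums:

m' = \sum_j W_j\, f(\chi_j), \qquad P' = \sum_j W_j^{(c)} \big(f(\chi_j) - m'\big)\big(f(\chi_j) - m'\big)^{\top}.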

I don’t know of an actual example of UKF with SIR models, but I haven’t looked into UKF implementations at all; they are designed for precisely this kind of problem, so whatever you can find should be informative.

I agree. Implementing a basic version shouldn’t be complicated, but scaling it up to a real-world research problem and achieving reasonable performance may take more work and hit some roadblocks here or there. Or maybe not, and the additional hurdles of implementing the filter will trade off with performance and just work.