Hi everyone,

I am working on a method for fitting stochastic epidemic models to partially observed infection counts (incidence data) and, being relatively new to Stan, was hoping that some of the more experienced users might be able to suggest ways to optimize the implementation for speed. A bit of background about the problem and then the proposed method: Stochastic epidemic models are classic epidemiological tools that are used to model the time-evolution of an epidemic as individuals transition through different disease states. Often, the data will consist of some fraction of new infections accumulated in each inter-observation interval. This type of data, referred to as incidence data, is commonly encountered in disease surveillance settings. In large populations, it is common practice to represent the latent epidemic process using a deterministic system of ordinary differential equations. However, these deterministic representations ignore the stochastic aspects of the epidemic and can lead to incorrect inferences.

We are working on adapting a method called the linear noise approximation for approximate inference for stochastic epidemic models fit to incidence data. At its essence, this method approximates the transition density for the Markov jump process representation of a stochastic epidemic model with a Gaussian state space model where the moments of the transition densities are obtained by solving systems of non-linear, non-autonomous ordinary differential equations. I have included my code below for using this to fit an SIR model (a canonical stochastic epidemic model for a single, homogeneously mixing, closed population) to negative-binomial incidence counts. This model takes on the order of hours to run, but ultimately we are targeting more complex model dynamics to reflect spatial structure and population strata. In such models the since of the latent state space, represented in the code below by N_raw, grows polynomially in the number of model compartments. Thus, run times easily extend to something on the order of weeks.

In addition to the long run times, there are also a few numerical problems that are indicated in comments in the Stan code. One is that the numerically stable versions of some log differences of exponentials cause the code to crash. Another is that changing the integration method from bdf to rk45 causes issues. In more complex models, initializing the system has also proven to be quite problematic.

Any suggestions for how to optimize the code would be greatly appreciated!

Thank you!

Jon

lna_SIR_example.R (10.3 KB)