Hi Sebastian,
A couple of quick questions and then some mathematical description of the model:
-
I created a few variables in the beginning of the lna_fcn that is integrated and in the get_lna function of avoiding duplicate computation (e.g., only computing exp(Z) once and then reusing it). Is it possible that in creating these variables, I am inadvertently bogging down Stan by causing it to generate additional sensitivity equations? Should I focus on eliminating variables, even if it means duplicating computation, if it will reduce the number of variables that go into the gradient calculations?
-
Could you explain a bit further which sensitivity equations get generated and how they get used (or don’t get used if there are merely intermediate variables that are generated)?
Some mathematical description (apologies for the math not rendering, I’m not quite sure how to fix that but I’ve attached a pdf with the writeup below):
Stochastic epidemic models (SEMs) describe the transmission dynamics of infectious diseases. These models are often represented as density dependent Markov jump processes (MJPs), which have systems of ordinary differential equations as functional limits (as the population size tends to infinity). The canonical example of a SEM is the susceptible-infected-recovered (SIR) model for a closed and homogeneously mixing population. This model, presented above, is so called because an epidemic is represented in terms of the disease histories of individuals as they stochastically transition between susceptible, infected, and recovered states (model compartments). In this model, per-contact infectivity rate is given by \beta, so with S susceptibles and I infecteds the rate at which a new infection occurs is \beta S I. Similarly, each infected (and hence infectious) individual recovers with rate \mu, and thus the rate at which a recovery occurs is given by \mu I. A loose explanation for what we mean by density dependent in characterizing the MJP is that the dynamics of the process are driven by the fraction of the population in each compartment (i.e., we can equivalently represent the model in terms of concentrations and maintain the dynamics).
Let X=(S,I,R) be the vector of SIR compartment counts, N=(N_{SI},N_{IR} be the counting process for the cumulative numbers of infection and recovery events, and \phi(t,X)=(\beta S I, \mu I) be the vector of infection and recovery rates for the SIR model. It will be necessary in working with incidence data to reparameterize the compartment counts, and hence the vector of rates, \phi, in terms of the counting process with respect to the initial compartment counts X_0=(S_0, I_0, R_0). Let $$A = \left(\begin{array}{ccc} -1 & 1 & 0 \ 0 & -1 & 1 \end{array}\right)$$ and note that we can express the compartment concentrations at time t in terms of the initial compartment counts and the cumulative numbers of infections and recoveries using the relation, X(t)=X(0) + A^\prime N(t). Thus, we can write the rates, \phi, as
$$\phi(t,N) = \left(\begin{array}{c}\beta (S_0 - N_{SI})(I_0 + N_{SI} - N_{IR}) \
\mu(I_0 + N_{SI} - N_{IR})\end{array}\right).$$
In the infinite population limit, the counting process for the numbers of infections and recoveries evolves according to the deterministic path obtained by integrating the following system of ODEs:
$$\frac{\mathrm{d}N(t)}{\mathrm{d}t} = \phi(t, N),\hspace{0.2in} N_0 = (0,0),\ X_0 = (S_0, I_0, R_0).$$
Clearly, the dynamics of the limiting ODE do not reflect the stochastic aspects of the time-evolution of the MJP. However, in the large population limit, it is possible to approximate the transition density of the MJP with the following Ito diffusion:
$$\mathrm{d}N(t) = \phi(t,N)\mathrm{d}t + \sqrt{\Phi(t,N)}\mathrm{d}W(t),\ s.t., N(0)=0,\ X_0=(S_0, I_0, R_0). $$
where \Phi(t,N)=\mathrm{diag}(\phi(t,N)), and W(t) is a 2-dimensional (as we have two elementary transition events, infections and recoveries) standard Brownian motion.
Since the counting process N is positive and monotonically increasing, it will be advantageous to work with the log-transformed SDE for the log-transformed process, Z = (Z_{SI},Z_{IR}) = (\log(N_{SI}),\log(N_{IR})). However, we should proceed with caution as there is an obvious problem with the initial condition for N_0, as \log(0)=-\infty. We can resolve this by setting N_0=(1,1) and “borrowing” events from X_0 in the following way: Note that X(t)=X(0) + A^\prime N(t) = X(0) + A^\prime(N(t)-\mathbf{1} + \mathbf{1}) = (X(0)-A^\prime\mathbf{1}) + A^\prime(N(t)+\mathbf{1}) for all t. Thus, an equivalent set of initial conditions for the SDE is given by N_0=(1,1),\ X_0=(S_0,I_0,R_0)-A^\prime\mathbf{1}\equiv\widetilde{X}_0, as long as \widetilde{X}_0 \geq 0. In our context, this is not an unreasonable condition as it implies, for example, that there are not more infection events than there are initial susceptibles.
By Ito’s lemma the SDE for the log-transformed process, Z, subject to initial conditions Z_0=(0,0),\ X_0=\widetilde{X}_0, is given by
$$\mathrm{d}Z(t) = \left[\mathrm{diag}\left(\exp(-Z(t))\right) - \frac{1}{2}\mathrm{diag}\left(\exp(-2Z(t))\right)\right]\phi(t,\exp(Z(t))) + \mathrm{diag}\left(\exp(-Z(t))\right)\sqrt{\Phi(t,\exp(Z(t)))}\mathrm{d}W(t).$$
The LNA is derived by Taylor expanding the drift and diffusion terms in the above SDE about the deterministic limit of the MJP. The resulting approximation decomposes the Ito SDE into a non-linear deterministic drift and a linear SDE for the fluctuations about the drift. This linear SDE has an analytic solution that is given by a multivariate Gaussian, the moments of which are obtained by solving a coupled, non-autonomous system of ODEs. The LNA transition density over the interval [t_{\ell-1}, t_\ell] thus takes the form, $$Z(t_\ell)|Z(t_{\ell-1}) = \boldsymbol{\eta}(t_\ell) + \mathbf{m}\left (Z(t_{\ell-1}) - \boldsymbol{\eta}(t_{\ell-1})\right ) + \boldsymbol{\epsilon}\ell,\hspace{0.25cm} \boldsymbol{\epsilon}\ell \sim N(0,\boldsymbol{\Sigma}(t_\ell)),$$
where
-
\boldsymbol{\eta}(t_\ell) is the solution to \frac{\mathrm{d}\boldsymbol{\eta}(t)}{\mathrm{d}t} = \left [\mathrm{diag}\left (\mathrm{e}^{-\boldsymbol{\eta}(t)}\right ) - \frac{1}{2}\mathrm{diag}\left (\mathrm{e}^{-2\boldsymbol{\eta}(t)}\right )\right]\boldsymbol{\Phi}\left (\mathrm{e}^{\boldsymbol{\eta}(t)}\right ) \equiv\xi(\boldsymbol{\eta}(t)).
-
\mathbf{m}(Z(t_{\ell-1}) - \boldsymbol{\eta}(t_{\ell-1})) is the solution to \frac{\mathrm{d} \mathbf{m}(t)}{\mathrm{d}t} = \mathbf{F}(t)\mathbf{m}(t), where \mathbf{F}(t) = \left (\frac{\partial\xi(Z(t))}{\partial\mathbf{Z}(t)}\right )\Big|_{\boldsymbol{\eta}(t)}.
-
\boldsymbol{\Sigma}(t) is the solution to \frac{\mathrm{d}\boldsymbol{\Sigma}(t)}{\mathrm{d}t} = \mathbf{F}(t)\boldsymbol{\Sigma}(t) + \boldsymbol{\Sigma}(t)\mathbf{F}(t)^\prime + \mathrm{diag}\left (\mathrm{e}^{-\boldsymbol{\eta}(t)}\right )\boldsymbol{\Phi}\left (\boldsymbol{\eta}(t)\right )\mathrm{diag}\left (\mathrm{e}^{-\boldsymbol{\eta}(t)}\right ).
- The above ODEs are subject to the initial conditions: \mathbf{X}(t_{\ell-1}) = \mathbf{x}_0 - A^\prime\mathbf{1} + A^\prime\exp(Z(t_{\ell-1})),\ \boldsymbol{\eta}(t_{\ell-1}) = \mathbf{0},\ \mathbf{m}(t_{\ell-1}) = \mathbf{0}\ \left (\implies \mathbf{m}(t_\ell)=\mathbf{0}\right ), and \boldsymbol{\Sigma}(t_{\ell-1}) = \mathbf{0}.
In practice, we will use a non-centered parameterization for the LNA representation of the latent epidemic process in which we map a vector of independent Gaussian draws, \mathbf{N_{raw}}\overset{i.i.d.}{\sim} N(0,1), deterministically to a latent LNA path by noting that Z(t) \overset{\mathcal{L}}{=}\eta(t_\ell) + \Sigma(t_\ell)N_{raw}(t_\ell). Furthermore, we reset the initial conditions for the compartment volumes to reflect the cumulative numbers of infections and recoveries up to the beginning of each observation interval, and reset the counting process for the infections and recoveries. The LNA moments for the log transformed process for the next interval are then computed, and we obtain positive increments in the numbers of infections and recoveries by exponentiating the log-LNA values that were obtained via the deterministic mapping.
Note that the latent LNA process is Markov. Thus, the observed data likelihood, conditional on a latent LNA path, decomposes into a product of emission probabilities, which in the SIR example in this post is a product of negative binomial probabilities.
LNA_stan_explanation.pdf (195.1 KB)