Continuous approximation of latent negative-binomial stochastic process in epidemic model with beta-binomial observation likelihood

I am trying to reproduce a stochastic epidemic model from a Nature Communications paper (code: GitHub - roidtman/NatComm_dengue_China), and I would appreciate some advice on whether my current Stan implementation is statistically and computationally reasonable.

Model structure in the original paper

In the paper, the authors specify the model in two stages:

  1. Latent stochastic transmission process
    At each time t, a latent number of new infections \lambda_t is generated as

\lambda_t \sim \text{NegBinomial}(\mu_t, \phi), \qquad \mu_t = \beta_t \sum_{l} I_{t-l} w_l

where \beta_t is a time-varying transmission rate and \phi is an overdispersion parameter.

  1. Observation model
    The reported cases are then modeled using a beta–binomial likelihood:

\text{cases}_t \sim \text{BetaBinomial}(N, 1 + \lambda_t, 1 + N - \lambda_t)

where the beta parameters are functions of the latent infection count \lambda_t.

Thus, the negative binomial component represents process noise, while the beta–binomial distribution is the actual observation likelihood.
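As a quick sanity check of the parameterization (my own addition, assuming the paper uses the mean/dispersion "NB2" form that matches Stan's `neg_binomial_2`), the process-noise variance should come out as \mu + \mu^2/\phi:

```python
import math

def nb2_lpmf(k, mu, phi):
    # NB2 parameterization: mean = mu, variance = mu + mu^2 / phi
    return (math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
            + phi * math.log(phi / (phi + mu)) + k * math.log(mu / (mu + phi)))

mu, phi = 20.0, 5.0
pmf = [math.exp(nb2_lpmf(k, mu, phi)) for k in range(2000)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(round(mean, 6), round(var, 6))  # mean ≈ 20, variance ≈ 20 + 20²/5 = 100
```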

Problem in Stan

As discussed in multiple threads (e.g. Discrete parameter in Stan - #8 by ghjk, Initialization Error in Stochastic Epidemic SEIR Model - #17 by maxbiostat):

  • Stan does not support discrete latent parameters.
  • RNG functions cannot be used in the model block.

Therefore, it is not possible to directly represent the latent negative-binomial stochastic process in Stan.

My current approach: continuous approximation

Following suggestions from the Stan forum, I approximate the latent stochastic process using a Gamma–Poisson mixture representation:

\lambda_t \sim \text{Gamma}(\phi, \phi / \mu_t), \qquad \text{cases}_t \sim \text{BetaBinomial}(N, 1 + \lambda_t, 1 + N - \lambda_t)
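One property of this approximation worth noting (my own observation): Gamma(\phi, \phi/\mu_t) reproduces the mean and the overdispersion term of the negative binomial, but drops the Poisson component of its variance:

```python
mu, phi = 20.0, 5.0

# Gamma(shape = phi, rate = phi / mu): mean = shape / rate, var = shape / rate^2
gamma_mean = phi / (phi / mu)        # = mu
gamma_var = phi / (phi / mu) ** 2    # = mu^2 / phi

nb_var = mu + mu ** 2 / phi          # variance of NegBinomial2(mu, phi)

print(gamma_mean, gamma_var, nb_var)  # 20.0 80.0 100.0
```

So the continuous latent is slightly underdispersed relative to the discrete process it replaces, which may matter when \mu_t is small.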

In Stan, this is implemented as:


parameters {
  // continuous stand-in for the latent infection counts;
  // note: only para[(lag_max + 1):N] appears in a sampling statement below,
  // so para[1:lag_max] gets an improper flat prior on [0, inf) -- consider
  // declaring para with size N - lag_max instead
  vector<lower=0>[N] para;
}

model {
  for (t in (lag_max + 1):N) {
    // gamma approximation to the negative-binomial process noise:
    // shape = phi, rate = phi / mu[t] (here infectious_pressure[t] plays the
    // role of the dispersion phi), giving mean mu[t] and variance mu[t]^2 / phi
    para[t] ~ gamma(infectious_pressure[t], infectious_pressure[t] / mu[t]);

    real alpha = 1 + para[t];
    // caution: this can go negative when para[t] > pop, since para has no
    // upper bound; an upper constraint on para may be needed
    real beta_param = 1 + pop - para[t];

    target += beta_binomial_lpmf(cases[t] | pop, alpha, beta_param);
  }
}

My question

Conceptually, does this formulation make sense as a continuous approximation to a discrete stochastic epidemic process?

In particular:

  1. Is it reasonable to treat the latent negative-binomial process as a Gamma-distributed continuous variable when the observation model is beta–binomial?
  2. Would marginalizing out the latent process (if possible) be preferable to this hierarchical approximation?

Any feedback on whether this modeling strategy is principled, or suggestions for better alternatives within Stan's constraints, would be greatly appreciated.

Thank you very much for your time.


I’ve edited the LaTeX on your post to try and make sense, but please revise. In the meantime I’ll try and think about a response to your query.


Thank you very much for taking the time to improve the clarity of the post — I really appreciate your help.

In your expression for \mu_t, you have a term I and a term w. What are these terms, and does either of them depend on a latent discrete quantity (i.e. \lambda)?

The beta-binomial with negative-binomial trials can, I think, be marginalized exactly for use in Stan. But if the time evolution of \mu depends on discrete latents, then it might be really hard to marginalize the time evolution of \mu.

Note that if the time evolution of \mu depends on cases, that’s not a problem. cases is discrete, but it’s not latent.


Thank you very much for the careful reading and for raising these important points — this is very helpful.

In my formulation,

\mu_t = \beta_t \sum_{l} I_{t-l} w_l

the two terms are defined as follows:​

  • I_{t-l} denotes the observed number of cases at time t-l, including local and imported cases. These are data inputs and are not latent.
  • w_l is the serial interval distribution, representing the relative contribution of infections occurring l days earlier to transmission at time t.

Neither I_{t-l} nor w_l depends on the latent stochastic variable \lambda_t. The latent negative-binomial component in the original paper is introduced only to represent process-level stochasticity in transmission intensity.
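To make the construction concrete, here is a toy computation of \mu_t = \beta_t \sum_l I_{t-l} w_l (all numbers invented purely for illustration):

```python
# Toy example of mu_t = beta_t * sum_l I_{t-l} w_l (numbers are made up)
I = [0, 3, 5, 2, 4, 6]          # observed cases (local + imported), by day
w = [0.5, 0.3, 0.2]             # serial interval weights w_1..w_3, sum to 1
beta_t = 1.2                    # transmission rate at time t
t = 5                           # compute mu at the last index

mu_t = beta_t * sum(I[t - l] * w[l - 1] for l in range(1, len(w) + 1))
print(mu_t)  # ≈ 1.2 * (4*0.5 + 2*0.3 + 5*0.2) = 4.32
```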

Thanks again for pointing this out — it helped clarify where marginalization is feasible and where it would indeed become problematic.

In this case, treat the beta binomial with negative-binomial trials as follows. In your original post you use \lambda to refer to both a latent that is negative binomially distributed and a latent that is Gamma distributed. To avoid confusion, I’ll use different variable names.

First, take
\omega \sim \textrm{gamma}(\phi, \phi / \mu) \quad \text{(shape } \phi \text{, rate } \phi/\mu \text{, so that } \mathbb{E}[\omega] = \mu\text{)}

Representing the negative binomial as a gamma-poisson mixture, we now have, conditional on \omega, that cases is distributed as a beta-binomial with a Poisson-distributed number of trials. The binomial probability p arises from a beta distribution, and \omega is the Poisson rate.

p \sim \textrm{beta}(\alpha, \beta)

Thus, conditional on p and \omega, you have

\textrm{cases} \sim \textrm{Poisson}(\omega p)

The latents \omega and p are both continuous.

Check my work, but I’m pretty sure this is the exact model.
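The key identity here, binomial thinning of a Poisson count, can be checked numerically (my own sanity check; the values of \omega, p, and k are arbitrary): if n ~ Poisson(\omega) and cases | n ~ Binomial(n, p), then marginally cases ~ Poisson(\omega p).

```python
import math

def pois_lpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def binom_pmf(k, n, p):
    if k > n:
        return 0.0
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))

omega, p, k = 30.0, 0.4, 10
# marginalize the Poisson number of trials n (summing far into the tail)
mix = sum(math.exp(pois_lpmf(n, omega)) * binom_pmf(k, n, p) for n in range(k, 500))
direct = math.exp(pois_lpmf(k, omega * p))
print(mix, direct)  # both evaluate Poisson(10 | 12); they should agree
```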


Thank you very much for this detailed and insightful suggestion, I will give it a try. Many thanks.

It’s possible that you can integrate out \omega analytically, and arrive at a representation where cases is negative-binomially distributed conditional on \mu, \phi, and p, but I don’t have time to make sure of the math. If this works mathematically, it is likely to be a much more efficient parameterization.

Edit: of course you can integrate out \omega analytically! D’oh. Conditional on p you still have a gamma-poisson mixture. So collapse that to a negative binomial, conditional on p.
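The collapsed form can be checked the same way (again, my own numerical check, not from the thread): integrating Poisson(cases | \omega p) against \omega ~ Gamma(\phi, \phi/\mu) should give NegBinomial2(cases | \mu p, \phi).

```python
import math

def gamma_pdf(x, shape, rate):
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(x)
                    - rate * x - math.lgamma(shape))

def pois_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def nb2_pmf(k, mu, phi):
    return math.exp(math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
                    + phi * math.log(phi / (phi + mu)) + k * math.log(mu / (mu + phi)))

mu, phi, p, k = 20.0, 5.0, 0.4, 7
# integrate Poisson(k | omega * p) against omega ~ Gamma(phi, phi / mu)
h, upper = 0.01, 400.0
mix = h * sum(gamma_pdf(h * i, phi, phi / mu) * pois_pmf(k, h * i * p)
              for i in range(1, int(upper / h)))
direct = nb2_pmf(k, mu * p, phi)
print(mix, direct)  # the two should agree closely
```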

Thank you very much for the follow-up and for pointing out the collapsed representation.

I actually tried implementing both versions you suggested:
(i) the explicit formulation with continuous latents ω and p, and
(ii) the collapsed formulation where ω is integrated out and

In my application, however, I found that neither of these two formulations converged reliably (divergences and poor mixing), even after tightening priors and simplifying the structure.

Interestingly, my original simplified approach does converge reasonably well for the same data and model structure, namely

cases_t \sim \text{NegBinomial} (\mu_t, \phi)

where \phi is the dispersion parameter and \mu_t is calculated within the model.
Do you have any ideas?

I’d imagine there could be an identifiability issue between the negative-binomial overdispersion parameter and the beta-binomial overdispersion, but that’s as close as my guess can get.


Thank you for the helpful perspective. The problem has been solved; my code is here:

functions {
  #include "../spline/functions/spline_functions.stan"

  // Marginalize the discrete latent lambda (truncated at N) out of the
  // beta-binomial observation model.
  real marg_lpmf(int y, int N, int pop, real mu, real phi) {
    vector[N + 1] lp;
    // normalizing constant for truncating the negative-binomial process to
    // 0..N, using a normal approximation to its CDF
    // (mean mu, variance mu + mu^2 / phi)
    real logZ = normal_lcdf(N | mu, sqrt(mu + mu * mu / phi));

    for (lambda in 0:N) {
      real log_obs = beta_binomial_lpmf(y | pop, 1 + lambda, 1 + pop - lambda);
      real log_proc = neg_binomial_2_lpmf(lambda | mu, phi) - logZ;
      lp[lambda + 1] = log_obs + log_proc;
    }
    return log_sum_exp(lp);
  }
}
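A quick way to validate a marginalization like this (my own addition, in Python with only the standard library) is to check that the marginal pmf sums to one over y. Here I use the exact truncated-NB constant in place of the `normal_lcdf` approximation:

```python
import math

def nb2_lpmf(k, mu, phi):
    return (math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
            + phi * math.log(phi / (phi + mu)) + k * math.log(mu / (mu + phi)))

def beta_binomial_lpmf(y, n, a, b):
    # log C(n, y) + log B(y + a, n - y + b) - log B(a, b)
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + math.lgamma(y + a) + math.lgamma(n - y + b) - math.lgamma(n + a + b)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def marg_lpmf(y, N, pop, mu, phi, logZ):
    lps = [beta_binomial_lpmf(y, pop, 1 + lam, 1 + pop - lam)
           + nb2_lpmf(lam, mu, phi) - logZ for lam in range(N + 1)]
    m = max(lps)  # log_sum_exp
    return m + math.log(sum(math.exp(lp - m) for lp in lps))

N, pop, mu, phi = 60, 100, 10.0, 4.0
# exact truncation constant instead of the normal-CDF approximation
logZ = math.log(sum(math.exp(nb2_lpmf(lam, mu, phi)) for lam in range(N + 1)))
total = sum(math.exp(marg_lpmf(y, N, pop, mu, phi, logZ)) for y in range(pop + 1))
print(total)  # should be ~1 if the marginalization defines a proper pmf
```

With the normal-CDF approximation for logZ (as in the Stan code), the total will deviate slightly from one; whether that matters depends on how far into the NB tail the truncation at N sits.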