Sampling distribution for SIR Model, non integer count data

Jeff_Graham · February 22, 2022, 3:57pm

I am trying to fit a SIRD model following along the Bayesian WorkFlow example. The big difference that I have is that my data is generated from a simulation and we are aggregating the data over multiple runs of the model, so instead of having integer count data like the British school data, we’ve got averaged data which are floating point numbers (stuff after the decimal). Here is the model portion of the Stan code:

model {
//priors
beta ~ normal(2,1);
gamma ~ normal(0.4,0.5);
delta ~ normal(.4,.5);
phi_inv ~ exponential(5);

//sampling distribution
//col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people
cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);

}

I am really new to all this, but if I understand the paper at all, it’s that last line using the negative binomial distribution that I will need to change. I was thinking that maybe a gamma distribution?? would be suitable here but am very open to any suggestions since I know so little about this topic. My question is, if a gamma distribution seems like the right thing, how do I modify the model to make it work. I’ve looked at the Stan manual entry for gamma and I know it has two parameters. Do I add priors for those parameters too? Any help is greatly appreciated.

Max_Mantei · February 22, 2022, 4:12pm

Hey!

I don’t have experience with SIR models, but in the case of finding a suitable replacement for the negative binomial with real value outcome, I think Gamma could work. Note though that it will not work if you have some cases that are zero…

You can use this parameterization:

\begin{align} \text{cases} &\sim \text{Gamma}(\alpha,\beta) \\ \alpha &= \phi^{-1} \\ \beta &= \frac{\phi^{-1}}{y} \end{align}

where \phi^{-1} is most likely what you already have as phi_inv. Also, maybe you could perhaps vectorize the \beta calculation with using a vectorized divide, i.e. rep_vector(phi_inv, N) ./ mu, but I’m not sure if that’s the fastest way to do it.

Btw, col(to_matrix(y), 2) looks kinda bad… What’s the type of y? In any case you could do the transformation to matrix in the transformed data block and access the second column just by indexing, i.e. vector[N] y_vec = to_matrix(y)[:,2];

Cheers,
Max

Jeff_Graham · February 23, 2022, 6:48pm

What would be an option if I had a zero entry in cases? I don’t think I do since we are averaging over a lot of runs. Thanks for the advice I will give it a whirl.

Max_Mantei · February 25, 2022, 3:31pm

Hey there!

If you have zero entries you have to model them separately, for example in a gamma-hurdle model (not an efficient implementation!):

...
for (n in 1:N){
    if (cases[n] == 0){
        1 ~ bernoulli(p);
    } else {
        0 ~ bernoulli(p);
        cases[n] ~ gamma(phi_inv, phi_inv / y_vec[n]);
    }
}
...

where p is the probability that cases == 0. You see that this is a bit awkward, because now you sort of have two models / outcome processes that you have to take care of…

Maybe another solution would be to implement your own neg_binomial lpdf that accepts floats / vectors of floats.

I think the by far the easiest solution would be to just round cases.

But maybe others have better ideas.

Cheers,
Max

Jeff_Graham · February 25, 2022, 8:24pm

Originally, I was rounding cases but I was curious to see if that caused any problems. I may go back to that if I need to.

Topic		Replies	Views
Beta-binomial model Modeling	2	2501	February 10, 2019
HYoergeometric distribution in stan Modeling	2	425	February 16, 2019
Count data modeling with discrete parameters Modeling specification	7	1474	July 13, 2017
RSTAN compilation problem Modeling	9	490	May 30, 2020
Treating a missing count data as a parameter and putting truncated binomial prior for it Modeling specification	7	403	October 26, 2023

Sampling distribution for SIR Model, non integer count data

Related topics