# Sampling distribution for SIR Model, non integer count data

I am trying to fit a SIRD model following along the Bayesian WorkFlow example. The big difference that I have is that my data is generated from a simulation and we are aggregating the data over multiple runs of the model, so instead of having integer count data like the British school data, we’ve got averaged data which are floating point numbers (stuff after the decimal). Here is the model portion of the Stan code:

model {
//priors
beta ~ normal(2,1);
gamma ~ normal(0.4,0.5);
delta ~ normal(.4,.5);
phi_inv ~ exponential(5);

//sampling distribution
//col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people
cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);

}

I am really new to all this, but if I understand the paper at all, it’s that last line using the negative binomial distribution that I will need to change. I was thinking that maybe a gamma distribution?? would be suitable here but am very open to any suggestions since I know so little about this topic. My question is, if a gamma distribution seems like the right thing, how do I modify the model to make it work. I’ve looked at the Stan manual entry for gamma and I know it has two parameters. Do I add priors for those parameters too? Any help is greatly appreciated.

Hey!

I don’t have experience with SIR models, but in the case of finding a suitable replacement for the negative binomial with real value outcome, I think Gamma could work. Note though that it will not work if you have some cases that are zero…

You can use this parameterization:

\begin{align} \text{cases} &\sim \text{Gamma}(\alpha,\beta) \\ \alpha &= \phi^{-1} \\ \beta &= \frac{\phi^{-1}}{y} \end{align}

where \phi^{-1} is most likely what you already have as phi_inv. Also, maybe you could perhaps vectorize the \beta calculation with using a vectorized divide, i.e. rep_vector(phi_inv, N) ./ mu, but I’m not sure if that’s the fastest way to do it.

Btw, col(to_matrix(y), 2) looks kinda bad… What’s the type of y? In any case you could do the transformation to matrix in the transformed data block and access the second column just by indexing, i.e. vector[N] y_vec = to_matrix(y)[:,2];

Cheers,
Max

1 Like

What would be an option if I had a zero entry in cases? I don’t think I do since we are averaging over a lot of runs. Thanks for the advice I will give it a whirl.

Hey there!

If you have zero entries you have to model them separately, for example in a gamma-hurdle model (not an efficient implementation!):

...
for (n in 1:N){
if (cases[n] == 0){
1 ~ bernoulli(p);
} else {
0 ~ bernoulli(p);
cases[n] ~ gamma(phi_inv, phi_inv / y_vec[n]);
}
}
...


where p is the probability that cases == 0. You see that this is a bit awkward, because now you sort of have two models / outcome processes that you have to take care of…

Maybe another solution would be to implement your own neg_binomial lpdf that accepts floats / vectors of floats.

I think the by far the easiest solution would be to just round cases.

But maybe others have better ideas.

Cheers,
Max

Originally, I was rounding cases but I was curious to see if that caused any problems. I may go back to that if I need to.