Hi, I am having some trouble figuring out how to marginalize a discrete parameter in my model. I have never done this before and I couldn’t figure out how to apply any of the explanations I found online.
The problem I am working on is estimating the proportion of COVID-19 infections that end up in the hospital (hospitalProp). In a simplified version of the problem, I have for several different locations the following data:
- hospitalizations: Number of hospitalizations (integer)
- population: The population at the location (integer)
- seroprevShape, seroprevRate: The shape and rate parameters of a gamma distribution over real values between 0 and 1, that described the distribution over the proportion of people that have been infected in the population (which I extract from seroprevalence studies)
Then, my model consists in the following:
Seroprevalence[N] \sim gamma(shape[N], rate[N])
Infections[N] = Population[N] \times Seroprevalence[N]
Hospitalizations[N] \sim Binomial(Infections[N], hospitalProp)
I would try to implement it the following way, but I cannot because of the discrete parameter (infections).
data {
int<lower=0> N; // number of observations
int population[N]; // population count
vector[N] seroprevShape; // Prevalence gamma shape
vector[N] seroprevRate; // Prevalence gamma rate
int hospitalizations[N]; // outcome count
}
parameters {
// Countries outcome fit
real<lower=0, upper=1> hospitalProp; // hospitalization proportion
vector[N] prevalence; // underlying prevalence at each location
}
transformed parameters {
int infections[N]; // number of infections at each location
infections = prevalence .* population;
}
model {
hospitalizations ~ binomial(infections, hospitalProp);
}
I have been looking into how to marginalize the number of infections out of the model, but I am at a loss. Any help or pointers on where to look will be greatly appreciated.
Solution:
The solution proposed by @maxbiostat solved the problem. Since I had to fix a couple of details of the code he shared, I’m leaving the final solution here in case someone uses it in the future:
The solution was to define a binomial probability function that takes as input a real valued N. This is achieved with the following code at the start of the script:
functions{
real bin_lpmf(int x, real n, real theta){
real ans = lchoose(n, x) + x*log(theta) + (n-x)*log1m(theta);
return(ans);
}
}
Then the sampling in the model is done with
model {
hospitalizations ~ binomial(infections, hospitalProp);
}
The difference with @maxbiostat code is that lpdf is substituted with lpmf because it’s probability distribution over discrete values, and that log1p was substituted for log1m which is the correct form of the density function.