I am modelling count data whose dispersion structure does not match any of the distributions I know for this purpose. Suggestions, even vague ones, would be appreciated.
There are units i, each with 2–3 observations j. For a given unit i with covariates X_i, a theoretical model gives me ratios a_i, b_i, c_i > 0, with a_i + b_i + c_i = 1.
Then I observe 2–3 draws (A_{ij}, B_{ij}, C_{ij}), IID conditional on i. Each draw sums to 180, and the means match the ratios above: E[A_{ij}]/180 = a_i, etc.
I tried the multinomial and the Dirichlet-multinomial distributions, but neither is a good fit, which affects MCMC mixing (the Dirichlet-multinomial mixes much better, but is too dispersed). From looking at the data and running posterior predictive checks, the issue seems to be the following:
- A_{ij} is only mildly overdispersed: its variance is about 2x what the multinomial implies,
- B_{ij} and C_{ij} are more overdispersed than A_{ij}.
It seems that agents vary their A less than their B and C. This makes sense in the context of this data, but I am struggling to model it.
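One way to put a number on this pattern is to pool the squared deviations from the model-implied means and divide by the multinomial variance n p (1 - p). The following is only a minimal sketch: the function name `dispersion_ratio` and the toy counts are hypothetical, not taken from the data discussed here, and in practice the ratios would be estimates or posterior draws rather than known values.

```python
N = 180  # total count per observation, as stated above

def dispersion_ratio(counts, probs, n=N):
    """Pooled squared deviation from the model mean over the multinomial variance.

    counts[i] is the list of observed counts for unit i (2-3 draws),
    probs[i] is the model-implied ratio for that unit.
    A value near 1 is consistent with the multinomial; a value near 2
    means roughly twice the multinomial variance.
    """
    num = sum((x - n * p) ** 2 for xs, p in zip(counts, probs) for x in xs)
    den = sum(n * p * (1 - p) for xs, p in zip(counts, probs) for _ in xs)
    return num / den

# Made-up numbers, purely for illustration
a_counts = [[92, 88], [61, 58, 66]]
a_probs = [0.5, 0.35]
print(dispersion_ratio(a_counts, a_probs))
```

Running the same function on the B and C counts would show whether their ratios are clearly above the one for A.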
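I thought of a two-step model, with

$$A_{ij} \sim \text{BetaBinomial}(180, a_i \kappa, (1 - a_i) \kappa)$$

$$B_{ij} \mid A_{ij} \sim \text{BetaBinomial}\left(180 - A_{ij}, \frac{b_i}{b_i + c_i} \lambda, \frac{c_i}{b_i + c_i} \lambda\right)$$

and C_{ij} = 180 - A_{ij} - B_{ij}, where \kappa and \lambda control the overdispersion.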
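To see what such a two-step construction produces, here is a small simulation sketch. It assumes a specific parametrization that may differ from the intended one: A is beta-binomial with shapes a κ and (1 - a) κ, and B given A is beta-binomial on the remaining 180 - A with shapes q λ and (1 - q) λ, where q = b / (b + c). The function names and the parameter values are hypothetical.

```python
import random

N = 180  # total count per observation

def draw_binomial(rng, n, p):
    """Binomial draw as a sum of Bernoulli trials (fine for small n)."""
    return sum(rng.random() < p for _ in range(n))

def draw_two_step(rng, a, b, c, kappa, lam, n=N):
    """One (A, B, C) draw under the assumed two-step beta-binomial model."""
    # Step 1: A ~ BetaBinomial(n, a*kappa, (1-a)*kappa)
    p_a = rng.betavariate(a * kappa, (1 - a) * kappa)
    A = draw_binomial(rng, n, p_a)
    # Step 2: B | A ~ BetaBinomial(n - A, q*lam, (1-q)*lam), q = b/(b+c)
    q = b / (b + c)
    p_b = rng.betavariate(q * lam, (1 - q) * lam)
    B = draw_binomial(rng, n - A, p_b)
    # C is whatever remains, so the three counts always sum to n
    return A, B, n - A - B

rng = random.Random(0)
# Illustrative values only: large kappa (A close to binomial), small lam
draws = [draw_two_step(rng, 0.5, 0.3, 0.2, kappa=178.0, lam=20.0) for _ in range(5)]
for d in draws:
    print(d)
```

Under this parametrization the means are preserved, since E[B] = 180 (1 - a) b / (b + c) = 180 b when a + b + c = 1.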
The variance of a plain vanilla binomial is n a_i (1 - a_i), while for the beta-binomial, when parametrized with shape parameters a_i \kappa and (1 - a_i) \kappa, the variance is

$$n a_i (1 - a_i) \frac{\kappa + n}{\kappa + 1}$$

So, as \kappa \to \infty, I get back the binomial.
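The closed form n a (1 - a) (κ + n) / (κ + 1), which holds for a beta-binomial with shapes a κ and (1 - a) κ, can be checked numerically against the variance computed directly from the probability mass function. This is a sketch with hypothetical function names; the shape parametrization is the assumption stated in this sentence.

```python
from math import exp, lgamma

def betabinom_pmf(k, n, alpha, beta):
    """Beta-binomial pmf, computed on the log scale for stability."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + alpha) + lgamma(n - k + beta) - lgamma(n + alpha + beta)
    log_den = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    return exp(log_choose + log_num - log_den)

def betabinom_var_numeric(n, a, kappa):
    """Variance obtained by summing over the full support 0..n."""
    pmf = [betabinom_pmf(k, n, a * kappa, (1 - a) * kappa) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

def betabinom_var_closed(n, a, kappa):
    """Binomial variance times the inflation factor (kappa + n) / (kappa + 1)."""
    return n * a * (1 - a) * (kappa + n) / (kappa + 1)

n, a = 180, 0.4
for kappa in (10.0, 178.0, 1e6):
    print(kappa, betabinom_var_numeric(n, a, kappa), betabinom_var_closed(n, a, kappa))
```

The inflation factor (κ + n) / (κ + 1) tends to 1 as κ grows and to n as κ approaches 0, so the excess variance is bounded by a factor of n.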
I am unsure what prior to put on \kappa (and similarly \lambda), though. I thought of a vague lognormal prior whose mean is around the value that gives the desired excess variance.
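As a rough illustration of how such a prior could be centered: if the beta-binomial has shapes a κ and (1 - a) κ, the variance is inflated by φ = (κ + n) / (κ + 1) relative to the binomial, which inverts to κ = (n - φ) / (φ - 1). The sketch below solves for κ and then for the location of a lognormal with a given mean. The function names and the log-scale standard deviation `sigma = 1.0` are assumptions chosen only for the example.

```python
from math import exp, log

def kappa_for_inflation(phi, n=180):
    """Concentration kappa giving variance phi times the binomial variance."""
    if not 1.0 < phi < n:
        raise ValueError("inflation factor must lie strictly between 1 and n")
    return (n - phi) / (phi - 1.0)

def lognormal_mu_for_mean(mean, sigma):
    """Location mu such that LogNormal(mu, sigma) has the requested mean."""
    return log(mean) - 0.5 * sigma ** 2

kappa0 = kappa_for_inflation(2.0)  # about 2x the multinomial variance for A
sigma = 1.0                        # assumed log-scale sd, fairly vague
mu = lognormal_mu_for_mean(kappa0, sigma)
print(kappa0, mu, exp(mu))         # target mean, location, implied median
```

Because the lognormal is right-skewed, its median exp(mu) sits well below its mean, so centering the mean and centering the median on the target κ give noticeably different priors. For λ the same inversion applies with n replaced by the remaining count 180 - A_{ij}, which varies across observations.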
Again, comments welcome! I am happy to add plots if needed.