Multilevel count model with nonstandard dispersion structure

I am modelling some count data that has a dispersion structure that does not match the distributions I know for this purpose, if someone has suggestions, even vague ones, they would be appreciated.

There are units i, each has 2–3 observations j. For a given i, given covariates X_i, a theoretical model tells me ratios a_i, b_i, c_i > 0, with a_i + b_i + c_i = 1.

Then I observe 2–3 draws (A_{i,j}, B_{i,j}, C_{i,j}), IID conditional on i. Their sum is 180, and their means are the ratios above E[A_{ij}]/180 = a_i, etc.

I tried the multinomial and the Dirichlet-multinomial distributions, neither is a good fit, which affects mixing of MCMC (Dirichlet-multinomial mixes much better, but is too dispersed). I looked at the data and also did posterior predictive checks, and the issue seems to be the following:

  1. A_{ij} has small, but slightly overdispersed variance, a bit (2x) larger than would be implied by the multinomial,
  2. B_{ij} and C_{ij} have larger variance.

It seems to be that agents vary their A less than their B & C. This actually makes sense in the context of this data. I am struggling to model it though.

I thought of a two-step model, with

A_{ij} \sim \mathrm{BetaBinomial}(180, a_i \kappa, (1- a_i) \kappa) \\ B_{ij} \sim \mathrm{BetaBinomial}(180 - A_{ij}, b_i \lambda/(b_i+c_i), c_i \lambda / (b_i + c_i))

where \kappa and \lambda control the overdispersion.

The variance of a plain vanilla binomial is n a_i (1- a_i), while, for the beta-binomial, the variance is

n a (1-a) \frac{\kappa+n}{\kappa + 1}

when parametrized above. So, if \kappa \to \infty, I get back the binomial.

I am unsure what prior to put on \kappa (and similarly \lambda) though. I thought of a vague lognormal prior which has the mean around the value that gives me the desired excess variance.

Again, comments welcome! I am happy to add plots if needed.

1 Like

This is small enough that you can use a multivariate logistic normal. We don’t have that coded directly in Stan, but it’s easy enough to code with a transform. That’s a much richer model than Dirichlet in that it can code individual variances and correlations.

The basic form is Z \sim \textrm{multinormal}(\mu, \Sigma) with (a, b, c) = \exp(Z) / \textrm{sum}(\exp(Z)). Then (a, b, c) is a simplex, but you can control the covariance through \Sigma.

I wouldn’t try to use the beta-binomial, because it looks like your (A, B, C) triples are multinomial rather than being independently beta-binomial.

2 Likes

Thanks, but that looks like a continuous distribution to me, so I am not sure how to apply it to count data which is discrete. Also, it gives 0 likelihood for observations on the edges.

Yes, this is the multivariate equivalent of a beta prior—it needs to be coupled with a likelihood model if you have discrete data. So that’d be something like y ~ multinomial((a, b, c)) if the data’s multinomial. The Sigma just models the correlations in the prior more richly than a Dirichlet prior would.

Thanks for the explanation! So I would introduce a latent Z_{ij} for each observation within each unit? That would blow up my parameter space a bit, but if it improves mixing, it would be a net gain.

Another concern is that \mu and \Sigma would not be identified by the data (because the exponential normalization removes the scale). But I guess that could be mitigated by a prior on sum(mu).

You could, or you could have them shared. I think you might be able to marginalize out the Z—you an with a Dirichlet/multinomial, but I’m not sure about the logistic-normal/multinomial.