This is a statistics question in addition to a Stan one:
I am modeling data collected at different ages. For simplicity's sake I started out modeling counts at age, so the vector of observed counts is modeled as multinomial, with its probability simplex given by the estimated counts from the process model (normalized to sum to 1).
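For reference, the count-based setup described above could look roughly like this outside Stan. This is a minimal Python sketch; the function names are hypothetical, and the process-model output is assumed to be a vector of expected counts at age that gets normalized into the multinomial's simplex. Zero counts are no problem for the multinomial: they simply contribute nothing to the log-likelihood kernel.

```python
import math
from typing import List, Sequence


def counts_to_simplex(expected: Sequence[float]) -> List[float]:
    """Normalize (hypothetical) process-model counts at age into proportions summing to 1."""
    total = sum(expected)
    if total <= 0:
        raise ValueError("expected counts must have a positive total")
    return [e / total for e in expected]


def multinomial_lpmf(observed: Sequence[int], probs: Sequence[float]) -> float:
    """Log pmf of integer counts under a multinomial with probability simplex `probs`."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)
    # log of the multinomial coefficient n! / prod(y_k!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in observed)
    log_kernel = 0.0
    for y, p in zip(observed, probs):
        if y > 0:  # zero counts add 0 * log(p) = 0
            if p <= 0:
                return -math.inf  # a positive count in a zero-probability cell is impossible
            log_kernel += y * math.log(p)
    return log_coef + log_kernel


# Hypothetical data: observed counts at four ages and process-model expectations.
observed = [3, 0, 5, 2]
expected = [2.5, 0.4, 4.8, 2.3]
print(multinomial_lpmf(observed, counts_to_simplex(expected)))
```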
However, the true data aren't actually integers; they're real numbers (technically densities, not counts). I want to update the model to reflect this and input the data as reals instead of integers.
But I'm not sure how to replace the multinomial. The obvious candidate is the Dirichlet, but every element of the data vector (theta in the Stan notation; alpha is the concentration parameter) has to be strictly positive for the density to be well defined, which is not the case for my data because it contains a lot of zeros (see 23.1 Dirichlet Distribution | Stan Functions Reference).
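The reason zeros break the Dirichlet is visible in its log density, log Γ(Σα_k) − Σ log Γ(α_k) + Σ (α_k − 1) log θ_k: a θ_k of exactly 0 sends the last term to −∞ (if α_k > 1) or +∞ (if α_k < 1), so the density is only usable on the open simplex. A minimal Python sketch (hypothetical helper names) that normalizes densities into a composition and evaluates the Dirichlet log density:

```python
import math
from typing import List, Sequence


def densities_to_simplex(densities: Sequence[float]) -> List[float]:
    """Turn nonnegative densities at age into proportions that sum to 1."""
    total = sum(densities)
    if total <= 0:
        raise ValueError("densities must have a positive total")
    return [d / total for d in densities]


def dirichlet_lpdf(theta: Sequence[float], alpha: Sequence[float]) -> float:
    """Dirichlet log density; defined only when every theta_k > 0."""
    if len(theta) != len(alpha):
        raise ValueError("theta and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("concentration parameters alpha must be strictly positive")
    if abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must sum to 1")
    if any(t <= 0 for t in theta):
        # log(0) appears in (alpha_k - 1) * log(theta_k), so no finite density exists
        raise ValueError("theta must lie in the open simplex (every element > 0)")
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for t, a in zip(theta, alpha))


# Hypothetical densities at three ages, one of them exactly zero.
composition = densities_to_simplex([0.0, 1.0, 3.0])
try:
    dirichlet_lpdf(composition, [2.0, 2.0, 2.0])
except ValueError as err:
    print("Dirichlet undefined:", err)
```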
Can you think of another analogous distribution that I can use (or a different version of the Dirichlet that allows zeros)?
I think the most direct way would be a "zero-inflated Dirichlet" modelled after the "zero-inflated Beta" (because the Dirichlet is the multivariate extension of the Beta). The brms GitHub issue "Implement the zero-inflated dirichlet distribution" (paul-buerkner/brms, Issue #722) has some references on how such a construction could be made. You can't just add a vector of zero-inflation probabilities, one per component, because not all values can be zero simultaneously, although that might be a good simple approximation.
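To make the "simple approximation" concrete, here is one possible sketch, not the construction discussed in Issue #722: each component is zero independently with probability zi_k, the pattern is conditioned on not all components being zero (dividing by 1 − Π zi_k), and the nonzero components, which already sum to 1, get a Dirichlet restricted to their alphas. The function names and the `zi` parameterization are hypothetical.

```python
import math
from typing import Sequence


def _dirichlet_lpdf(theta: Sequence[float], alpha: Sequence[float]) -> float:
    """Dirichlet log density on the open simplex (all theta_k > 0)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for t, a in zip(theta, alpha))


def zi_dirichlet_lpdf(y: Sequence[float], alpha: Sequence[float], zi: Sequence[float]) -> float:
    """Hypothetical zero-inflated Dirichlet log density.

    y:     composition summing to 1; exact zeros allowed, but not all zero
    alpha: strictly positive concentration for each component
    zi:    probability in (0, 1) that each component is zero (assumed independent)
    """
    if not (len(y) == len(alpha) == len(zi)):
        raise ValueError("y, alpha and zi must have the same length")
    if any(v < 0 for v in y) or abs(sum(y) - 1.0) > 1e-9:
        raise ValueError("y must be nonnegative and sum to 1")
    if any(a <= 0 for a in alpha) or any(not 0.0 < p < 1.0 for p in zi):
        raise ValueError("alpha must be > 0 and zi must lie in (0, 1)")
    # Bernoulli log-probability of the observed zero / nonzero pattern
    lp = sum(math.log(zi[k]) if y[k] == 0 else math.log1p(-zi[k]) for k in range(len(y)))
    # Condition on "not all components zero" (that pattern cannot occur)
    lp -= math.log1p(-math.prod(zi))
    # Dirichlet on the nonzero components; a single nonzero component must equal 1
    # and contributes log(1) = 0
    nonzero = [k for k, v in enumerate(y) if v > 0]
    if len(nonzero) > 1:
        lp += _dirichlet_lpdf([y[k] for k in nonzero], [alpha[k] for k in nonzero])
    return lp


print(zi_dirichlet_lpdf([0.0, 0.25, 0.75], [2.0, 2.0, 2.0], [0.3, 0.3, 0.3]))
```

Treating the zeros as independent is what makes this an approximation; a correlated zero pattern would need a richer model for which components drop out.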
With that said, it is rare to observe densities directly; in practice, densities are usually derived from actual count observations. Those counts may not be accessible to you, but keep in mind that densities derived from low counts are inherently noisier than those derived from high counts. One could, for example, include any predictors presumed to be related to the underlying counts as predictors for the precision of the Dirichlet.
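One way to express "predictors for the precision" is the mean-precision parameterization alpha = phi * mu, where mu is the expected composition and phi > 0 controls the spread, since Var(θ_k) = μ_k(1 − μ_k)/(φ + 1). The sketch below assumes a hypothetical log-linear link phi = exp(b0 + b1 * log(effort)), where "effort" stands for whatever proxy for the underlying counts is available; b0, b1 and the effort values are made up for illustration.

```python
import math
from typing import List, Sequence


def precision_from_effort(log_effort: float, b0: float, b1: float) -> float:
    """Hypothetical log-linear link for the Dirichlet precision: phi = exp(b0 + b1 * log_effort)."""
    return math.exp(b0 + b1 * log_effort)


def dirichlet_alpha(mu: Sequence[float], phi: float) -> List[float]:
    """Mean-precision parameterization: alpha_k = phi * mu_k, so sum(alpha) = phi."""
    if any(m <= 0 for m in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("mu must be a simplex with strictly positive elements")
    if phi <= 0:
        raise ValueError("phi must be positive")
    return [phi * m for m in mu]


def component_sd(mu: Sequence[float], phi: float) -> List[float]:
    """Marginal SD of each Dirichlet component: sqrt(mu_k * (1 - mu_k) / (phi + 1))."""
    return [math.sqrt(m * (1.0 - m) / (phi + 1.0)) for m in mu]


mu = [0.2, 0.5, 0.3]
for effort in (10.0, 1000.0):  # hypothetical low- and high-effort samples
    phi = precision_from_effort(math.log(effort), b0=0.0, b1=1.0)
    print(effort, dirichlet_alpha(mu, phi), component_sd(mu, phi))
```

In a Stan model, b0 and b1 would be parameters and the resulting alpha would be passed to the Dirichlet (or a zero-inflated variant), so samples backed by low counts are allowed to scatter more around mu.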
Best of luck with your model!