The Poisson distribution (at least the Poisson implemented in brms) is defined so that the mean equals the variance; the variance cannot be adjusted independently of the mean.
Fortunately, you have a few off-the-shelf options in brms to accommodate discrete data like number of offspring. As noted by @Solomon, the negative binomial is a natural choice if your data are over-dispersed relative to the Poisson. A good way to evaluate under/over-dispersion is with ppc_rootogram() from the bayesplot package, which you can call on a fitted model through pp_check().
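Here is a minimal sketch of that check. The data frame d, the response n_fledged, and the model formula are hypothetical placeholders, not your actual variables:

```r
library(brms)

# hypothetical Poisson fit to fledgling counts
fit_pois <- brm(
  n_fledged ~ ftreatment + (1 | site),
  data   = d,
  family = poisson()
)

# rootogram of observed vs. posterior-predictive counts;
# systematic misfit in the tails suggests over- or under-dispersion
pp_check(fit_pois, type = "rootogram", ndraws = 100)
```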
In my experience working with number of offspring, I often find that the number of young fledged is actually under-dispersed relative to the Poisson. If that's the case with your data, I can offer a couple of alternatives. First, the Conway-Maxwell Poisson (COM Poisson) distribution is available in brms as an experimental option via family = brmsfamily("com_poisson"). This distribution can accommodate both under- and over-dispersion. Be aware that the current implementation uses a parameterization of the COM Poisson that samples much more slowly, although that may not be an issue if your dataset is relatively small (hundreds of rows).
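A sketch of that option, reusing the same hypothetical variables as above:

```r
# COM Poisson fit; expect noticeably longer run times than poisson()
fit_cmp <- brm(
  n_fledged ~ ftreatment + (1 | site),
  data   = d,
  family = brmsfamily("com_poisson")
)
summary(fit_cmp)
```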
Another option would be to use a cumulative logistic regression, which tends to work pretty well for under-dispersed data with a limited number of possible outcomes (for example, between 1 and 4 fledglings). An example from the literature is available here.
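A sketch of the cumulative approach, treating the count as an ordered categorical outcome (again with hypothetical variable names):

```r
# convert the count to an ordered factor so brms treats it as ordinal
d$n_fledged_ord <- factor(d$n_fledged, ordered = TRUE)

fit_cum <- brm(
  n_fledged_ord ~ ftreatment + (1 | site),
  data   = d,
  family = cumulative("logit")
)
```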
One final thought would be a word of caution about the kind of model you propose to fit (or at least my understanding of it), which places a grouping variable (1 | ftreatment) in the sigma/shape part of the model. How many treatment groups do you have? If it's just a few, then the posterior distribution of the standard deviation of the treatment effects on the variance of the response is likely to be poorly defined, and quite possibly very poorly defined. You can mitigate this issue to some extent by using an informative prior, but my suspicion is that the prior will be doing some very heavy lifting, if the model converges at all. See the sketch below for one way to set such a prior.
I have some other suggestions about your call to brm in the original post. First, I would recommend running fewer post-warmup iterations, spread across more chains. The nature of the HMC sampling in Stan is that you need far fewer posterior samples than with alternative MCMC samplers (JAGS, NIMBLE, etc.). The defaults (warmup = 1000, iter = 2000, chains = 4) are generally a good starting point, and some of the diagnostics will improve as you add more chains.
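For concreteness, a sketch of a call along those lines (the formula and seed are placeholders):

```r
fit <- brm(
  n_fledged ~ ftreatment + (1 | site),
  data   = d,
  family = negbinomial(),
  warmup = 1000,   # default warmup
  iter   = 2000,   # 1000 post-warmup draws per chain
  chains = 4,
  cores  = 4,      # run chains in parallel
  seed   = 123
)
```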
I’d also encourage you to think more carefully about your priors and define them explicitly, particularly for the group-level variables site, yearsite, and fyear. To do this, you might want to run some prior predictive checks to make sure that your prior push-forward distributions match your expectations about how many fledglings are possible. You can do this by setting sample_prior = "only"; a sketch follows.
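A minimal sketch of a prior predictive check, with explicit priors on the group-level standard deviations. The specific prior scales here are illustrative assumptions that you would want to adjust to your own expectations:

```r
priors <- c(
  set_prior("normal(0, 1)",   class = "b"),
  set_prior("normal(0, 0.5)", class = "sd", group = "site"),
  set_prior("normal(0, 0.5)", class = "sd", group = "yearsite"),
  set_prior("normal(0, 0.5)", class = "sd", group = "fyear")
)

# sample from the priors only; the data are ignored in the likelihood
fit_prior <- brm(
  n_fledged ~ ftreatment + (1 | site) + (1 | yearsite) + (1 | fyear),
  data         = d,
  family       = negbinomial(),
  prior        = priors,
  sample_prior = "only"
)

# look at the push-forward distribution of simulated fledgling counts
pp_check(fit_prior, type = "hist", ndraws = 11)
```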
I know there’s a lot to absorb above. I’m happy to clarify any points of confusion or elaborate if necessary.