Dealing with unobserved groups with count data

Ax3man · April 20, 2021, 5:18pm

Hi all. I’m helping some colleagues analyze some count data. However, I can’t get a good model fit and am looking for some advice.

This is a genomic study, but you don’t really need to understand biology to understand the problem. We have sampled a bunch of species, and tallied the number of genes in each gene family. We now want to know the number of genes in a family can be explained by an ecological variable. We are interested in the per-family slopes.

Here’s the variables that actually matter:

genes: how many genes we found, count data.
family: to what gene family these genes belong, categorical.
species: what species this was measured in
ecology: a continuous predictor variable

The model is straightforward:

genes ~ 1 + ecology + (1 + ecology | family)

(The full model also has parameters for effects of species and phylogeny, but that’s not important here.)

So I expected these counts to be over-dispersed, and fitted a negative binomial model. However, the posterior predictive check looks problematic to me:

Rplot

(I’ve zoomed in on the lower range here.)

I think the problem here is that our data has a lack of 0 values to confirm to the assumptions of the negative binomial model. This is because a gene family can only be observed if it had at least 1 gene in one of the species. Since the number of genes per family doesn’t vary that much between species, we end up with too few zeros.

Does the posterior predictive check also look problematic to you?
Is there way I can account for unobserved zero values? Is there a different family I can use?

Appreciate any help!

Ax3man · April 26, 2021, 5:29pm

I figured out a hurdle Poisson model fixes this issue.

Unfortunately, the negative binomial model already took a long time to fit, and fitting the hurdle Poisson model on the full dataset is not feasible for us.

Topic		Replies	Views
Poisson model overestimating dispersion and inflating zeroes brms fitting-issues , poisson , phylogenetic	24	3841	August 26, 2020
Zero-inflated poisson model for count data Interfaces brms	8	1145	August 6, 2023
Unbalanced observations in a group, hierarchical poisson model brms	7	622	September 28, 2020
Prior predictive check for mixed zero-inflated and Poisson models Interfaces brms	2	774	September 14, 2023
New to brms and categorical modeling - dealing with messy biological data - insights and advice welcome Modeling techniques , brms	6	96	June 23, 2025

Dealing with unobserved groups with count data

Related topics