Poisson model overestimating dispersion and inflating zeroes

I am an evolutionary biologist trying to fit a model (brm.count) in which:

  • the response variable is the count of a phenotypic structure (e.g., toes) that a species possesses (trait.count);
  • the population-level effects are two continuous variables and their interaction (continuous1 and continuous2);
  • intercepts can vary over the group-level effect species, and species are correlated according to the phylogeny covariance matrix.

Since the response variable is count data, I opted to choose a Poisson distribution. There is slight overdispersion in the data (index of dispersion, \sigma^2/\mu = 1.1) and 78% of the observations are zeroes. This is how the distribution of counts (trait.count) looks like:

The code that I used to run the model is as follows:

brm_count <-  brm(trait.count ~ continuous1*continuous2 + (1|gr(species, cov = A)),
                         data = data,
                         family = poisson(),
                         data2 = list(A = phylogeny),
                         chains = 2,
                         cores = 2,
                         iter = 4000)

When I run a posterior predictive check on the output of this model, here’s what I get:

pp_check(brm_count, type="bars", nsamples = 100)

Besides underestimating the number of ones, the model seems to be overestimating the number of zeroes. This is further evidenced by checking the proportion of zeroes in samples from the posterior predictive distribution, compared to the observed proportion.

prop_zero <- function(x) {sum(x == 0)/length(x)}
ppc_stat(y = data$trait.count, yrep = posterior_predict(brm_count, draws = 1000), stat="prop_zero")

But what I found the weirdest is that the model also overestimates dispersion!

dispersion <- function(x) {var(x)/mean(x)}
ppc_stat(y = data$trait.count, yrep = posterior_predict(brm_count, draws = 1000), stat="dispersion")

My question is: how come the dispersion of my data (1.1) is inflated to ~2 by a model that assumes no overdispersion (index = 1)? Shouldn’t I expect the opposite: for dispersion to be underestimated? Am I missing something?

(By the way, I also fitted this model with zero-inflated Poisson, negative binomial and Poisson with observation-level random effects (1|obs), and the issue persists.)

  • Operating System: macOS 10.12.6
  • brms Version: 2.13.0

Thank you so much for any help!

UPDATE: I ran ppc_stat() with variance and mean to check the source of the overdispersion. Turns out the mean is pretty accurately estimated, while the variance is overestimated (0.3 in the observed data, ~0.6 in the posterior predictive samples).


Hey @jocateme

That’s interesting. I can’t answer your question but here are a couple thoughts.

The model has a few parts as you explain–your covariates, the varying intercepts, and the correlation structure for the latter. Have you tried removing parts one at a time and re-evaluating results? I might start with just an intercept y ~ 1, plot with pp_check(), and add parts back in one at a time to see how the predictive distribution evolves.

Another possibility is that the priors are not what you’re expecting. I know brms has good (generally ‘safe’) default priors, but that might be worth looking into as well.


Hi @cmcd,

Thanks for the excellent tips!

I followed your suggestion, fitting the simplest possible model first, and then gradually making it more complex.

In the figure below, you can see the dispersion of predictive posterior samples from increasingly complex models. The simplest model (~ 1) predicts dispersion exactly as I would expect in a Poisson distribution, with the mean roughly equal to variance (\sigma^2/\mu ≈ 1). Adding a predictor brings it up to 1.1, and adding two predictors brings it up to 1.2 (this is roughly the dispersion in the observed data). The interaction term doesn’t seem to affect dispersion, be it with or without the additive terms. The group-level effect + correlation structure, on the other hand, significantly affects it; the dispersion predicted by this model is similar to the one predicted by the full model in the original question (\sigma^2/\mu ≈ 2). Since each observation refers to a species, I just realized that the group-level effect (1|species) might be acting like an observation-level effect, which is frequently used to model residual variance in count data. Might this explain the inflated overdispersion?

I believe the overdispersion in the models is likely arising from overestimated high counts that are not present in the observed data (> 3); the maximum predicted counts (below) are an indication of that. Zeroes appear to be overestimated as a consequence: they keep the mean estimation accurate despite these high counts (it could also be the other way around, of course).

The model in the original question and the simulations above all take flat (default) priors for effects. I tried using more informative priors (normal(10, 0)) for the full model, but things didn’t change.

Hopefully this throws some light on what might be going on!

1 Like

I’d be curious to see what happens to the pp_check() with y ~ (1|group), dropping the covariance matrix. I’ve been wondering if its the covariance matrix, because correlated data has higher variance than uncorrelated data. Maybe its a problem with your phylogeny data or maybe the correlations just aren’t as strong as the phylogenetic structure would suggest.

It makes some sense to me that the maximum value from the predictive distribution is always higher than the observed maximum—that may be a good thing because it would be highly unlikely to observe the highest value out there in the world if you’re predicting outside the observed set of species.

But I’ve never worked with phylogenetic data! @maxbiostat might have some ideas.


Thanks for the answer, @cmcd!

Here’s pp_check() with trait.count ~ (1|species) (without the covariance matrix), as compared to trait.count ~ (1|gr(species, cov = A)):

Looks like the covariance matrix indeed adds a lot of variance, as you suspected. What you’re suggesting is that for some reason the phylogeny might not be fully capturing the correlation among data, thus leaving part of the variance unaccounted for?

(By the way, I don’t even know if this is an issue at all! I just thought I’d ask for help because the overdispersion thing was bugging me, but maybe this is expected in Poisson GLMMs with covariance matrix?)

What I’m wondering is if your data has less phylogenetic correlation than you’re expecting, and that’s why the predictive distribution doesn’t cover the data once you add in the covariance matrix—though I’m just going on some rough intuition. For now it looks like the uncorrelated structure is much more plausible than the correlated model.

Have you used the phylosignal R package or something similar? They provide a method for measuring the strength of phylogenetic correlation (using Moran’s I, from spatial statistics). If its really low, that might explain what’s happening. I’d also still entertain the notion that there’s something wrong with the covariance matrix (some bad data or a data cleaning process gone wrong), if I were finding something similar and thought it was really unexpected.

Also (edit): I’m not very familiar with the phylogenetic models that are implemented here in brms, I’m trying to pick some of this up but it looks like the amount of variance inflation induced by the correlation structure has to follow the scale parameter for the varying intercepts. That really could explain the strange behavior. The spatial statistics literature addresses this by splitting this into two separate terms, one that is (spatially) structured and the other unstructured, each allowed to have their own scale. That’s why at first it didn’t make sense to me but looked familiar, cause I work with spatial models.

So I suppose what this means is if you have only modest phylogenetic correlation (or none) but still have substantial variation across species (hence some substantial variation across varying intercepts 1|species ) this model with the covariance matrix could cause problems.

1 Like

Hey @cmcd! Thank you for not abandoning me!

So, my data has considerable phylogenetic signal (i.e. data are phylogenetically clumped), as measured by Moran’s I or any other metric. I was prepared to agree that there must be something wrong with the covariance structure, as you suggested, even though I can’t really imagine where it might have gone wrong. The phylogeny itself comes from a widely used source (birdtree.org) and was transformed into a variance-covariance matrix with ape::vcv(), which is standard practice as far as I’m aware.

But then I decided to test whether phylogenetic signal has some role in inflating variance, as you imagined, by simulating some data. I simulated data under a Poisson distribution in the presence or absence of phylogenetic signal. The Poisson’s lambda is based on a subset of the data in the first post. For each dataset, I fitted a brms model predicted by (1|species) and (1|gr(species, cov = A)). Here’s what I found:

Turns out that phylogenetic signal does influence variance, but in the opposite direction! The covariance matrix is only inflating variance in the presence of phylogenetic signal (i.e., phylogenetic correlation).

I’m clueless! Could it be that phylogenetic correlation is so high that the model can’t handle it even with a covariance matrix, thereby affecting the model’s accuracy? But then how come the model’s prediction is spot-on without the covariance matrix, right?

1 Like

That’s a great approach.

Its not clear to me how you simulated the two data sets but one thing I notice is that the histograms for the data with and without signal look the same. If you’ve managed to simulate an outcome with strong autocorrelation I would expect it to have more zeros and some more large numbers, relative to the data without that signal.

Out of curiosity, what’s the value of the Moran’s I for your data? Then, if you take the posterior mean of the varying intercepts for your model, what’s the Moran’s I for that vector (with and then without the covariance matrix)?

I feel a bit stumped, but that birdtree website is pretty cool

1 Like

I simulated the no-signal data set (I = -0.01) by simply drawing values from a Poisson distribution and randomly attributing them to species.

The with-signal data set I didn’t really simulate, I just manually manipulated a few original values so that the proportions would be equal to the no-signal data set, but with a higher I (0.14). (This is a manipulated subset of data; the original, complete data's I was 0.31.)

Doing as you instructed, “posterior” Moran’s I is 0.26 for the with-signal data with the covariance matrix, and 0.14 without the matrix.

A similar pattern is found for the no-signal data: posterior I is 0.31 with the matrix and -0.01 without it. But it doesn’t seem to affect the model accuracy as assessed with pp_check() in this case.

What do you think of this interpretation:

It looks like you’re able to identify that when you add that covariance matrix into the mix, the autocorrelation in the varying intercepts is increasing beyond what’s present the data, and by quite a bit. That matches what you found to begin with on the first post, and is close to explaining it, since the autocorrelation would create the variance inflation, pushing the predictions away from the observations. It doesn’t seem to me that you’re doing something wrong; I think you’re finding that the model really isn’t right for your data.

A similar pattern is found for the no-signal data: posterior I is 0.31 with the matrix and -0.01 without it. But it doesn’t seem to affect the model accuracy as assessed with pp_check() in this case.

The second observation you make in that quote above still seems strange, maybe more so now. But maybe not, since the variance inflation is going to be a product of both the degree of correlatedness (in this case its kind of fixed by the matrix you pass in, right?) and the scale of the varying intercepts. Drawing on spatial models, you can have an autocorrelation component that has very high autocorrelation yet have small scale, in which case it will have minimal impact on the predictions. That dynamic might be relevant here (because its being restricted), but it looks like both the sample datasets you’re working with there have same or similar scale.

I’m surprised this hasn’t come up before, though maybe it has somewhere else.

At least relative to what I’m used to seeing in spatial data, those are fairly low values of Moran’s I. 0.31 is maybe ‘moderate’ but 0.14 is quite low.

I’ve been reading and re-reading your answer since yesterday, and I think it makes total sense. But I kept coming back to the fact — which you also found strange — that pp_check() indicated that both models were just as accurate, even though one of them got the autocorrelation so wrong. This made me wonder: might pp_check(), by itself, simply not be a good enough diagnostic in my case?

Here’s what I thought: imagine if I draw values from a Poisson distribution (with lambda based on the observed data’s mean) and randomly attributed to the species in my data set, just like I did to generate the simulated no-signal data set. Now imagine this vector is a predictive posterior sample from a model fitted to my data. This is the histogram of the “predicted” data, compared to the observed data (the equivalent of a pp_check()):

Not too bad, right? But if we look at how accurately each value was “predicted”, we see that only 56% of them were accurately
“predicted”, with the remaining 44% differing from the observed data by 1 or 2. This goes to show that an overall good-looking predictive distribution may mask within-observation inaccuracies.

So, I decided to look at potential within-observation inaccuracies in the posterior predictive samples of y ~ (1|species) and y ~ (1|gr(species, cov = A)). Here’s the result:

Turns out that, even though the overall distribution looks worse when the covariance matrix is added (upper panels in post #7’s figure), the model is more accurate at predicting the count of specific observations (species). In the case of the predictive samples used for the figure above (one per model), 71% of the values are accurately predicted with the covariance matrix, versus 57% without it.

Of course, all of the rest stands — it’s still strange that variance is inflated, to which your explanation sounds like a likely one. But maybe the model with covariance matrix is not as bad as it seemed at a first glance.

1 Like

Can you share the code behind those plots? I want to make sure I understand what those represent

Sure! I should have provided them together with the plots; sorry for that.

Fitting the models with the “simulated” with-signal data (subset$sim; see quote below):

# without matrix
brms_sim <- brm(sim ~ (1|species),
                      data = subset,
                      family = poisson("log"),
                      control = list(adapt_delta=0.95),
                      chains = 2,
                      cores = 2,
                      iter = 4000)

# with matrix
brms_sim_gr <- brm(sim ~ (1|gr(species, cov = A)),
                           data = subset,
                           family = poisson("log"),
                           data2 = list(A = phylogeny),
                           control = list(adapt_delta=0.95),
                           chains = 2,
                           cores = 2,
                           iter = 4000)


# without matrix
ppredict <- posterior_predict(brms_sim, nsamples = 1)
hist(ppredict - subset$sim, breaks = -5.5:5.5, ylim = c(0,120), main = "Signal, y ~ (1|species)", xlab = "predicted - observed")

# with matrix
ppredict_gr <- posterior_predict(brms_sim_gr, nsamples = 1)
hist(ppredict_gr - subset$sim, breaks = -5.5:5.5, ylim = c(0,120), main = "Signal, y ~ (1|gr(species, cov = A))", xlab = "predicted - observed")

Hope this helps, and thanks again for all the effort you’ve been putting into this!

1 Like

Thanks for sharing, I think you have sound logic here but lets try to specify what that is and follow through with it.

I see a couple strands of thinking coming together. One is about predictive accuracy/error. Here you’ve pulled out one sample from the posterior predictive distribution from each model and done a comparison. But that’s just a sample of one, so you’ll want to repeat that some number of times. So why not put that into ppc_stat and see what you get?

The other element here is the look of the observations relative to the predictive distribution, in your pp_check() plots. When we see the data falling somewhere outside the range of the model predictions, we get the sense that the model is systematically wrong in some sense, so that the data are improbable realizations of the model. The probability that these data arise from that model appears to be low, is the idea.

It seems like there should be some correlation between these two concepts. Here’s how the loo package describes the Widely Applicable Information Criterion:

Leave-one-out cross-validation (LOO-CV, or LOO for short) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values.

The reference to the log-likelihood is the probability of the data conditional on the fully specified model and all its estimated parameters: say you have a Poisson model with \mu_i = exp( \alpha + x_i \beta ), then the likelihood of an observation is Poisson( y_i | \mu_i); but we have to repeat for all M posterior samples in order to see results across the joint posterior distribution, so every observation gets all these likelihoods Poisson(y_i | exp[\alpha_j + x_i \beta_j]). Then repeat for all observations too…so I think the pp_check() is like a shortcut that visualizes something like this concept for us. If you have predictions sitting far away from observations, like in a density overlay, you’re seeing points with low likelihood.

So if you haven’t done so already, I’d also try WAIC on the models you’re considering for your data and see what happens.

Hey @cmcd! Once again, thanks for the help.

I will answer/comment on the topics you brought up below, but I want you to feel comfortable to just not respond if you feel like it. You’ve already been so helpful and I don’t want to impose on your goodwill.

I ran ppc_stat() with the percentage of correctly predicted observations (as per the function pct_correct() below) for each of the models. I wasn’t sure what metric would best summarize this “within-observation” accuracy; let me know if you think of a better one. The model that includes the covariance matrix does seem to be consistently more accurate if we look at the % of correctly predicted observations, though:

pct_correct <- function(x){sum(x == subset$sim)/length(subset$sim)}

I had come across LOO and WAIC reading brms vignettes, but hadn’t quite understood the underlying logic, so thank you! I hadn’t tried using these methods with my data because I felt that my issue precedes it. That is, both candidate families for my data (Poisson and negative binomial) produced strange-looking predictive posterior distributions, so I felt that there was no point comparing their (apparently bad) fit to data before resolving this issue.

But I hadn’t thought of looking at what these methods would tell me about the covariance matrix dilemma. I ran loo::waic() with both models (brms_sim and brms_sim_gr from the previous post = the ones used in ppc_stat() above), and then loo::loo() with reloo = TRUE as suggested by WAIC’s output.

loo(brms_sim_gr, brms_sim, reloo = TRUE)

Both methods produce very similar outputs that indicate that the model with the covariance matrix is a better fit (as seen in Model comparisons at the bottom of the output):

Output of model 'brms_sim_gr':

Computed from 4000 by 136 log-likelihood matrix

         Estimate   SE
elpd_loo    -80.9  7.6
p_loo         9.1  1.5
looic       161.9 15.3
Monte Carlo SE of elpd_loo is 0.1.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     121   89.0%   154       
 (0.5, 0.7]   (ok)        15   11.0%   646       
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

Output of model 'brms_sim':

Computed from 4000 by 136 log-likelihood matrix

         Estimate   SE
elpd_loo   -110.7  8.1
p_loo         5.7  0.8
looic       221.4 16.2
Monte Carlo SE of elpd_loo is 0.1.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     127   93.4%   1519      
 (0.5, 0.7]   (ok)         9    6.6%   1952      
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

Model comparisons:
                  elpd_diff se_diff
brms_sim_gr   0.0       0.0  
brms_sim    -29.8       2.9
1 Like

ah it dawned on me as soon as i saw your reply that I probably gave an unhelpful suggestion, with using pp_stat in that case.

But with waic, I meant that you could use waic or loo not for the simulated data but for model comparison with your real data. As in, you might fit the 1|group model and then refit after adding in the cov matrix, then apply waic or loo to each to compare the two models for your data.

1 Like

First - this is a great thread. Good questions asked, sensible answers given, lots of work invested in getting things right from everybody. Pleasure to read! I’ll try to add some of my thoughts:

In my undrestanding LOO (or the XXIC crowd) metrics provide very different information than PP checks - although there definitely is some correlation at least in that a bad model is likely to fail PP checks AND have bad LOOIC. PP checks are a qualitative measure and can tell you if your model is systematically wrong/overconfident in some parts of the data. The model can have small error between prediction and data, but underestimate the error and thus fail a PP check, or you could have a model that has large error, but the error is not systematic and the magnitude of the error is estimated accurately, which will pas PP check just nice.

LOO on the other hand tries to quantitatively estimate predictive performance on yet unseen data. I understand LOO less well than PP checks, but while it will penalize models for being overly certain (underestimating error), it will also penalize models for being overly uncertain or for being heavily influenced by few individual observations (that’s what the “leaving one out” part is for).

In the case here, I think PP checks are a better tool to understand what is happening, because they let you bee specific. I would also potentially look into doing the _grouped version of some of the PP checks, grouping by species or some intermediate taxonomical unit (with some random smaller subset of species to keep the plot legible). Does the pattern repeat at lower levels? Or are there taxonomic groups that are systematically under/over estimated?

I also think the problem might really be in the response distribution: Poisson and probably also neg. binomial (not sure here) might be a terrible model of count of toes or similar. The problem with Poisson is that if you have a lot of zeroes and some ones, your really have to give quite high probability to quite high numbers. I think neg. bionomial is less restricted in this sense but once again, it might not be able to handle the sharp drop from the ones and twos to basically non-occuring values > 2 (and no occurences for > 3).

In other words, you AFAIK basically can’t have Poisson or neg. binomial give high probability to 1 or 2 while also not giving noticeable probability to 3 or 4 (because their variance is at least the mean). I think this is what your model struggles with: to fit the absence of 3s and above it needs to decrease the mean which leads to overestimating the zeroes.

I am less sure what a good distribution for this case might be… I guess zero-inflation is likely, because the causes of not having a trait at all seems likely to differ from the causes driving changes in the number of the trait. If you can define some sort of “opportunity” for a trait than a (zero inflated?) binomial distribution might not be terrible - i.e. if I am interested in the number of toes used for walking than modelling it as binomial with the total number of toes as the number of trials and number of toes used for walking as successes might be sensible (sorry if this is bad example, my understanding of biology is quite limited). Alternatively maybe there is a different source for defensible strict upper bound on the number of the traits making the binomial more attractive.

I also find it likely there is some literature, potentially quite old, for good distributions of these kind of quantities. There is huge literature on “best” distributions for various data. Fun fact: in older papers (I guess 80s and before) they often use the term “law” instead of “distribution” - so this might help your googling :-). You might want to look for some “underdispersed” distributions (my guess is that low overdispersion in the total data means that after accounting for covariates the response is almost certainly underdispersed). Quick Googling turned out stuff like this:

Many of the suggestions would however require you to implement the distribution yourself and my experience with trying to implement some _over_dispersed Poisson variants from the literature in Stan is that this can be a challenge of its own (Bayesian inference has somewhat bigger requirements on the distributions behaving “neatly” than many methods for frequentist estimation). Depending on how big penalty you place on being wrong, it might be the case that the effort required is not worth the (uncertain) improvement in model fit.

EDIT: There is experimental support for the COM Poisson distrubution in brms: https://github.com/paul-buerkner/brms/issues/607

You might even want to try and ordinal response (see the cumulative family) - but once again, this forces you to set a strict upper bound. Or you could even fit a continuous positive distribution (gamma/lognormal) to (count + 0.5) - those should allow for this “underdispersion” so you could be getting a less theoretically appealing model but with - in some sense - better fit (heretical, I know :-)). Once again your circumstances should determine if that is a trade-off worth taking.

Finally, you can always fit multiple models and if the important inferences stay the same, you don’t have to worry much which of them is “most” appropriate (see also: multiverse analysis).

Best of luck with your model!


Thanks @martinmodrak for contributing all of that!

I like this a lot:

LOO on the other hand tries to quantitatively estimate predictive performance on yet unseen data. I understand LOO less well than PP checks, but while it will penalize models for being overly certain (underestimating error), it will also penalize models for being overly uncertain or for being heavily influenced by few individual observations (that’s what the “leaving one out” part is for).

key being the distinction between performance on unseen data and in sample error, which is where I gave probably poor guidance

1 Like

Thank you both for such helpful remarks!

I was automatically going with the simulated data because there are fewer observations meaning that models are much faster to fit, but you’re absolutely right that I should be looking at the real data. From here on in this post, I will always be referring to the real data (i.e., trait.count in the first post).

So, here’s the model comparison of loo() for models with and without the matrix. Like with the simulated data, LOO indicates that the model with the covariance matrix (brm_count_cov) is a better fit for the real data:

Model comparisons:
                elpd_diff se_diff
brm_count_cov   0.0       0.0 
brm_count_gr   -231.8     11.1 

I’m also enjoying and learning a lot from @cmcd’s and now your excellent remarks. Thank you for chiming in, @martinmodrak!

I can’t group by species because there is only one observation per species, but here’s how PP check looks for each of the main four orders (order > family > species) that collectively comprehend 650 of the 736 species in the data set:

(This is the model that includes the covariance matrix)

Apparently data are not too Poisson-y for Galliformes and consequently PP check looks terrible for this particular order. Is this a problem? I mean: is there a problem if subsets of a data set do not follow a particular distribution, even if the data set as a whole does?

This is interesting because I think what happens when I do not include the covariance matrix is precisely that there is large error between prediction and data, but the error is not systematic. This would explain why this model passes PP check but not LOO.

This is easy to visualize when I plot the data onto the phylogenetic tree. In the figure below, each tip of a branch is an extant species (present days). The ancestral of all these species is the node at the center of the tree (millions of years ago). Extant species are more related (and assumed to be correlated) the longer they share a branch — this information is the basis of the covariance matrix. The 1st ring around the tree is the observed counts, the 2nd and 3rd rings are the mean predicted counts across 100 samples without and with the covariance matrix, respectively.

first.ring <- data$trait.count
second.ring <- round(colMeans(posterior_predict(brm_count_gr, nsamples = 100)))
third.ring <- round(colMeans(posterior_predict(brm_count_cov, nsamples = 100)))

Rings are colored in a red-yellow-green scheme, where red is count = 0 and dark green is count = 3.

You can see that, on average, the with-matrix model (3rd ring) does a fairly good job in predicting a particular species’ count, whereas the no-matrix model does a terrible one. In the latter case, the mean count is almost always 0 — which I interpret as evidence that data are predicted in the right proportions (as evidenced by PP check) but regardless of the phylogenetic structure. Since the proportion of zeroes is ~80% (figure in post #5), each species would be predicted a count of 0 in ~80% of the simulations (I hope this makes any sense).

I absolutely agree with this interpretation, and I think it is somehow related to what I was trying to say here:

But then again, if it is an inherent incompatibility between the distribution and my data, how come the model handles it just fine when I don’t include the covariance matrix (e.g., figure in post #5)?

That’s on me for giving such a terrible biological example (no footed species has 0 toes!). The actual trait I am investigating are spurs, bony projections present in the feet or wings of some birds (two examples here). It is absolutely plausible that different causes are driving absence/presence and number of spurs. The conditions you suggested for using a binomial distribution do not seem to apply, though.

That is a very interesting thought that hadn’t crossed my mind. I’m running a model with the COM Poisson distribution that you mentioned, but it’s going to take a while. Fingers crossed!

Thanks for this excellent tip! I just did some googling about multiverse analysis. I wasn’t aware of this term but loved it! I think I was already applying the same principle to other methodological aspects in my study, but I was calling it “sensitivity analysis”.

1 Like

A note of caution - those PP plots are not showing the response distribution, but rather a mixture of the responses over all the members of the group. So I wouldn’t overinterpret the plots looking “Poisson-y”. Similarly, saying that “the datat set as whole” follows some distribution is a bit problematic. What we see is whether the data are fit well by the whole model (including, but not limited to the response distribution).

I think it usually is a problem when only a subset of data is not fit well by the model and it might be useful to think why. In this case, I think the cause is related to the response distribution - as orders other than Galliformes basically have a lot of zeroes and much fewer ones and very little larger numbers. So the Poisson mean for the individual species is likely close to 0. For Galliformes, the mean for at least some species is probably around 1 and that brings the conflict around fitting the underdispersion (low number of both zeroes and twos).

You also see that Charadriiformes also aren’t fit that well.

This is because pp checks judge the whole model, not only the response distribution - the response distribution must be judged as conditional on all the covariates. I would also expect that if you broke down the figure in post #5 by families or similar, you would see additional discrepancies. Alternatively, it might also be possible that the genetic covariance for some reason doesn’t do a good job of modelling the interdependencies and introduces some bias (it could still lead to better predictive performance as evidenced by LOO)

Also a wild guess: a Poisson hurdle model (hurdle_poisson) might also be a good fit as it treats zeroes as completely separate from non-zeroes, so it should allow to have low number of zeroes (as in Galliformes) and a Poisson-like tail… (unlike zero-inflated poisson which - as the name suggests - can’t have fewer zeroes than Poisson).

1 Like