Zero and one inflated beta regression - posterior predictive checks by component (π1, π0, μ)

I’ve built a zero and one inflated beta regression of linguistic alignment data (on a 0-1 scale, from no word re-use to exact repetition).
Zero inflation is π0, one inflation is π1, and the mean of the beta is μ.

Conceptually, I am interested in

  • exact repetitions between interlocutors (π1),
  • alignment rate (1 - π0), or propensity to reuse the other person’s words,
  • alignment level (μ), or amount of word reuse when there is indeed alignment.

I’m trying to generate meaningful plots of the posterior estimates and how well they describe the data. I used pp_check and it looks good, but I’d like a more granular perspective on the components of the model.

I tried

PredsZOI_LA <- posterior_predict(model, dpar="zoi")[1,]
PredsCOI_LA <- posterior_predict(model, dpar="coi")[1,]
LexicalAlignment <- posterior_predict(model, dpar="mu")[1,]
### Unconditioning the inflation
LexicalRate <- 1 - (PredsZOI_LA * (1-PredsCOI_LA))
LexicalRepetitions <- PredsCOI_LA * PredsZOI_LA
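
For reference, the "unconditioning" step above can be written out explicitly. This is a sketch that assumes the brms parameterization of zero_one_inflated_beta, where zoi is the probability that an observation is exactly 0 or 1 and coi is the conditional probability that such an observation is a 1:

```latex
f(y) =
\begin{cases}
  \mathrm{zoi}\,(1 - \mathrm{coi}) & \text{if } y = 0, \\
  \mathrm{zoi}\,\mathrm{coi} & \text{if } y = 1, \\
  (1 - \mathrm{zoi})\,\mathrm{Beta}\bigl(y \mid \mu\phi,\ (1 - \mu)\phi\bigr) & \text{if } 0 < y < 1,
\end{cases}
\qquad
\pi_0 = \mathrm{zoi}\,(1 - \mathrm{coi}), \quad
\pi_1 = \mathrm{zoi}\,\mathrm{coi}, \quad
1 - \pi_0 = 1 - \mathrm{zoi}\,(1 - \mathrm{coi}).
```

Under that assumption, LexicalRate corresponds to 1 - π0 and LexicalRepetitions to π1, provided the inputs are draws of zoi and coi themselves rather than draws of the response.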

But it gives me weird results (bad fit to the data)

However, predictive checks (via pp_check) and the model estimates show good fit. For instance, if I manually extract the coefficients of the model and generate predictions (both for the population-level mean and for the group-level distribution), I get a much nicer fit.

So, I must be misunderstanding the way posterior_predict works. Any suggestions?


Hi,
a bit short on time, so just a quick note:

I don’t think posterior_predict accepts a dpar parameter. If you want predictions for individual parameters, I think (can’t check now) that you need posterior_linpred or posterior_epred (which differ primarily in whether the inverse link function is applied).

Also, the indexing ([1,]) looks suspicious. AFAIK posterior_predict will give you a matrix with dimensions number of samples x number of rows in the dataset. So if you are trying to predict only for a single participant, I think you need [,1]? Generally, what you should usually be doing is computing the quantity of interest separately for each sample, which then gives you samples representing the posterior distribution of that quantity.
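
To illustrate the per-sample computation and the two indexing directions, here is a minimal base-R sketch. The matrices are simulated stand-ins (hypothetical values); with a fitted model they would presumably come from posterior_epred(model, dpar = ...), which is an assumption and is not run here.

```r
# Hypothetical stand-ins for draws-by-observations matrices. With a fitted brms
# model `model`, these would come from something like (assumption, not run here):
#   zoi <- posterior_epred(model, dpar = "zoi")
#   coi <- posterior_epred(model, dpar = "coi")
set.seed(1)
n_draws <- 200
n_obs   <- 50
zoi <- matrix(rbeta(n_draws * n_obs, 2, 8), nrow = n_draws)  # P(y is 0 or 1)
coi <- matrix(rbeta(n_draws * n_obs, 3, 7), nrow = n_draws)  # P(y = 1 | y is 0 or 1)

# Derived quantities, computed element-wise so each posterior draw stays separate
pi1  <- zoi * coi              # probability of exact repetition
rate <- 1 - zoi * (1 - coi)    # alignment rate, i.e. 1 - pi0

# Average over observations within each draw, then summarise across draws
rate_by_draw <- rowMeans(rate)
rate_summary <- quantile(rate_by_draw, probs = c(0.025, 0.5, 0.975))

# [1, ] is one draw for every observation; [, 1] is every draw for one observation
one_draw <- rate[1, ]
one_obs  <- rate[, 1]
```

Here rate_by_draw holds one value per posterior sample, so its spread reflects posterior uncertainty in the average alignment rate.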

Best of luck!


Somehow my code had drifted from posterior_epred() to posterior_predict(). Probably I was trying to better include the variance (phi). I have now shifted it back and that solved the issue. Thanks.
(The [1, ] was just to simplify the processing and memory usage by looking at only one sample; I use n samples in the actual plot.)
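
For anyone who does want the variance (phi) reflected in a check by component, one option is to compare test statistics of the replicated data, where posterior_predict() output is the right input. A rough base-R sketch follows; rzoib is a hypothetical helper (not a brms function), and y_obs and y_rep are simulated stand-ins with made-up parameter values:

```r
# Hypothetical helper (not part of brms): simulate from a zero-one-inflated beta
rzoib <- function(n, mu, phi, zoi, coi) {
  is_boundary <- rbinom(n, 1, zoi) == 1   # observation is exactly 0 or 1
  is_one      <- rbinom(n, 1, coi) == 1   # boundary observation is a 1
  y <- rbeta(n, mu * phi, (1 - mu) * phi)
  y[is_boundary & is_one]  <- 1
  y[is_boundary & !is_one] <- 0
  y
}

set.seed(2)
# Stand-in for the observed response
y_obs <- rzoib(500, mu = 0.3, phi = 10, zoi = 0.25, coi = 0.2)
# Stand-in for posterior_predict(model): one row per draw, one column per observation
y_rep <- t(replicate(100, rzoib(500, mu = 0.3, phi = 10, zoi = 0.25, coi = 0.2)))

# One test statistic per model component
prop_zero   <- function(y) mean(y == 0)          # relates to pi0
prop_one    <- function(y) mean(y == 1)          # relates to pi1
mean_inside <- function(y) mean(y[y > 0 & y < 1])  # relates to mu

stats_rep <- cbind(
  zero   = apply(y_rep, 1, prop_zero),
  one    = apply(y_rep, 1, prop_one),
  inside = apply(y_rep, 1, mean_inside)
)
stats_obs <- c(zero = prop_zero(y_obs), one = prop_one(y_obs),
               inside = mean_inside(y_obs))
```

Plotting each column of stats_rep as a histogram with the matching stats_obs value marked gives one check per component. With a fitted model, pp_check(model, type = "stat", stat = prop_zero) should produce a similar plot, though I have not verified that against this particular model.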
