Calculating predicted probabilities from a binomial (Bernoulli) model

I am working with a relatively small data set in which the presence of a response is binary coded (1 = present). The design is nested: unique participants within unique groups, within two conditions.

My formula is:

CR_fmla <- bf(CR_Present ~ 
                Training_Lev +
                Category +
                Category:Training_Lev +
                (1 | Training_Lev / Group / Grp_Spkr))

I have set the family to `bernoulli`.
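For reference, the model was fitted roughly as follows (Mod and Data are the object names used in the code below; sampler settings are omitted):

library(brms)

Mod <- brm(
  CR_fmla,
  data   = Data,
  family = bernoulli()
)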

I am trying to create some visualizations of the fitted model by showing the predicted probabilities of the response being present (1).

Is `add_fitted_draws` the correct function for this? Is the returned `.value` the predicted probability?

A box plot of the data:

“CR” is the response I’m interested in.

When I plot the fitted model with `add_fitted_draws` as follows:

Data %>%
  select(-Speaker) %>%            # drop the speaker column ...
  distinct() %>%                  # ... so each row is a unique predictor combination
  add_fitted_draws(
    Mod,
    allow_new_levels = TRUE,
    value = "Probability"         # name the fitted draws "Probability" instead of ".value"
  )
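(The plotting step is omitted above; I pipe the saved result into ggplot2 roughly as below. This is only a sketch: `Probs` is a hypothetical name for the saved pipeline output, and stat_halfeye comes from tidybayes.)

library(dplyr)     # for the pipe
library(ggplot2)
library(tidybayes)

# Probs = the saved output of the add_fitted_draws() pipeline above (hypothetical name)
Probs %>%
  ggplot(aes(x = Category, y = Probability)) +
  stat_halfeye() +               # posterior density with point and interval summaries
  facet_wrap(~ Training_Lev)     # one panel per training level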

I get the following figure:

In this figure, it looks as though CR being present is more likely for the postgrads in the IF category.

A pp_check seems okay:

To try a different approach, I calculated the proportions of CR-present responses for individuals and groups, fitted a zero_one_inflated_beta model, and plotted the fitted proportions (again using `add_fitted_draws`), which resulted in the following figure:

The pp_check for the beta model does not look great:

Are these the right pp_checks to be examining?

Is there something else I should be doing to manually calculate posterior probabilities?

Because the data were collapsed to proportions for the beta regression, I’m not able to subject the two models to loo_compare, since the number of observations differs.

Hi JLC,

Have you tried brms::conditional_effects()? If I understand you correctly, it might do what you are looking for.
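For the interaction in your model, that would be something like:

conditional_effects(Mod, effects = "Category:Training_Lev")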

Otherwise, for more plotting options and model checks, have a look at, e.g., this tutorial by Rens van de Schoot. For example, under 5.5, “Model evaluation”, he explains how to use the classification rate to check model correctness in binary-response models.
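As a rough sketch of that idea (assuming your Mod and Data objects; for a Bernoulli model, the Estimate column returned by predict() is the posterior predictive probability of a 1):

Pred <- predict(Mod)  # posterior predictive summary, one row per observation
# classification rate with a 0.5 cut-off
mean((Pred[, "Estimate"] > 0.5) == Data$CR_Present)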

Another thing that you could look at, to see whether the model gets the overall proportion right, is `pp_check(fit, type = "stat", stat = "mean")`.
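To check this per condition, something like `pp_check(fit, type = "stat_grouped", stat = "mean", group = "Training_Lev")` should also work (assuming Training_Lev is the grouping of interest).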

Yes, it is: `add_fitted_draws` returns draws of the fitted (expected) values, and for a Bernoulli model these are on the probability scale, so `.value` (renamed to Probability in your call) is the predicted probability of the response being present.

Is this a problem? If so, can you elaborate: is this not what you find descriptively in the raw data, or is it contrary to what you would expect? From the box plot it is difficult for me to tell, since yes and no responses are plotted separately per group and the groups (postgrad/undergrad) seem to differ in size.

Personally, I would not aggregate the data to the proportion level and fit a zero_one_inflated_beta model. Instead, I would stick with the bernoulli model that fits the responses directly, or maybe fit a zero_inflated_binomial if there are problems with the bernoulli that could be alleviated by modelling zero inflation.

I hope this helps.

Best,
Julian


Thank you very much @julianquandt!

I didn’t know about brms::conditional_effects() or that particular tutorial from Rens (I have studied his other tutorials). Both were very helpful!

I agree that collapsing into proportions is a poor choice; I only included that approach to satisfy a collaborator. A binomial/bernoulli model of the raw data is the better choice (and a better fit).

I also fitted some zero-inflated negative binomial and various Poisson models to explore a few alternatives. Ultimately, the bernoulli model was a “significantly” better fit according to PSIS-LOO.
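(The comparison was along these lines; the model object names here are placeholders:)

library(brms)

loo_compare(loo(Mod_bern), loo(Mod_zinb), loo(Mod_pois))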

Thank you very much for the input and guidance!
