# Calculating predicted probabilities from a binomial (Bernoulli) model

I am working with a relatively small data set where the presence of a response is binary coded (1 = present). The design was nested such that there were unique participants within unique groups within 2 conditions.

My formula is:

```
CR_fmla <- bf(CR_Present ~
  Training_Lev +
  Category +
  Category:Training_Lev +
  (1 | Training_Lev / Group / Grp_Spkr))
```

I have set the family to `bernoulli`.

I am trying to create some visualizations of the fitted model by showing the predicted probabilities of the response being present (1).

Is `add_fitted_draws` the correct function for this? Is the returned `.value` the predicted probability?

A box plot of the data:

“CR” is the response I’m interested in.

When I plot the fitted model with `add_fitted_draws` as follows:

```
Data %>%
  select(-Speaker) %>%
  distinct() %>%
  add_fitted_draws(
    Mod,
    allow_new_levels = TRUE,
    value = "Probability"
  )
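(The plotting layer itself is omitted above; a sketch of one possibility, assuming the `Probability` column created by the call and tidybayes' `stat_halfeye` geom:)

```r
library(tidybayes)
library(ggplot2)

Data %>%
  select(-Speaker) %>%
  distinct() %>%
  add_fitted_draws(
    Mod,
    allow_new_levels = TRUE,
    value = "Probability"
  ) %>%
  # One posterior density + interval per Training_Lev/Category cell
  ggplot(aes(x = Training_Lev, y = Probability, fill = Category)) +
  stat_halfeye(position = position_dodge(width = 0.6)) +
  labs(y = "Predicted probability of CR present")
```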

I get the following figure:

In this figure it looks as though CR being present is more likely in the postgrads in the IF category.

A `pp_check` seems okay:

To try a different approach, I calculated the proportion of CR-present responses for each individual and group, fitted a `zero_one_inflated_beta` model to those proportions, and plotted the fitted values (again using `add_fitted_draws`), which resulted in the following figure:

The pp_check for the beta model does not look great:

Are these the right pp_checks to be examining?

Is there something else I should be doing to manually calculate posterior probabilities?
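(One way to compute these by hand, without tidybayes, is via brms' `posterior_epred()`; a sketch, assuming `Mod` and the predictor names from the formula above:)

```r
library(brms)
library(dplyr)

# Grid of predictor combinations (column names assumed from the formula)
nd <- Data %>%
  select(Training_Lev, Category, Group, Grp_Spkr) %>%
  distinct()

# Draws x rows matrix of P(CR_Present = 1), already on the probability scale
pe <- posterior_epred(Mod, newdata = nd, allow_new_levels = TRUE)

# Posterior mean and 95% interval per predictor combination
nd$p_mean  <- colMeans(pe)
nd$p_lower <- apply(pe, 2, quantile, probs = 0.025)
nd$p_upper <- apply(pe, 2, quantile, probs = 0.975)
```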

Because the beta regression was fitted to the collapsed proportions, the two models have different numbers of observations, so I can’t compare them with `loo_compare`.

Hi JLC,

have you tried `brms::conditional_effects()`? If I understand you correctly, it might do what you are looking for.
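For instance (a sketch, using the interaction term from your formula):

```r
# Plots P(CR_Present = 1) across Category, split by Training_Lev
conditional_effects(Mod, effects = "Category:Training_Lev")
```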

Otherwise, for more plotting options and model checks, have a look at e.g. this tutorial by Rens van de Schoot. For example, under 5.5 “model evaluation”, he explains how to use the classification rate to assess model correctness in binary-response models.
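A minimal version of that check might look like this (a sketch, assuming `fit` is the bernoulli model and `Data$CR_Present` holds the observed 0/1 responses):

```r
library(brms)

# Posterior mean predicted probability for each observation
p_hat <- colMeans(posterior_epred(fit))

# Classification rate: proportion of observations predicted
# correctly with a 0.5 cutoff
mean((p_hat > 0.5) == Data$CR_Present)
```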

Another thing that you could look at, to see whether the model gets the overall proportion right, is `pp_check(fit, type = "stat", stat = "mean")`.

Yes, it is: for a `bernoulli` model, the `.value` returned by `add_fitted_draws` is the predicted probability.

Is this a problem? If so, can you elaborate: is this not what you see descriptively in the raw data, or is it contrary to what you would expect? From the box plot it is difficult for me to tell, since the yes and no responses are plotted separately per group and the groups (postgrad/undergrad) seem to differ in size.

Personally, I would not aggregate the data to the proportion level and fit a `zero_one_inflated_beta` model. Instead, I would stick with the Bernoulli model that fits the responses directly, or perhaps fit a `zero_one_inflated_binomial` if the `bernoulli` model has problems that modeling zero-one inflation could alleviate.

I hope this helps.

Best,
Julian


Thank you very much @julianquandt!

I didn’t know about `brms::conditional_effects()` or that particular tutorial from Rens (I have studied his other tutorials). Both were very helpful!

I agree that collapsing into proportions is a poor choice. I only tried that approach to satisfy a collaborator, and I absolutely agree that a binomial/Bernoulli model of the raw data is the better choice (and a better fit).

I also fitted some zero-inflated negative binomial and various Poisson models to explore a few alternatives. Ultimately the Bernoulli model was a “significantly” better fit using PSIS-LOO.

Thank you very much for the input and guidance!
