Trying to fit a model for a zero-inflated distribution

I am trying to fit a brms model to data with a zero-inflated, bimodal distribution: the participants' eye fixation duration within a certain region of interest, which can range from 0 to 4.6 seconds. My independent variable is the pitch that accompanies the trial (two pitch levels), and I include random terms for study participant and for the object in the trial.


The following is the model I have been trying to fit, after I found that a gaussian() model might not be the best fit:

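library(brms)
library(dplyr)  # provides %>% and filter()

model <- brm(
  fixation_duration ~ 1 + pitch_fac
    + (1 + pitch_fac | subject)  # by-participant intercepts and pitch slopes
    + (1 | object),              # by-object intercepts
  family = hurdle_gamma(link = "log"),
  data = total_fix_duration_eyes_per_trial %>% filter(group == 0),
  warmup = 1000,
  iter = 2000,
  cores = 2
)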

However, the posterior predictive check (pp_check) shows that this is not a good fit either:

[Screenshot of the pp_check plot, 2021-09-19]
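
With a hurdle model it can help to check the zero part and the non-zero part separately, since the default density overlay mixes the two. The following is only a sketch: `check_hurdle_fit` and `prop_zero` are hypothetical helper names, `fit` is assumed to be a brmsfit such as `model` above, and the `ndraws` argument assumes a recent brms version (older versions used `nsamples`).

```r
# Hypothetical helper: a few targeted posterior predictive checks for a
# hurdle or zero-inflated brms fit. Nothing is run until it is called.
check_hurdle_fit <- function(fit, ndraws = 100) {
  # Test statistic: share of exact zeros in a vector of outcomes
  prop_zero <- function(y) mean(y == 0)

  list(
    # Overall shape: observed density vs. densities of simulated data sets
    density = brms::pp_check(fit, type = "dens_overlay", ndraws = ndraws),
    # Zero part: observed share of zeros vs. its posterior predictive distribution
    zeros = brms::pp_check(fit, type = "stat", stat = prop_zero, ndraws = ndraws),
    # Same density overlay, split by pitch level
    by_pitch = brms::pp_check(fit, type = "dens_overlay_grouped",
                              group = "pitch_fac", ndraws = ndraws)
  )
}
```

Calling `check_hurdle_fit(model)` returns a list of plots. If the share of zeros is reproduced well but the density is not, the misfit probably sits in the distribution chosen for the non-zero durations rather than in the hurdle part.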

I am not sure how best to improve the fit (i.e. which family and link to choose), and my modelling skills are limited. So, if any of you have encountered a similar situation or know how to approach this, I would be glad to get some tips.

I usually start pretty basic with any model: do the default priors make sense given the data, prior knowledge, and the model you chose? I think that should be get_prior in brms. If those make sense, does fitting the model to simulated data return the known parameters?
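
As a rough sketch of those two steps, the code below simulates data from a hurdle-gamma process with known parameters and then lists the default priors brms would use. All parameter values, the data frame `sim`, and the helper `fit_sim_model` are made up for illustration; the simulation uses only a by-subject random intercept to keep it short.

```r
set.seed(1)

# Known ("true") parameters, chosen arbitrarily for illustration
hu      <- 0.3   # probability of a zero (no fixation)
b0      <- 0.0   # intercept on the log scale (mean of 1 s for non-zero trials)
b1      <- 0.3   # effect of high vs. low pitch on the log scale
shape   <- 4     # gamma shape
sd_subj <- 0.2   # SD of by-subject intercepts

n_subj <- 20
n_trials <- 40
sim <- expand.grid(subject = factor(seq_len(n_subj)), trial = seq_len(n_trials))
# Alternate pitch by trial so that every subject sees both levels
sim$pitch_fac <- factor(ifelse(sim$trial %% 2 == 0, "high", "low"),
                        levels = c("low", "high"))

# Hurdle-gamma: zero with probability hu, otherwise gamma with mean mu
subj_eff <- rnorm(n_subj, 0, sd_subj)
mu <- exp(b0 + b1 * (sim$pitch_fac == "high") + subj_eff[as.integer(sim$subject)])
is_zero <- rbinom(nrow(sim), 1, hu) == 1
sim$fixation_duration <- ifelse(is_zero, 0,
                                rgamma(nrow(sim), shape = shape, rate = shape / mu))

sim_formula <- fixation_duration ~ 1 + pitch_fac + (1 | subject)

# List the default priors (no model is compiled or fitted here)
if (requireNamespace("brms", quietly = TRUE)) {
  print(brms::get_prior(sim_formula, data = sim, family = brms::hurdle_gamma()))
}

# Hypothetical helper: fit the same model to the simulated data
fit_sim_model <- function(data) {
  brms::brm(sim_formula, data = data, family = brms::hurdle_gamma(), cores = 2)
}
```

Running `summary(fit_sim_model(sim))` and comparing the estimates for `hu`, `shape`, the intercept, and the pitch effect against the values set at the top shows whether the model can recover parameters it should be able to recover.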

Thank you for your answer, Ara_Winter. I checked whether the default priors made sense and found that they did not. However, the gamma distribution was not a suitable distribution to begin with, so I changed the model to use the zero_inflated_beta family. For this, I rescaled my values to proportions so that they lie between 0 and 1. This distribution has the advantage that the non-zero peak can be modelled anywhere between 0 and 1. I also replaced the model's default priors with priors that allow for wide variation, because the data I will enter can have peaks in different locations.
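
A minimal sketch of what such a setup could look like is given below. The exact rescaling and priors used are not stated above, so everything here is an assumption: `to_proportion`, `fit_zib`, the column name `fixation_prop`, the upper bound of 5 seconds, and the prior values are illustrative only. The beta part of the family needs values strictly below 1, which is why the sketch divides by a bound slightly above the largest observed duration rather than by the maximum itself.

```r
# Rescale durations to [0, 1): zeros stay zero, everything else lands in (0, 1)
to_proportion <- function(x, upper) {
  stopifnot(all(x >= 0), upper > max(x))
  x / upper
}

durations <- c(0, 0, 0.35, 1.2, 4.6)          # toy values in seconds
props <- to_proportion(durations, upper = 5)  # assumed upper bound of 5 s

# Hypothetical helper: zero-inflated beta model with fairly wide priors.
# `data` is assumed to contain fixation_prop, pitch_fac, subject and object.
fit_zib <- function(data) {
  priors <- c(
    brms::set_prior("normal(0, 1.5)", class = "Intercept"),  # logit scale
    brms::set_prior("normal(0, 1)", class = "b"),            # pitch effect, logit scale
    brms::set_prior("gamma(0.01, 0.01)", class = "phi"),     # beta precision
    brms::set_prior("beta(1, 1)", class = "zi")              # zero-inflation probability
  )
  brms::brm(
    fixation_prop ~ 1 + pitch_fac + (1 + pitch_fac | subject) + (1 | object),
    family = brms::zero_inflated_beta(),
    prior = priors,
    data = data,
    warmup = 1000, iter = 2000, cores = 2
  )
}
```

Running the same model with `sample_prior = "only"` and inspecting `pp_check` is one way to see whether priors like these still allow the non-zero peak to sit anywhere between 0 and 1.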
