Hi, I am trying to run a zero-one-inflated beta mixed model for my study.
Rationale for using this model: The DV is a bounded variable signifying the likelihood that a person is going to look at location A (0) or location B (1). The response was given on a slider over multiple trials. Most of the responses fell on or around 1, but there is a small but clear peak at 0 as well. As this variable is bounded and J-shaped, I recoded the extremes to be e-5 and 1-(e-5). The IVs are categorical and represent membership in two groups (gender and political) and the level of dishonesty presented in the scenario.
First Question: Do you think it is an appropriate model (I am not a statistician)?
Secondly, when I run the model something weird happens. This is the model I run (I am using a MacBook Air).
zoib_model <- bf(
  Newscore ~ honesty_number * Political_group_dummy * Gender_group +
    (1 | number_participant) + (1 | Scenario_Number),
  phi ~ honesty_number * Political_group_dummy * Gender_group +
    (1 | number_participant) + (1 | Scenario_Number),
  zoi ~ honesty_number * Political_group_dummy * Gender_group +
    (1 | number_participant) + (1 | Scenario_Number),
  coi ~ honesty_number * Political_group_dummy * Gender_group +
    (1 | number_participant) + (1 | Scenario_Number),
  family = zero_one_inflated_beta()
)
fit <- brm(
  formula = zoib_model,
  data = total_file_HighICS,
  cores = 4,
  thin = 4,
  iter = 4000
)
And this is what runs, until it stops.
"Compiling Stan program...
Start sampling
starting worker pid=42318 on localhost:11174 at 16:47:59.584
starting worker pid=42333 on localhost:11174 at 16:47:59.885
starting worker pid=42347 on localhost:11174 at 16:48:00.165
starting worker pid=42361 on localhost:11174 at 16:48:00.488
SAMPLING FOR MODEL '884207f16b9ac795e3a747f16199f39e' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.015805 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 158.05 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 4000 [ 0%] (Warmup)
SAMPLING FOR MODEL '884207f16b9ac795e3a747f16199f39e' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0.015603 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 156.03 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 4000 [ 0%] (Warmup)
SAMPLING FOR MODEL '884207f16b9ac795e3a747f16199f39e' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 0.016704 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 167.04 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 4000 [ 0%] (Warmup)
SAMPLING FOR MODEL '884207f16b9ac795e3a747f16199f39e' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0.017622 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 176.22 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 4000 [ 0%] (Warmup)"
Am I doing something wrong? Would assigning different priors change something?
Final question:
I wanted to add priors to the interactions and to model them after the pilot data. They are all beta distributed (as expected). Do you think it is best to use beta distributions (beta(alpha, beta)) to set the priors, or should they be normal or Student-t?
If you are using a zero-one-inflated beta model, why did you transform the response so that it was only close to zero or one?
Your model is extremely flexible. Whether or not it is appropriate is difficult for anyone to say, as that depends on your research question, background knowledge, assumptions of the study, etc. That flexibility could pose challenges to fitting it, but that doesn’t necessarily mean it is inappropriate. However, you might want to start much simpler and work your way up to more complex models. With each model you build, you can run posterior predictive checks of different kinds to visually inspect the assumptions you have built into the model and make adjustments accordingly.
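To make "start simpler and work your way up" concrete, here is a rough sketch of what two steps could look like. The variable names are taken from your post; the choice of which terms to drop first, the step names, and the helper `fit_and_check` are just my assumptions for illustration, not a recommendation for your specific study.

```r
library(brms)

# Step 1 (assumed starting point): one predictor, one varying intercept,
# and intercept-only models for phi, zoi, and coi.
formula_step1 <- bf(
  Newscore ~ honesty_number + (1 | number_participant),
  phi ~ 1, zoi ~ 1, coi ~ 1,
  family = zero_one_inflated_beta()
)

# Step 2 (assumed next step): full mean model from the post,
# but phi, zoi, and coi still intercept-only.
formula_step2 <- bf(
  Newscore ~ honesty_number * Political_group_dummy * Gender_group +
    (1 | number_participant) + (1 | Scenario_Number),
  phi ~ 1, zoi ~ 1, coi ~ 1,
  family = zero_one_inflated_beta()
)

# Hypothetical helper: fit one step and draw a posterior predictive check.
fit_and_check <- function(formula, data) {
  fit <- brm(formula = formula, data = data, chains = 4, cores = 4)
  print(pp_check(fit, ndraws = 100))  # density overlay of y vs. replicated y
  fit
}

# Usage (not run here):
# fit1 <- fit_and_check(formula_step1, total_file_HighICS)
# fit2 <- fit_and_check(formula_step2, total_file_HighICS)
```

If step 1 samples in a reasonable time and the check looks sensible, you can add terms to phi, zoi, and coi one at a time and see where things start to slow down.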
Do you get some sort of warning? What do you mean by “stops”? I can’t see anything amiss in the sampling output that you show, other than that this model looks like it could possibly take a long time to run! If that is the case, it could appear frozen when it is simply working.
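As a back-of-the-envelope check on how long "a long time" might be, you can do the arithmetic from the log yourself. This is only a worst-case sketch: it assumes the sampler hits the default maximum treedepth of 10 on every iteration, and that the progress line is printed every iter/10 iterations (which I believe is the rstan default). The real run could be much faster than this.

```r
# Gradient time reported for Chain 1 in the posted log
grad_seconds <- 0.015805

# Assumption: default max_treedepth of 10, so at most 2^10 - 1 leapfrog steps
max_treedepth <- 10
max_leapfrog <- 2^max_treedepth - 1

# Settings from the posted brm() call
iter <- 4000

# Assumption: progress is reported every iter / 10 iterations
refresh <- max(iter / 10, 1)

# Worst case: every iteration uses the maximum number of leapfrog steps
sec_per_iter_worst <- grad_seconds * max_leapfrog

hours_until_next_update <- sec_per_iter_worst * refresh / 3600
hours_total_worst <- sec_per_iter_worst * iter / 3600

cat("Worst-case seconds per iteration:", round(sec_per_iter_worst, 1), "\n")
cat("Worst-case hours until the next progress line:",
    round(hours_until_next_update, 1), "\n")
cat("Worst-case hours for all iterations of one chain:",
    round(hours_total_worst, 1), "\n")
```

Under those assumptions, a chain sitting at "Iteration: 1 / 4000" for a couple of hours would not by itself prove that anything is broken. A short test run with fewer iterations and a smaller refresh value would tell you more.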
Assigning different priors changes the model and its assumptions. It looks like you left the default brms priors on, which are flat priors for the interaction coefficients and some sort of half-student-t priors on the standard deviations of the varying intercepts. The best way to set priors is via prior predictive checking. You can do this in brms by running your model with the sample_prior = "only" option in the brm call, which samples from the priors and ignores the likelihood. The graphical tools that come with brms for posterior predictive checking can then be used for prior predictive checks on that fit.
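Here is a minimal sketch of what a prior predictive run could look like. Note that sampling from the prior only works if every prior is proper, so the flat defaults on the coefficients have to be replaced first. The normal(0, 2.5) and normal(0, 1) values below are placeholders I picked for illustration, and `fit_prior_only` is a hypothetical helper name; `zoib_model` and `total_file_HighICS` are the objects from your post.

```r
library(brms)

# Placeholder proper priors on the coefficients of each linear predictor.
# The specific scales are assumptions for illustration only.
zoib_priors <- c(
  prior(normal(0, 2.5), class = "b"),               # mean model (logit scale)
  prior(normal(0, 1),   class = "b", dpar = "phi"), # precision (log scale)
  prior(normal(0, 2.5), class = "b", dpar = "zoi"), # zero-one inflation (logit)
  prior(normal(0, 2.5), class = "b", dpar = "coi")  # conditional one (logit)
)

# Hypothetical helper: sample from the priors only, ignoring the likelihood.
fit_prior_only <- function(formula, data, priors) {
  brm(
    formula = formula,
    data = data,
    prior = priors,
    sample_prior = "only",
    chains = 2,
    iter = 1000
  )
}

# Usage (not run here):
# prior_fit <- fit_prior_only(zoib_model, total_file_HighICS, zoib_priors)
# pp_check(prior_fit, ndraws = 50)  # do simulated responses look plausible?
```

If the simulated responses pile up almost entirely at 0 and 1, or look nothing like slider data, that is a sign the priors are wider than your domain knowledge supports.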
Priors for the interactions will be set on the coefficients of the linear predictors. In your model, the predictor for the response mean is on the logit scale, the predictor for phi is on the log scale, and the predictors for zoi and coi are on the logit scale. So a beta prior, which only covers values between 0 and 1, is not a natural choice for these coefficients. Personally, when working on the logit scale, I often use normal priors, as large values on the logit scale are not very likely. For example, something like normal(0, 2.5) is usually pretty wide. But you would want to use domain knowledge and prior predictive checks to set appropriate priors.
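To get a feel for why normal(0, 2.5) counts as wide on the logit scale, you can translate the central 95% of the prior back to the probability scale in base R. This is just an illustration: the baseline of 0.5 and the helper name `implied_range` are my own assumptions, not anything specific to your model.

```r
# Hypothetical helper: take the central 95% interval of a normal(0, prior_sd)
# prior on a logit-scale coefficient and show where it would move a baseline
# probability.
implied_range <- function(prior_sd, baseline_prob = 0.5) {
  shift <- qnorm(c(0.025, 0.975), mean = 0, sd = prior_sd)
  plogis(qlogis(baseline_prob) + shift)
}

wide   <- implied_range(2.5)  # normal(0, 2.5)
narrow <- implied_range(1)    # normal(0, 1), for comparison

print(round(wide, 3))
print(round(narrow, 3))
```

With a standard deviation of 2.5, a single coefficient can move a 50% baseline to below 1% or above 99%, which covers almost any effect you could plausibly observe. With a standard deviation of 1, the same interval stays within roughly 12% to 88%.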
The model does not actually stop or get interrupted. But as you can see, progress stays at 0% for all four chains and never moves beyond it, not even after hours! That has never happened to me before, and with other families the program runs just fine. There seems to be a problem with the one-inflated part, because one-inflated regressions don’t run either. However, there is no warning or error message.
Your model is extremely flexible, with multiple varying intercepts for the regression model and models of the phi, coi, and zoi parameters. A model this complicated can be difficult to estimate without a lot of data and/or informative priors.
Yes, you have specified a zero-one-inflated model, but you have removed all of the ones in the data by recoding them to 0.9999 and removed all of the zeroes by recoding them to 0.00001.
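You can see the effect of the recoding with a few lines of base R. The toy data below is made up, and I am assuming the recoding constant was 1e-5; swap in your real column and constant to check your own data.

```r
set.seed(42)

# Made-up slider responses: some exact 0s, many exact 1s, the rest in between
raw_score <- c(rep(0, 5), rep(1, 20), runif(25, min = 0.05, max = 0.95))

# Assumed recoding constant (the post says "e-5")
eps <- 1e-5

# Squeeze the extremes away from the boundaries, as described in the post
recoded_score <- pmin(pmax(raw_score, eps), 1 - eps)

# Count the exact boundary values that zoi and coi are meant to model
count_boundary <- function(x) c(zeros = sum(x == 0), ones = sum(x == 1))

print(count_boundary(raw_score))
print(count_boundary(recoded_score))
```

After the recoding there are no exact zeros or ones left, so the zoi and coi parts of the model have nothing to learn from, while the beta part has to fit a lot of mass extremely close to the boundaries. Fitting the zero-one-inflated model to the original, unrecoded responses avoids both problems.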