I’m trying to model a response variable that ranges between 0 and 1 (preference for a habitat), and because of the large number of 0’s and 1’s I’m using the zero_one_inflated_beta family. However, I’m not very happy with the posterior predictive checks (maybe I’m overthinking it?), and I would like to know whether there are better alternatives that would improve the model (the real model is a bit more complex, but I simplified it for the purpose of the example).
Example_data.csv (4.3 KB)
library(tidyverse) #For readr and ggplot2
library(brms)
d = read_csv("Example_data.csv")
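For reference, the point masses can be quantified directly before fitting anything; a quick sketch (using the Response column from the model below):

# Proportion of exact zeros and ones in the response
mean(d$Response == 0)
mean(d$Response == 1)
# Remaining observations lie strictly between 0 and 1
mean(d$Response > 0 & d$Response < 1)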
model1 = brm(Response ~ Variable1 * Habitat,
             data = d, family = zero_one_inflated_beta())
pp_check(model1, nsamples = 100) + ylim(0, 10)
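The density overlay tends to hide the point masses at 0 and 1, so other pp_check types may be more informative here; a sketch (these forward to the corresponding ppc_* functions in bayesplot):

# Compare observed vs. predicted proportions of exact zeros and ones
pp_check(model1, type = "stat", stat = function(y) mean(y == 0), nsamples = 100)
pp_check(model1, type = "stat", stat = function(y) mean(y == 1), nsamples = 100)
# Histograms avoid the kernel smoothing of the default density overlay
pp_check(model1, type = "hist", nsamples = 11)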
Now I repeat the process and try to set more informative priors based on my knowledge of the variables (I’m new to specifying priors in brms, so I apologise if I’m messing something up here). As I said above, the response ranges from 0 to 1; the predictor ranges from 0 to 10 in the natural world, but in my dataset its maximum is just a bit over 6.
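Before overriding anything, get_prior() lists every parameter of this model that accepts a prior (including phi, zoi, and coi for the zero-one-inflated beta) together with its default:

# All parameters and their default priors for this model/family
get_prior(Response ~ Variable1 * Habitat,
          data = d, family = zero_one_inflated_beta())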
prior1 <- c(set_prior("normal(0,1)", class = "b", coef = "HabitatB"),
            set_prior("normal(0,1)", class = "b", coef = "HabitatC"),
            set_prior("normal(0,1)", class = "b", coef = "HabitatD"),
            set_prior("normal(0,10)", class = "b", coef = "Variable1"))
model1 = brm(Response ~ Variable1 * Habitat,
             data = d, prior = prior1, family = zero_one_inflated_beta())
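To check whether these priors generate plausible data on their own, the same model can be refitted with sample_prior = "only" and passed to pp_check as a prior predictive check (a sketch; model1_prior is just a placeholder name):

# Ignore the likelihood and draw from the priors only
model1_prior = brm(Response ~ Variable1 * Habitat,
                   data = d, prior = prior1,
                   family = zero_one_inflated_beta(),
                   sample_prior = "only")
pp_check(model1_prior, nsamples = 100)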
Now I conduct posterior predictive checks.
pp_check(model1, nsamples = 100)
It’s difficult to see anything here, so I adjust the y-axis limits to improve the visualization.
pp_check(model1, nsamples = 100) + ylim(0, 10)
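One extension I have been wondering about (not sure if it is appropriate for these data) is letting the precision and the inflation parts (phi, zoi, coi) vary with predictors via bf(); a sketch (model2 is a placeholder name):

# Let the precision and the zero/one inflation probabilities vary by Habitat
model2 = brm(bf(Response ~ Variable1 * Habitat,
                phi ~ Habitat,
                zoi ~ Habitat,
                coi ~ Habitat),
             data = d, family = zero_one_inflated_beta())
pp_check(model2, nsamples = 100) + ylim(0, 10)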
I have also increased the iterations (e.g., warmup = 1000, iter = 3000), but I still get similar results in the posterior predictive checks. So, is this “sufficiently good” to “trust” the modelling process, or can it be further improved to get a better fit? Thanks in advance, any advice would be more than welcome!