To summarize: below would be the updated code.
#create data.
#proportion of zero-one inflation (zoi) = 200/800 = 0.25
#proportion of zoi that are ones (coi) = 100/200 = 0.5
#mid-point inflation (mdi) = 100/800 = 0.125
#mean of the beta distribution is 0.5
library(brms)
y <- c(rep(0, 100), rbeta(500, 1, 1), rep(0.5, 100), rep(1, 100))
hist(y,breaks=50)
d1 <- cbind(y)
d1 <- data.frame(d1)
str(d1)
zero_one_mid_inflated_beta <- custom_family(
"zero_one_mid_inflated_beta", dpars = c("mu", "phi", "zoi", "coi", "mdi"),
links = c("logit", "identity", "identity", "identity", "identity"),
lb = c(0, 0, 0, 0, 0), ub = c(1, NA, 1, 1, 1),
type = "real"
)
stan_funs <- "
real zero_one_mid_inflated_beta_lpdf(real y, real mu, real phi,
real zoi, real coi, real mdi) {
row_vector[2] shape = [mu * phi, (1 - mu) * phi];
if (y == 0) {
return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(0 | coi);
} else if (y == 1) {
return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(1 | coi);
} else if (y == 0.5) {
return bernoulli_lpmf(1 | mdi) + bernoulli_lpmf(0 | mdi);
} else {
return bernoulli_lpmf(0 | zoi) + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]);
}
}
"
stanvars <- stanvar(scode = stan_funs, block = "functions")
m3 <- brm(y ~ 1, data=d1, family = zero_one_mid_inflated_beta, stanvars = stanvars, cores = 4)
m3
posterior_predict_zero_one_mid_inflated_beta <- function(i, prep, ...) {
zoi <- get_dpar(prep, "zoi", i)
coi <- get_dpar(prep, "coi", i)
mdi <- get_dpar(prep, "mdi", i)
mu <- get_dpar(prep, "mu", i = i)
phi <- get_dpar(prep, "phi", i = i)
hu <- runif(prep$ndraws, 0, 1)
one_or_zero <- runif(prep$ndraws, 0, 1)
mid <- runif(prep$ndraws, 0, 1)
ifelse(hu < zoi & mid > mdi,
ifelse(one_or_zero < coi, 1, 0),
ifelse(mid < mdi, 0.5, rbeta(prep$ndraws, shape1 = mu * phi, shape2 = (1 - mu) * phi))
)
}
pp_check(m3)
Model results:
Posterior predictive check:
The model over-estimates the mid-point inflation slightly, which can be seen in both the model results and the posterior predictive check.
I updated the posterior predict code. I think I had an ifelse statement in the wrong place. But it would be great if someone else could check it!
Also, I updated the model code, because as I reasoned here and I think confirmed by reading this, that the + beta_lpdf(y | shape[1], shape[2])
is not needed in the mid-point inflation line, because the probability of getting exactly 0.5 from a continuous distribution is zero.
Hopefully this is a good start if anyone is interested in this model. I think a zomi beta model could be useful for those working with slider scale data.
Hopefully some people can chime in to check and see if this is correct :-) Thanks @Solomon for checking it out on some data.