Discrepancy between y and yrep in pp_check: how bad is it?

Maybe something like this? I cobbled this together this morning…someone should definitely check it before using it, as this is a bit over my head as well. I post it here as an idea. It does OK on the simple simulated data, but seems to overestimate the zero-inflation and middle-inflation.

#create data. 
#proportion of zero-one inflation (zoi) = 200/800 = 0.25 
#proportion of zoi that are ones (coi) = 100/200 = 0.5 
#mid-point inflation (mdi) = 100/800 = 0.125
#mean of the beta distribution is 0.5

library(brms)

#define data
y <- c(rep(0, 100), rbeta(500, 1, 1), rep(0.5, 100), rep(1, 100))
hist(y,breaks=50)
d1 <- cbind(y)
d1 <- data.frame(d1)
str(d1)

#custom brms family
zero_one_mid_inflated_beta <- custom_family(
  "zero_one_mid_inflated_beta", dpars = c("mu", "phi", "zoi", "coi", "mdi"),
  links = c("logit", "identity", "identity", "identity", "identity"),
  lb = c(0, 0, 0, 0, 0), ub = c(1, NA, 1, 1, 1),
  type = "real"
)

#provide Stan functions
stan_funs <- "
     real zero_one_mid_inflated_beta_lpdf(real y, real mu, real phi,
                                    real zoi, real coi, real mdi) {
     row_vector[2] shape = [mu * phi, (1 - mu) * phi]; 
     if (y == 0) { 
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(0 | coi); 
     } else if (y == 1) {
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(1 | coi);
     } else if (y == 0.5) {
       return bernoulli_lpmf(1 | mdi) + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]);
     } else { 
       return bernoulli_lpmf(0 | zoi) + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]);
     } 
   }
"

#define stanvars
stanvars <- stanvar(scode = stan_funs, block = "functions")

#run model 
m2 <- brm(y ~ 1, data=d1, family = zero_one_mid_inflated_beta, stanvars = stanvars, cores = 4)
m2

The zoi, coi, and mdi all use identity link, so they are showing the estimated proportion. You can check this against what I programmed into the data, outlined above. Intercept and coi are good, but zoi and mdi are overestimated (the real value in the data is captured by the lower end of the CI), so I might have done something wrong. To use pp_check() you would need to define zero_one_mid_inflated_beta_rng in the stan_funs.

Note - The + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]) can be taken out of the mid-inflation line, and the model runs and obtains the same results. Originally, I was thinking this part was like the zero-inflated negative binomial case where some zeroes can come from the negative binomial distribution… but since this isn’t integer, the probability of getting exactly 0.5 from beta distribution is actually zero, right? So, maybe it really is like the case of the zeroes and the ones.

Hopefully someone like @paul.buerkner can chime in and clarify.