Discrepancy between y and yrep in pp_check: how bad is it?

To summarize: below would be the updated code.

#create data. 
#proportion of zero-one inflation (zoi) = 200/800 = 0.25 
#proportion of zoi that are ones (coi) = 100/200 = 0.5 
#mid-point inflation (mdi) = 100/800 = 0.125
#mean of the beta distribution is 0.5

library(brms)

y <- c(rep(0, 100), rbeta(500, 1, 1), rep(0.5, 100), rep(1, 100))
hist(y,breaks=50)
d1 <- cbind(y)
d1 <- data.frame(d1)
str(d1)


zero_one_mid_inflated_beta <- custom_family(
  "zero_one_mid_inflated_beta", dpars = c("mu", "phi", "zoi", "coi", "mdi"),
  links = c("logit", "identity", "identity", "identity", "identity"),
  lb = c(0, 0, 0, 0, 0), ub = c(1, NA, 1, 1, 1),
  type = "real"
)


stan_funs <- "
     real zero_one_mid_inflated_beta_lpdf(real y, real mu, real phi,
                                    real zoi, real coi, real mdi) {
     row_vector[2] shape = [mu * phi, (1 - mu) * phi]; 
     if (y == 0) { 
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(0 | coi); 
     } else if (y == 1) {
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(1 | coi);
     } else if (y == 0.5) {
       return bernoulli_lpmf(1 | mdi) + bernoulli_lpmf(0 | mdi);
     } else { 
       return bernoulli_lpmf(0 | zoi) + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]);
     } 
   }
"

stanvars <- stanvar(scode = stan_funs, block = "functions")

m3 <- brm(y ~ 1, data=d1, family = zero_one_mid_inflated_beta, stanvars = stanvars, cores = 4)
m3


posterior_predict_zero_one_mid_inflated_beta <- function(i, prep, ...) {
  zoi <- get_dpar(prep, "zoi", i)
  coi <- get_dpar(prep, "coi", i)
  mdi <- get_dpar(prep, "mdi", i)
  mu <- get_dpar(prep, "mu", i = i)
  phi <- get_dpar(prep, "phi", i = i)
  hu <- runif(prep$ndraws, 0, 1)
  one_or_zero <- runif(prep$ndraws, 0, 1)
  mid <- runif(prep$ndraws, 0, 1)
  ifelse(hu < zoi & mid > mdi,
    ifelse(one_or_zero < coi, 1, 0),
    ifelse(mid < mdi, 0.5, rbeta(prep$ndraws, shape1 = mu * phi, shape2 = (1 - mu) * phi))
  )
}

pp_check(m3)

Model results:

Posterior predictive check:

The model over-estimates the mid-point inflation slightly, which can be seen in both the model results and the posterior predictive check.

I updated the posterior predict code. I think I had an ifelse statement in the wrong place. But it would be great if someone else could check it!

Also, I updated the model code, because as I reasoned here and I think confirmed by reading this, that the + beta_lpdf(y | shape[1], shape[2]) is not needed in the mid-point inflation line, because the probability of getting exactly 0.5 from a continuous distribution is zero.

Hopefully this is a good start if anyone is interested in this model. I think a zomi beta model could be useful for those working with slider scale data.

Hopefully some people can chime in to check and see if this is correct :-) Thanks @Solomon for checking it out on some data.

1 Like