Discrepancy between y and yrep in pp_check: how bad is it?

jd_c · April 21, 2022, 4:38pm

Maybe something like this? I cobbled this together this morning…someone should definitely check it before using it, as this is a bit over my head as well. I post it here as an idea. It does OK on the simple simulated data, but seems to overestimate the zero-inflation and middle-inflation.

#create data. 
#proportion of zero-one inflation (zoi) = 200/800 = 0.25 
#proportion of zoi that are ones (coi) = 100/200 = 0.5 
#mid-point inflation (mdi) = 100/800 = 0.125
#mean of the beta distribution is 0.5

library(brms)

#define data
y <- c(rep(0, 100), rbeta(500, 1, 1), rep(0.5, 100), rep(1, 100))
hist(y,breaks=50)
d1 <- cbind(y)
d1 <- data.frame(d1)
str(d1)

#custom brms family
zero_one_mid_inflated_beta <- custom_family(
  "zero_one_mid_inflated_beta", dpars = c("mu", "phi", "zoi", "coi", "mdi"),
  links = c("logit", "identity", "identity", "identity", "identity"),
  lb = c(0, 0, 0, 0, 0), ub = c(1, NA, 1, 1, 1),
  type = "real"
)

#provide Stan functions
stan_funs <- "
     real zero_one_mid_inflated_beta_lpdf(real y, real mu, real phi,
                                    real zoi, real coi, real mdi) {
     row_vector[2] shape = [mu * phi, (1 - mu) * phi]; 
     if (y == 0) { 
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(0 | coi); 
     } else if (y == 1) {
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(1 | coi);
     } else if (y == 0.5) {
       return bernoulli_lpmf(1 | mdi) + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]);
     } else { 
       return bernoulli_lpmf(0 | zoi) + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]);
     } 
   }
"

#define stanvars
stanvars <- stanvar(scode = stan_funs, block = "functions")

#run model 
m2 <- brm(y ~ 1, data=d1, family = zero_one_mid_inflated_beta, stanvars = stanvars, cores = 4)
m2

The zoi, coi, and mdi all use identity link, so they are showing the estimated proportion. You can check this against what I programmed into the data, outlined above. Intercept and coi are good, but zoi and mdi are overestimated (the real value in the data is captured by the lower end of the CI), so I might have done something wrong. To use pp_check() you would need to define zero_one_mid_inflated_beta_rng in the stan_funs.

Note - The + bernoulli_lpmf(0 | mdi) + beta_lpdf(y | shape[1], shape[2]) can be taken out of the mid-inflation line, and the model runs and obtains the same results. Originally, I was thinking this part was like the zero-inflated negative binomial case where some zeroes can come from the negative binomial distribution… but since this isn’t integer, the probability of getting exactly 0.5 from beta distribution is actually zero, right? So, maybe it really is like the case of the zeroes and the ones.

Hopefully someone like @paul.buerkner can chime in and clarify.

Topic		Replies	Views
Plot doesn't look good from pp_check() in brms Modeling brms	3	1084	January 27, 2023
Improving model fit with zero_one_inflated_beta with a specific case study Modeling brms	4	686	August 24, 2022
Assessing Bayesian Beta Regression fit using pp_check Modeling fitting-issues , brms	5	80	April 8, 2025
Brms predictions inconsistent with custom model (likely user error) General	7	405	September 13, 2023
Discrepancy in Manually- vs Automatically-Generated pp_check() Figures brms techniques , specification	2	466	August 25, 2020

Discrepancy between y and yrep in pp_check: how bad is it?

Related topics