Why is my zero-one-inflated beta regression showing good posterior distributions but very bad fitted values and predictions?

I have a continuous response variable between 0 and 1, with inflated portions on either end. My main task is to retrieve effect estimates of each predictor on the response variable with a zero- one inflated beta regression (ZOIB).

brms code

n_iter <- 10000
n_thin <- ifelse(n_iter > 5000, 10, 1)

zoi_beta_model <- brm(
  bf(burned_fraction ~ (1 | site) + elevation + slope + northness + 
    eastness + tpi_500 + LST + NDVI_sd + NDVI),
  data = model_data_scaled,
  family = zero_one_inflated_beta(),
  chains = 4, 
  iter = n_iter, 
  # warmup = 1000,        # use default warmup (iter/2)
  thin = n_thin,
  cores = 4, seed = 1234,
  file = paste0("zoi_beta_model",today())
) 

Results

Family: zero_one_inflated_beta 
  Links: mu = logit; phi = identity; zoi = identity; coi = identity 
Formula: burned_fraction ~ (1 | site) + elevation + slope + northness + eastness + tpi_500 + LST + NDVI_sd + NDVI 
   Data: model_data_scaled (Number of observations: 6944) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 10;
         total post-warmup draws = 2000

Group-Level Effects: 
~site (Number of levels: 6) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.75      0.36     0.36     1.69 1.00     1844     1719

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.54      0.32    -0.11     1.23 1.00     2018     1679
elevation     0.11      0.03     0.06     0.16 1.00     1846     2016
slope        -0.30      0.03    -0.36    -0.24 1.00     1893     1823
northness     0.11      0.03     0.06     0.16 1.00     1891     1778
eastness     -0.04      0.02    -0.08     0.01 1.00     2188     2058
tpi_500       0.04      0.03    -0.01     0.09 1.00     1827     1734
LST           0.53      0.04     0.46     0.60 1.00     1899     1739
NDVI_sd       0.01      0.03    -0.04     0.07 1.00     2152     2060
NDVI          0.06      0.02     0.01     0.11 1.00     1889     1822

Family Specific Parameters: 
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi     1.63      0.03     1.57     1.69 1.00     1997     1983
zoi     0.44      0.01     0.43     0.45 1.00     2040     1834
coi     0.79      0.01     0.78     0.81 1.00     1994     1916

Diagnostic plots
The posterior distribution (using pp_check()) seems to indicate that the posterior draws can pick up the distribution of my response variable:

However, when I compare the average replicated values y_{rep} to the observed response variable, it seems to show a rather weak performance. The average y_{rep} seem to gather around 0.6. I am unsure whether the y_{rep} values are not incorporating the probability of an observation being either a zero or a one, nor the conditional probability of being a one given that it is either a zero or a one.

Question
The ZOIB information in brms is sparse and I reach out on this forum to seek your help, assuming that there are people who often have used ZOIB models.

  • How could I best diagnose the performance of my ZOIB model? I’m thinking about a pseudo-R2 or “variance explained”.
  • Are the pp_check() distributions an indicator of the model performance?
  • Are helper functions like fitted() or predict() including the models for coi and zoi?
  • E.g., when I retrieve the fitted values, do they represent the final posterior prediction with zero-one inflation included or just the mean response of the mu and phi model?
2 Likes