I have a continuous response variable between 0 and 1, with inflated portions on either end. My main task is to retrieve effect estimates of each predictor on the response variable with a zero- one inflated beta regression (ZOIB).
brms code
n_iter <- 10000
n_thin <- ifelse(n_iter > 5000, 10, 1)
zoi_beta_model <- brm(
bf(burned_fraction ~ (1 | site) + elevation + slope + northness +
eastness + tpi_500 + LST + NDVI_sd + NDVI),
data = model_data_scaled,
family = zero_one_inflated_beta(),
chains = 4,
iter = n_iter,
# warmup = 1000, # use default warmup (iter/2)
thin = n_thin,
cores = 4, seed = 1234,
file = paste0("zoi_beta_model",today())
)
Results
Family: zero_one_inflated_beta
Links: mu = logit; phi = identity; zoi = identity; coi = identity
Formula: burned_fraction ~ (1 | site) + elevation + slope + northness + eastness + tpi_500 + LST + NDVI_sd + NDVI
Data: model_data_scaled (Number of observations: 6944)
Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 10;
total post-warmup draws = 2000
Group-Level Effects:
~site (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.75 0.36 0.36 1.69 1.00 1844 1719
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.54 0.32 -0.11 1.23 1.00 2018 1679
elevation 0.11 0.03 0.06 0.16 1.00 1846 2016
slope -0.30 0.03 -0.36 -0.24 1.00 1893 1823
northness 0.11 0.03 0.06 0.16 1.00 1891 1778
eastness -0.04 0.02 -0.08 0.01 1.00 2188 2058
tpi_500 0.04 0.03 -0.01 0.09 1.00 1827 1734
LST 0.53 0.04 0.46 0.60 1.00 1899 1739
NDVI_sd 0.01 0.03 -0.04 0.07 1.00 2152 2060
NDVI 0.06 0.02 0.01 0.11 1.00 1889 1822
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 1.63 0.03 1.57 1.69 1.00 1997 1983
zoi 0.44 0.01 0.43 0.45 1.00 2040 1834
coi 0.79 0.01 0.78 0.81 1.00 1994 1916
Diagnostic plots
The posterior distribution (using pp_check()
) seems to indicate that the posterior draws can pick up the distribution of my response variable:
However, when I compare the average replicated values y_{rep} to the observed response variable, it seems to show a rather weak performance. The average y_{rep} seem to gather around 0.6. I am unsure whether the y_{rep} values are not incorporating the probability of an observation being either a zero or a one, nor the conditional probability of being a one given that it is either a zero or a one.
Question
The ZOIB information in brms is sparse and I reach out on this forum to seek your help, assuming that there are people who often have used ZOIB models.
- How could I best diagnose the performance of my ZOIB model? I’m thinking about a pseudo-R2 or “variance explained”.
- Are the
pp_check()
distributions an indicator of the model performance? - Are helper functions like
fitted()
orpredict()
including the models forcoi
andzoi
? - E.g., when I retrieve the fitted values, do they represent the final posterior prediction with zero-one inflation included or just the mean response of the
mu
andphi
model?