Posterior predictive check for zero-inflated, interval-censored Beta

I used brms to generate Stan code for a zero-inflated Beta regression with interval censoring, and then modified the way the model handles censoring, following the example code provided by Hans Van Calster in this feature request.

I fit that model to leaf-damage data (10,491 observations, each either exactly zero or interval-censored into (0, .05], (.05, .25], or (.25, 1), clustered by taxon and location).
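For reference, this censoring scheme can be encoded with brms's usual `cens()` syntax before generating the Stan code. A sketch with illustrative column names (not my actual data):

```r
library(tibble)

# one illustrative row per censoring class; y = lower bound, ub = upper bound
damage <- tribble(
  ~y,    ~ub,  ~cens,
  0,     0,    "none",     # exact zero, handled by the zero-inflation part
  1e-7,  0.05, "interval", # (0, .05]
  0.05,  0.25, "interval", # (.05, .25]
  0.25,  1,    "interval"  # (.25, 1)
)
# formula sketch, with fixed effects omitted:
# y | cens(cens, ub) ~ ... + (1 | taxon) + (1 | location)
```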

Stan code

// generated with {brms} 2.22.8 **and modified**
functions {
  real zero_inflated_beta_lpdf(real y, real mu, real phi, real zi) {
    row_vector[2] shape = [mu * phi, (1 - mu) * phi];
    if (y == 0) {
      return bernoulli_lpmf(1 | zi);
    } else {
      return bernoulli_lpmf(0 | zi) +
             beta_lpdf(y | shape[1], shape[2]);
    }
  }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  
///////////////////////////////////////////////////////////////////////////////////////

  // censoring indicator: 0 = event, 1 = right, -1 = left, 2 = interval censored
  array[N] int<lower=-1,upper=2> cens;
  // right censor points for interval censoring
  vector[N] rcens;
  
///////////////////////////////////////////////////////////////////////////////////////

  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int<lower=1> K_zi;  // number of population-level effects
  matrix[N, K_zi] X_zi;  // population-level design matrix
  int<lower=1> Kc_zi;  // number of population-level effects after centering
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  array[N] int<lower=1> J_2;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  array[N] int<lower=1> J_3;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_1;
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  array[N] int<lower=1> J_4;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_zi_1;
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  array[N] int<lower=1> J_5;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_zi_1;
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  array[N] int<lower=1> J_6;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_zi_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  matrix[N, Kc_zi] Xc_zi;  // centered version of X_zi without an intercept
  vector[Kc_zi] means_X_zi;  // column means of X_zi before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
  for (i in 2:K_zi) {
    means_X_zi[i - 1] = mean(X_zi[, i]);
    Xc_zi[, i - 1] = X_zi[, i] - means_X_zi[i - 1];
  }
  
///////////////////////////////////////////////////////////////////////////////////////

  // censoring indices
  int Nevent = 0;
  int Nrcens = 0;
  int Nlcens = 0;
  int Nicens = 0;
  array[N] int Jevent;
  array[N] int Jrcens;
  array[N] int Jlcens;
  array[N] int Jicens;
  for (n in 1:N) {
    if (cens[n] == 0) {
      Nevent += 1;
      Jevent[Nevent] = n;
    } else if (cens[n] == 1) {
      Nrcens += 1;
      Jrcens[Nrcens] = n;
    } else if (cens[n] == -1) {
      Nlcens += 1;
      Jlcens[Nlcens] = n;
    } else if (cens[n] == 2) {
      Nicens += 1;
      Jicens[Nicens] = n;
    }
  }
///////////////////////////////////////////////////////////////////////////////////////  
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> phi;  // precision parameter
  vector[Kc_zi] b_zi;  // regression coefficients
  real Intercept_zi;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  array[M_1] vector[N_1] z_1;  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  array[M_2] vector[N_2] z_2;  // standardized group-level effects
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  array[M_3] vector[N_3] z_3;  // standardized group-level effects
  vector<lower=0>[M_4] sd_4;  // group-level standard deviations
  array[M_4] vector[N_4] z_4;  // standardized group-level effects
  vector<lower=0>[M_5] sd_5;  // group-level standard deviations
  array[M_5] vector[N_5] z_5;  // standardized group-level effects
  vector<lower=0>[M_6] sd_6;  // group-level standard deviations
  array[M_6] vector[N_6] z_6;  // standardized group-level effects
  
///////////////////////////////////////////////////////////////////////////////////////

  // latent imputed values for censored observations
  vector<lower=Y[Jrcens[1:Nrcens]], upper=1>[Nrcens] Yright;
  vector<lower=0, upper=Y[Jlcens[1:Nlcens]]>[Nlcens] Yleft;
  vector<lower=Y[Jicens[1:Nicens]], upper=rcens[Jicens[1:Nicens]]>[Nicens] Yint;
  
///////////////////////////////////////////////////////////////////////////////////////  
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  vector[N_2] r_2_1;  // actual group-level effects
  vector[N_3] r_3_1;  // actual group-level effects
  vector[N_4] r_4_zi_1;  // actual group-level effects
  vector[N_5] r_5_zi_1;  // actual group-level effects
  vector[N_6] r_6_zi_1;  // actual group-level effects
  real lprior = 0;  // prior contributions to the log posterior
  r_1_1 = (sd_1[1] * (z_1[1]));
  r_2_1 = (sd_2[1] * (z_2[1]));
  r_3_1 = (sd_3[1] * (z_3[1]));
  r_4_zi_1 = (sd_4[1] * (z_4[1]));
  r_5_zi_1 = (sd_5[1] * (z_5[1]));
  r_6_zi_1 = (sd_6[1] * (z_6[1]));
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
  lprior += gamma_lpdf(phi | 0.01, 0.01);
  lprior += logistic_lpdf(Intercept_zi | 0, 1);
  lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_2 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_3 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_4 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_5 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_6 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] zi = rep_vector(0.0, N);
    mu += Intercept + Xc * b;
    zi += Intercept_zi + Xc_zi * b_zi;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n] + r_3_1[J_3[n]] * Z_3_1[n];
    }
    for (n in 1:N) {
      // add more terms to the linear predictor
      zi[n] += r_4_zi_1[J_4[n]] * Z_4_zi_1[n] + r_5_zi_1[J_5[n]] * Z_5_zi_1[n] + r_6_zi_1[J_6[n]] * Z_6_zi_1[n];
    }
    mu = inv_logit(mu);
    zi = inv_logit(zi);
    
///////////////////////////////////////////////////////////////////////////////////////

    // Uncensored data
    for (i in 1:Nevent) {
      int n = Jevent[i];
      target += zero_inflated_beta_lpdf(Y[n] | mu[n], phi, zi[n]);
    }
    // Right-censored
    for (i in 1:Nrcens) {
      int n = Jrcens[i];
      target += zero_inflated_beta_lpdf(Yright[i] | mu[n], phi, zi[n]);
    }
    // Left-censored
    for (i in 1:Nlcens) {
      int n = Jlcens[i];
      target += zero_inflated_beta_lpdf(Yleft[i] | mu[n], phi, zi[n]);
    }
    // Interval-censored
    for (i in 1:Nicens) {
      int n = Jicens[i];
      target += zero_inflated_beta_lpdf(Yint[i] | mu[n], phi, zi[n]);
    }
  }
///////////////////////////////////////////////////////////////////////////////////////  

  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
  target += std_normal_lpdf(z_2[1]);
  target += std_normal_lpdf(z_3[1]);
  target += std_normal_lpdf(z_4[1]);
  target += std_normal_lpdf(z_5[1]);
  target += std_normal_lpdf(z_6[1]);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // actual population-level intercept
  real b_zi_Intercept = Intercept_zi - dot_product(means_X_zi, b_zi);
}

The reason for the modification was that the stock brms model was really struggling to sample. The modified model finished sampling quickly (considering the amount of data and the model complexity), without issues, and had good Rhat and Bulk_ESS values. Some more details here.

I am not sure what would be a good way to do a posterior predictive check for such a model. What I have tried is to generate predictions, impose the same interval-censoring scheme on them, and compare.
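A more compact variant of the same idea is to bin both `y` and `yrep` and hand them to bayesplot's discrete PPC. A sketch, assuming the same four bins as the helper in the function below; `y_prop` (observed damage as proportions) and `ypred` (draws × observations matrix) are placeholders:

```r
library(bayesplot)
library(dplyr)

lvls <- c("0", "0.01–0.05", "0.05–0.26", "0.26–0.99")
bin_damage <- function(x) {                 # same cutpoints as in ppc() below
  case_when(x == 0   ~ lvls[1],
            x < 0.06 ~ lvls[2],
            x < 0.26 ~ lvls[3],
            .default = lvls[4])
}
# map observed and simulated values to integer bin codes
y_int    <- match(bin_damage(y_prop), lvls)
yrep_int <- matrix(match(bin_damage(as.vector(ypred)), lvls),
                   nrow = nrow(ypred))
ppc_bars(y_int, yrep_int)
```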

# requires dplyr, tidyr, ggplot2 (tidyverse) and brms
ppc = function(fit, data, ndraws = 10) {
  
  # invert_damage.perc is on the percent scale, hence the 6 / 26 cutpoints
  observed = case_when(data$invert_damage.perc == 0  ~ "0",
                       data$invert_damage.perc < 6   ~ "0.01–0.05",
                       data$invert_damage.perc < 26  ~ "0.05–0.26",
                       data$invert_damage.perc >= 26 ~ "0.26–0.99")
  
  ypred = posterior_predict(fit, ndraws = ndraws)
  
  bins = function(x) {
    case_when(x == 0   ~ "0",
              x < 0.06 ~ "0.01–0.05",
              x < 0.26 ~ "0.05–0.26",
              x >= 0.26 ~ "0.26–0.99")
  }
  
  sim_bins = apply(ypred, 1, function(x) table(factor(bins(x),
                                                      levels = c("0", 
                                                                 "0.01–0.05", 
                                                                 "0.05–0.26", 
                                                                 "0.26–0.99"))))
  sim_df = as.data.frame(t(sim_bins))
  sim_df$draw = 1:nrow(sim_df)
  
  obs_counts = table(factor(observed, 
                            levels = c("0", 
                                       "0.01–0.05", 
                                       "0.05–0.26", 
                                       "0.26–0.99")))
  
  obs_df = data.frame(Category = names(obs_counts),
                      Observed = as.numeric(obs_counts))
  
  sim_summary = sim_df %>%
    pivot_longer(cols = -draw, 
                 names_to = "Category", 
                 values_to = "Simulated") %>%
    group_by(Category) %>%
    summarise(mean = mean(Simulated),
              lower = quantile(Simulated, 0.025),
              upper = quantile(Simulated, 0.975))
  
  plot_df = left_join(obs_df, sim_summary, by = "Category")
  
  ggplot(plot_df, aes(x = Category)) +
    geom_col(aes(y = Observed), fill = "#a2c0d9", alpha = 0.7) +
    geom_pointrange(aes(y = mean, ymin = lower, ymax = upper), 
                    color = "#152736", size = .25, fatten = 2) +
    labs(y = "Count", x = "Damage category",
         subtitle = "bars = y, pointrange = yrep mean & 95% CI") +
    theme_minimal()
}

Do you think this is a valid approach for evaluating the model? The result is not particularly encouraging. I have also fit a cumulative probit model (four levels, same predictors), which seemed to do a much better job:
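For completeness, the cumulative probit comparison has roughly this form; `damage_cat`, `predictors`, and `dat` are placeholders for the actual variables:

```r
library(brms)

# damage_cat: ordered factor with the four damage classes
fit_probit <- brm(
  damage_cat ~ predictors + (1 | taxon) + (1 | location),
  family = cumulative("probit"),
  data = dat
)
# pp_check(fit_probit, type = "bars") then gives a category-count PPC directly
```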

Hi! Good question. Not sure if I understood everything correctly, but below is a reprex based on your censoring scheme, except that I did not include the zero inflation. The simulation shows that the model is able to recover the mu and phi parameters and the proportion of data in each censoring class. Perhaps that is a useful starting point for your PPCs?

To reproduce this, you also need the bda.stan file.
bda.stan (3.6 KB)

library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.22.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar

## generate data from beta distribution
set.seed(2154)
mu <- 0.1
phi <- 2
true_data <- tibble(
  y = rbeta(1000, shape1 = mu * phi, shape2 = (1 - mu) * phi)
)

# histogram of latent (unobserved) true values
hist(true_data$y)


# Censoring
censored_data <- true_data %>%
  mutate(
    y_left = case_when(
      y <= 0.05 ~ 1e-7,
      y > 0.05 & y <= 0.25 ~ 0.05,
      y > 0.25 & y <= 1 ~ 0.25
    ),
    y_right = case_when(
      y <= 0.05 ~ 0.05,
      y > 0.05 & y <= 0.25 ~ 0.25,
      y > 0.25 & y <= 1 ~ 1 - 1e-7
    ),
    censoring = case_when(
      y_right == 0.05 ~ "left",
      y_left == 0.25 ~ "right",
      TRUE ~ "interval"
    ),
    y_cens = case_when(
      censoring == "left" ~ y_right,
      censoring == "right" ~ y_left,
      TRUE ~ y_left
    )
  )

beta_stan_data <- brms::make_standata(
  y_cens | cens(censoring, y_right) ~ 1,
  family = brms::Beta(),
  data = censored_data)

beta_stan_code <- brms::make_stancode(
  y_cens | cens(censoring, y_right) ~ 1,
  family = brms::Beta(),
  data = censored_data)

augmented_not_vectorized <- cmdstanr::cmdstan_model(
  "bda.stan" # this is data augmentation model, not vectorized
)

# fit models
augmented_not_vectorized <- augmented_not_vectorized$sample(
  data = beta_stan_data,
  seed = 123,
  chains = 4,
  parallel_chains = 4
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 97, column 2 to column 41)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lpdf: First shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 97, column 2 to column 41)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: beta_lpdf: First shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 97, column 2 to column 41)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> ...
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 26.8 seconds.
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 27.0 seconds.
#> Chain 4 finished in 27.0 seconds.
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 27.6 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 27.1 seconds.
#> Total execution time: 27.8 seconds.

# model fit
augmented_not_vectorized$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#>   variable     mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 b_Intercept -2.21  -2.21 0.0745 0.0745 -2.33 -2.08  1.00    2114.    2767.
#> 2 phi          1.93   1.92 0.222  0.225   1.59  2.31  1.00     421.    1173.

drw_anv <- posterior::as_draws_rvars(augmented_not_vectorized)
anv_mu_phi <- drw_anv %>%
  as_draws_df() %>%
  select(Intercept, phi) %>%
  mutate(
    mu = plogis(Intercept),
    model = "data augmentation"
  )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
anv_mu_phi %>%
  pivot_longer(cols = c("phi", "mu")) %>%
  ggplot() +
  geom_density(aes(x = value, colour = model)) +
  geom_vline(
    data = data.frame(
      name = c("mu", "phi"),
      value = c(mu, phi)
    ),
    aes(xintercept = value)) +
  facet_wrap(~name, scales = "free")


# ppc
predictions <- anv_mu_phi %>%
  mutate(
    left = pbeta(q = 0.05, mu * phi, (1 - mu) * phi),
    interval = pbeta(q = 0.25, mu * phi, (1 - mu) * phi) -
      pbeta(q = 0.05, mu * phi, (1 - mu) * phi),
    right = 1 - pbeta(q = 0.25, mu * phi, (1 - mu) * phi)
  ) %>%
  select(left, interval, right) %>%
  bayesplot::mcmc_intervals_data()

censored_data %>%
  mutate(censoring = factor(censoring, levels = c("left", "interval", "right"))) %>%
  ggplot() +
  geom_bar(
    aes(x = censoring,
        y = after_stat(count) / sum(after_stat(count))),
    alpha = 0.5
  ) +
  geom_linerange(
    data = predictions,
    aes(x = parameter, y = m, ymin = ll, ymax = hh)
  ) +
  geom_point(
    data = predictions,
    aes(x = parameter, y = m), size = 2
  ) +
  labs(y = "Proportion")
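To extend this analytic check to the zero-inflated model in the question, each Beta class probability just picks up a `(1 - zi)` factor, and the exact zeros get their own class. A sketch, where `mu`, `phi`, and `zi` would be vectors of posterior draws on the natural scale:

```r
# class probabilities under a zero-inflated Beta, per posterior draw
zib_class_probs <- function(mu, phi, zi) {
  a <- mu * phi
  b <- (1 - mu) * phi
  tibble::tibble(
    zero = zi,                                               # exact zeros
    low  = (1 - zi) * pbeta(0.05, a, b),                     # (0, .05]
    mid  = (1 - zi) * (pbeta(0.25, a, b) - pbeta(0.05, a, b)), # (.05, .25]
    high = (1 - zi) * (1 - pbeta(0.25, a, b))                # (.25, 1)
  )
}
```

The four columns sum to 1 for each draw, so they can go straight into the same `mcmc_intervals_data()` / bar-plot comparison as above.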

Created on 2025-07-18 with reprex v2.1.1

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.0 (2025-04-11 ucrt)
#>  os       Windows 11 x64 (build 26100)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Belgium.utf8
#>  ctype    Dutch_Belgium.utf8
#>  tz       Europe/Brussels
#>  date     2025-07-18
#>  pandoc   3.4 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>  quarto   NA @ C:\\PROGRA~1\\RStudio\\RESOUR~1\\app\\bin\\quarto\\bin\\quarto.exe
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package        * version  date (UTC) lib source
#>    abind            1.4-8    2024-09-12 [1] CRAN (R 4.5.0)
#>    backports        1.5.0    2024-05-23 [1] CRAN (R 4.5.0)
#>    bayesplot        1.12.0   2025-04-10 [1] CRAN (R 4.5.0)
#>    bridgesampling   1.1-2    2021-04-16 [1] CRAN (R 4.5.0)
#>    brms           * 2.22.0   2024-09-23 [1] CRAN (R 4.5.0)
#>    Brobdingnag      1.2-9    2022-10-19 [1] CRAN (R 4.5.0)
#>    checkmate        2.3.2    2024-07-29 [1] CRAN (R 4.5.0)
#>    cli              3.6.5    2025-04-23 [1] CRAN (R 4.5.0)
#>    cmdstanr         0.9.0    2025-04-27 [1] https://stan-dev.r-universe.dev (R 4.5.0)
#>    coda             0.19-4.1 2024-01-31 [1] CRAN (R 4.5.0)
#>    curl             6.2.2    2025-03-24 [1] CRAN (R 4.5.0)
#>    data.table       1.17.2   2025-05-12 [1] CRAN (R 4.5.0)
#>    dichromat        2.0-0.1  2022-05-02 [1] CRAN (R 4.5.0)
#>    digest           0.6.37   2024-08-19 [1] CRAN (R 4.5.0)
#>    distributional   0.5.0    2024-09-17 [1] CRAN (R 4.5.0)
#>    dplyr          * 1.1.4    2023-11-17 [1] CRAN (R 4.5.0)
#>    emmeans          1.11.1   2025-05-04 [1] CRAN (R 4.5.0)
#>    estimability     1.5.1    2024-05-12 [1] CRAN (R 4.5.0)
#>    evaluate         1.0.3    2025-01-10 [1] CRAN (R 4.5.0)
#>    farver           2.1.2    2024-05-13 [1] CRAN (R 4.5.0)
#>    fastmap          1.2.0    2024-05-15 [1] CRAN (R 4.5.0)
#>    forcats        * 1.0.0    2023-01-29 [1] CRAN (R 4.5.0)
#>    fs               1.6.6    2025-04-12 [1] CRAN (R 4.5.0)
#>    generics         0.1.4    2025-05-09 [1] CRAN (R 4.5.0)
#>    ggplot2        * 3.5.2    2025-04-09 [1] CRAN (R 4.5.0)
#>    glue             1.8.0    2024-09-30 [1] CRAN (R 4.5.0)
#>    gtable           0.3.6    2024-10-25 [1] CRAN (R 4.5.0)
#>    hms              1.1.3    2023-03-21 [1] CRAN (R 4.5.0)
#>    htmltools        0.5.8.1  2024-04-04 [1] CRAN (R 4.5.0)
#>    jsonlite         2.0.0    2025-03-27 [1] CRAN (R 4.5.0)
#>    knitr            1.50     2025-03-16 [1] CRAN (R 4.5.0)
#>    labeling         0.4.3    2023-08-29 [1] CRAN (R 4.5.0)
#>    lattice          0.22-7   2025-04-02 [2] CRAN (R 4.5.0)
#>    lifecycle        1.0.4    2023-11-07 [1] CRAN (R 4.5.0)
#>    loo              2.8.0    2024-07-03 [1] CRAN (R 4.5.0)
#>    lubridate      * 1.9.4    2024-12-08 [1] CRAN (R 4.5.0)
#>    magrittr         2.0.3    2022-03-30 [1] CRAN (R 4.5.0)
#>    Matrix           1.7-3    2025-03-11 [2] CRAN (R 4.5.0)
#>    matrixStats      1.5.0    2025-01-07 [1] CRAN (R 4.5.0)
#>    mvtnorm          1.3-3    2025-01-10 [1] CRAN (R 4.5.0)
#>    nlme             3.1-168  2025-03-31 [2] CRAN (R 4.5.0)
#>    pillar           1.10.2   2025-04-05 [1] CRAN (R 4.5.0)
#>    pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.5.0)
#>    plyr             1.8.9    2023-10-02 [1] CRAN (R 4.5.0)
#>    posterior        1.6.1    2025-02-27 [1] CRAN (R 4.5.0)
#>    processx         3.8.6    2025-02-21 [1] CRAN (R 4.5.0)
#>    ps               1.9.1    2025-04-12 [1] CRAN (R 4.5.0)
#>    purrr          * 1.0.4    2025-02-05 [1] CRAN (R 4.5.0)
#>    R6               2.6.1    2025-02-15 [1] CRAN (R 4.5.0)
#>    RColorBrewer     1.1-3    2022-04-03 [1] CRAN (R 4.5.0)
#>    Rcpp           * 1.0.14   2025-01-12 [1] CRAN (R 4.5.0)
#>  D RcppParallel     5.1.10   2025-01-24 [1] CRAN (R 4.5.0)
#>    readr          * 2.1.5    2024-01-10 [1] CRAN (R 4.5.0)
#>    reprex           2.1.1    2024-07-06 [1] CRAN (R 4.5.0)
#>    reshape2         1.4.4    2020-04-09 [1] CRAN (R 4.5.0)
#>    rlang            1.1.6    2025-04-11 [1] CRAN (R 4.5.0)
#>    rmarkdown        2.29     2024-11-04 [1] CRAN (R 4.5.0)
#>    rstantools       2.4.0    2024-01-31 [1] CRAN (R 4.5.0)
#>    rstudioapi       0.17.1   2024-10-22 [1] CRAN (R 4.5.0)
#>    scales           1.4.0    2025-04-24 [1] CRAN (R 4.5.0)
#>    sessioninfo      1.2.3    2025-02-05 [1] CRAN (R 4.5.0)
#>    stringi          1.8.7    2025-03-27 [1] CRAN (R 4.5.0)
#>    stringr        * 1.5.1    2023-11-14 [1] CRAN (R 4.5.0)
#>    tensorA          0.36.2.1 2023-12-13 [1] CRAN (R 4.5.0)
#>    tibble         * 3.2.1    2023-03-20 [1] CRAN (R 4.5.0)
#>    tidyr          * 1.3.1    2024-01-24 [1] CRAN (R 4.5.0)
#>    tidyselect       1.2.1    2024-03-11 [1] CRAN (R 4.5.0)
#>    tidyverse      * 2.0.0    2023-02-22 [1] CRAN (R 4.5.0)
#>    timechange       0.3.0    2024-01-18 [1] CRAN (R 4.5.0)
#>    tzdb             0.5.0    2025-03-15 [1] CRAN (R 4.5.0)
#>    utf8             1.2.5    2025-05-01 [1] CRAN (R 4.5.0)
#>    vctrs            0.6.5    2023-12-01 [1] CRAN (R 4.5.0)
#>    withr            3.0.2    2024-10-28 [1] CRAN (R 4.5.0)
#>    xfun             0.52     2025-04-02 [1] CRAN (R 4.5.0)
#>    xml2             1.3.8    2025-03-14 [1] CRAN (R 4.5.0)
#>    xtable           1.8-4    2019-04-21 [1] CRAN (R 4.5.0)
#>    yaml             2.3.10   2024-07-26 [1] CRAN (R 4.5.0)
#> 
#>  [1] C:/R/library
#>  [2] C:/R/R-4.5.0/library
#> 
#>  * ── Packages attached to the search path.
#>  D ── DLL MD5 mismatch, broken installation.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@hansvancalster thank you for taking the time to provide this example! I think it got me moving in the right direction, although I am still not sure what is going on.

To get the original question out of the way: I now see that censoring the model predictions and comparing them with the censored data is a reasonable diagnostic to perform.

So I wondered whether the model would still recover the coefficients in a more complex scenario, and added zero inflation and a predictor.

With data like this:

# 1. data generated from a zero-inflated beta distribution #####################
set.seed(123)

n = 1000
x = rnorm(n)

alpha = -.2
beta = 1.0
phi = 10.0
mu = plogis(alpha + beta*x)

alpha_zi = -4.5
beta_zi = -1.5
zi = plogis(alpha_zi + beta_zi*x)
is_zero = rbinom(n, 1, zi)

y = numeric(n)
for (i in 1:n) {
  if (is_zero[i] == 1) {
    y[i] = 0
  } else {
    y[i] <- rbeta(1, mu[i] * phi, (1 - mu[i]) * phi)
  }
}

d = data.frame(x = x, y = y)
plot(d$x, d$y)

after interval censoring the non-zero values,

dcens = d %>% 
  mutate(# the lower limits of intervals
    ylow = case_when(y  ==   0 ~ 0,
                     y   < .06 ~ .01,
                     y   < .26 ~ .06,
                     y  >= .26 ~ .26),
    # the upper limits of intervals
    yhigh = case_when(y  ==   0 ~ 0,
                      y   < .06 ~ .05,
                      y   < .26 ~ .25,
                      y  >= .26 ~ .99),
    # the type of intervals
    ycen = case_when(y  ==   0 ~ "none",
                     y   < .06 ~ "interval",
                     y   < .26 ~ "interval",
                     y  >= .26 ~ "interval")) 

the modified brms model recovers the coefficients just fine:

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept       -0.21      0.05    -0.30    -0.11 1.00     2811     5860
zi_Intercept    -4.44      0.33    -5.14    -3.84 1.00    22385    11793
x                1.00      0.05     0.91     1.09 1.00     5234     9222
zi_x            -1.62      0.23    -2.09    -1.18 1.00    21650    11274
Further Distributional Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi    11.05      1.13     8.98    13.42 1.00     2329     5431

and does a decent job recovering the zeros and intervals:

However, the real data I tried this model on are leaf damage measurements with a lot of zeros (>50%), where the non-zero values are mostly low damage percentages. So I next tried something like this:

# 2. many zeros, lower proportions #############################################

set.seed(123)

n = 1000
x = rnorm(n)

alpha = -2.5
beta = .3
phi = 10
mu = plogis(alpha + beta*x)

alpha_zi = -2.5
beta_zi = -1.5
zi = plogis(alpha_zi + beta_zi*x)
is_zero = rbinom(n, 1, zi)

y = numeric(n)
for (i in 1:n) {
  if (is_zero[i] == 1) {
    y[i] = 0
  } else {
    y[i] <- rbeta(1, mu[i] * phi, (1 - mu[i]) * phi)
  }
}

d = data.frame(x = x, y = y)
plot(d$x, d$y)

And in this case, some coefficients were a bit off:

 Family: zero_inflated_beta 
  Links: mu = logit; phi = identity; zi = logit 
Formula: ylow | cens(ycen, yhigh) ~ x 
         zi ~ x
   Data: dcens (Number of observations: 1000) 
  Draws: 4 chains, each with iter = 6000; warmup = 2000; thin = 1;
         total post-warmup draws = 16000
Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept       -2.43      0.04    -2.50    -2.36 1.00     9891    10707
zi_Intercept    -2.61      0.15    -2.92    -2.33 1.00    12617    11036
x                0.20      0.03     0.14     0.26 1.00    15196    12678
zi_x            -1.45      0.14    -1.73    -1.20 1.00    12389    10962
Further Distributional Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi    16.61      1.05    14.62    18.75 1.00     8995    10668

and even more so the intervals:

I originally thought that zero inflation was complicating things, but in fact the Bernoulli side of the model seems to be doing fine. It’s also not the case that there are too few non-zero data to inform the Beta model… And it is inconsequential whether the top (≥ 0.26) level is coded as interval or right censored, because we are dealing with a bounded distribution anyway.
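The bounded-distribution point can be checked directly with the Beta CDF. A sketch with illustrative parameter values (not the fitted ones): the interval (0.26, 0.99] and right censoring at 0.26 contribute almost exactly the same likelihood, because essentially no Beta mass lies above 0.99 here.

```r
# Illustrative parameters, not the fitted ones
mu  <- 0.08
phi <- 10
a   <- mu * phi        # first Beta shape
b   <- (1 - mu) * phi  # second Beta shape

p_interval <- pbeta(0.99, a, b) - pbeta(0.26, a, b)  # P(0.26 < Y <= 0.99)
p_right    <- 1 - pbeta(0.26, a, b)                  # P(Y > 0.26)
c(p_interval, p_right)
```

For these shapes the two probabilities agree to well below 1e-6, so the coding choice only matters if the upper limit is pulled noticeably below 1.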

cens_zi_beta_aug.stan (3.1 KB)
cens_zi_beta_test.R (10.9 KB)

See also Recommendations for visual predictive checks in Bayesian workflow for better diagnostics than the bar plots.

Hi, I’ve found the problem. The censoring scheme you used, based only on interval censoring, is an approximation, and it introduced the bias you saw in your posterior predictive check, especially in the example with a low mean for the beta part. When the beta part has a low mean, a lot of very low y-values are generated. In your code, y-values < 0.06 were passed to the model as interval censored on [0.01, 0.05], so the model interprets all of them as lying between 0.01 and 0.05, but many of the generated y-values were < 0.01. This causes the bias. The same applies to the data that should have been right censored instead of interval censored.
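To see the size of that approximation error, one can simulate under the scenario-2 parameters and count how many non-zero draws fall below the 0.01 lower bound that the interval coding imposed. A vectorised sketch (the exact draws differ from the seeded loop above):

```r
# Fraction of non-zero y-values below 0.01 under the scenario-2 parameters;
# these were all coded as interval censored on [0.01, 0.05].
set.seed(1)
n  <- 1000
x  <- rnorm(n)
mu <- plogis(-2.5 + 0.3 * x)   # beta-part mean, as in scenario 2
zi <- plogis(-2.5 - 1.5 * x)   # zero-inflation probability
y  <- ifelse(rbinom(n, 1, zi) == 1, 0,
             rbeta(n, mu * 10, (1 - mu) * 10))
mean(y[y > 0] < 0.01)
```

A sizeable share of the non-zero draws lands below 0.01, which is exactly the mass the interval coding shifted upward.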

I updated your code to fix this (it uses the same .stan file you already had). PS: thanks for the trick to convert a cmdstanfit back to a brmsfit object. Very cool!

cens_zi_beta_test_2.R (4.4 KB)
cens_zi_beta_aug.stan (3.1 KB)

library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.22.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(tidyverse)
library(cmdstanr)
#> This is cmdstanr version 0.9.0
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: C:/Users/hans_vancalster/.cmdstan/cmdstan-2.36.0
#> - CmdStan version: 2.36.0

# custom ppc function ##########################################################
ppc = function(fit, data, ndraws = 10) {
  levels = c("0", 
             "<= 0.05", 
             "0.05–0.25", 
             "> 0.25")
  
  observed = case_when(data$y  == 0  ~ levels[1],
                       data$y  <= .05 ~ levels[2],
                       data$y  <= .25 ~ levels[3],
                       data$y > .25 ~ levels[4])
  
  ypred = posterior_predict(fit, ndraws = ndraws)
  
  bins = function(x) {
    case_when(x  == 0   ~ levels[1],
              x  <= 0.05 ~ levels[2],
              x  <= 0.25 ~ levels[3],
              x > 0.25 ~ levels[4])
  }
  
  sim_bins = apply(ypred, 1, function(x) table(factor(bins(x),
                                                      levels = levels)))
  sim_df = as.data.frame(t(sim_bins))
  sim_df$draw = 1:nrow(sim_df)
  
  obs_counts = table(factor(observed, 
                            levels = levels))
  
  obs_df = data.frame(Category = names(obs_counts),
                      Observed = as.numeric(obs_counts))
  
  sim_summary = sim_df %>%
    pivot_longer(cols = -draw, 
                 names_to = "Category", 
                 values_to = "Simulated") %>%
    group_by(Category) %>%
    summarise(mean = mean(Simulated),
              lower = quantile(Simulated, 0.05),
              upper = quantile(Simulated, 0.95))
  
  plot_df = left_join(obs_df, sim_summary, by = "Category") %>%
    mutate(Category = factor(Category, levels = levels))
  
  ggplot(plot_df, aes(x = Category)) +
    geom_col(aes(y = Observed), fill = "#d0e1eb", alpha = 0.7) +
    geom_pointrange(aes(y = mean, ymin = lower, ymax = upper), 
                    color = "#053a6c", size = .25, fatten = 2) +
    labs(y = "Count", x = "Damage category",
         subtitle = "bars = y, pointrange = yrep mean & 90% CI") +
    theme_minimal()
}

# 2. many zeros, lower proportions #############################################

set.seed(123)

n = 1000
x = rnorm(n)

alpha = -2.5
beta = .3
phi = 10
mu = plogis(alpha + beta*x)

alpha_zi = -2.5
beta_zi = -1.5
zi = plogis(alpha_zi + beta_zi*x)
is_zero = rbinom(n, 1, zi)

y = numeric(n)
for (i in 1:n) {
  if (is_zero[i] == 1) {
    y[i] = 0
  } else {
    y[i] <- rbeta(1, mu[i] * phi, (1 - mu[i]) * phi)
  }
}

d = data.frame(x = x, y = y)
plot(d$x, d$y)


# with censoring ###########################################################

dcens = d %>% 
  mutate(# the lower limits of intervals
    ylow = case_when(y  ==   0 ~ 0,
                     y  <= .05 ~ 1e-7,
                     y  <= .25 ~ .05,
                     y  > .25 ~ .25),
    # the upper limits of intervals
    yhigh = case_when(y  ==   0 ~ 0,
                      y <= .05 ~ .05,
                      y <= .25 ~ .25,
                      y  > .25 ~ 1 - 1e-7),
    # the type of intervals
    censoring = case_when(y  ==   0 ~ "none",
                          y <= .05 ~ "left",
                          y  <= .25 ~ "interval",
                          y  > .25 ~ "right"),
    ycens = case_when(
      censoring == "none" ~ y,
      censoring == "left" ~ yhigh,
      censoring == "right" ~ ylow,
      TRUE ~ ylow
    )) 
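As a sanity check on this coding (a sketch with illustrative parameter values, not the fitted ones): for the Beta part, left censoring at 0.05 contributes the CDF mass below 0.05, the interval contributes the mass between 0.05 and 0.25, and right censoring contributes the mass above 0.25, so the three non-zero categories partition (0, 1). The zeros enter through the zero-inflation part instead.

```r
# Likelihood contribution of each censoring category for the Beta part
# (zero-inflation aside), with illustrative parameters.
mu  <- 0.08
phi <- 10
a   <- mu * phi
b   <- (1 - mu) * phi

p_left     <- pbeta(0.05, a, b)                      # "left": P(Y <= 0.05)
p_interval <- pbeta(0.25, a, b) - pbeta(0.05, a, b)  # "interval": P(0.05 < Y <= 0.25)
p_right    <- 1 - pbeta(0.25, a, b)                  # "right": P(Y > 0.25)
p_left + p_interval + p_right  # = 1: the categories partition (0, 1)
```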



# modified brms model

formula = bf(
  ycens | cens(censoring, yhigh) ~ x, 
  zi ~ x
)
sdata = brms::make_standata(
  formula,
  family = zero_inflated_beta(), 
  data = dcens)

model <- cmdstan_model("cens_zi_beta_aug.stan", quiet = TRUE) |>
  suppressMessages()

stanfit <- model$sample(
  data = sdata, 
  chains = 4,  
  seed = 123, 
  parallel_chains = 4
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpOQA2hk/model-347c2f54829.stan', line 82, column 2 to column 41)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: Exception: beta_lpdf: First shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpOQA2hk/model-347c2f54829.stan', line 8, column 6 to line 9, column 47) (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpOQA2hk/model-347c2f54829.stan', line 100, column 6 to column 71)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> (analogous informational messages, with gamma_lpdf or beta_lpdf arguments hitting 0 or inf during early warmup, were repeated on all four chains; omitted here)
#> (iteration progress messages omitted)
#> Chain 2 finished in 111.7 seconds.
#> Chain 3 finished in 113.7 seconds.
#> Chain 1 finished in 122.0 seconds.
#> Chain 4 finished in 123.3 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 117.7 seconds.
#> Total execution time: 123.5 seconds.

m3b = brm(formula, 
          family = zero_inflated_beta(), 
          data = dcens, empty = TRUE)
m3b$fit = read_csv_as_stanfit(stanfit$output_files(), model = model)
#> Loading required namespace: rstan
m3b = rename_pars(m3b)
saveRDS(m3b, file = "m3b.rds")
m3b = readRDS("m3b.rds")
summary(m3b)
#>  Family: zero_inflated_beta 
#>   Links: mu = logit; phi = identity; zi = logit 
#> Formula: ycens | cens(censoring, yhigh) ~ x 
#>          zi ~ x
#>    Data: dcens (Number of observations: 1000) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept       -2.46      0.04    -2.54    -2.37 1.00     3060     3247
#> zi_Intercept    -2.61      0.15    -2.93    -2.33 1.00     3522     2618
#> x                0.32      0.04     0.24     0.40 1.00     1876     2986
#> zi_x            -1.45      0.14    -1.73    -1.18 1.00     3435     2943
#> 
#> Further Distributional Parameters:
#>     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> phi     9.74      0.81     8.24    11.39 1.00     1245     1962
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

ppc(m3b, data = dcens, ndraws = 100)

Created on 2025-07-24 with reprex v2.1.1

Session info

sessioninfo::session_info()
#> Warning in system2("quarto", "-V", stdout = TRUE, env = paste0("TMPDIR=", :
#> running command '"quarto"
#> TMPDIR=C:/Users/hans_vancalster/AppData/Local/Temp/RtmpCQncg7/file3bac25c04088
#> -V' had status 1
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.0 (2025-04-11 ucrt)
#>  os       Windows 11 x64 (build 26100)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Belgium.utf8
#>  ctype    Dutch_Belgium.utf8
#>  tz       Europe/Brussels
#>  date     2025-07-24
#>  pandoc   3.4 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>  quarto   NA @ C:\\PROGRA~1\\RStudio\\RESOUR~1\\app\\bin\\quarto\\bin\\quarto.exe
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package        * version  date (UTC) lib source
#>    abind            1.4-8    2024-09-12 [1] CRAN (R 4.5.0)
#>    backports        1.5.0    2024-05-23 [1] CRAN (R 4.5.0)
#>    bayesplot        1.12.0   2025-04-10 [1] CRAN (R 4.5.0)
#>    bridgesampling   1.1-2    2021-04-16 [1] CRAN (R 4.5.0)
#>    brms           * 2.22.0   2024-09-23 [1] CRAN (R 4.5.0)
#>    Brobdingnag      1.2-9    2022-10-19 [1] CRAN (R 4.5.0)
#>    checkmate        2.3.2    2024-07-29 [1] CRAN (R 4.5.0)
#>    cli              3.6.5    2025-04-23 [1] CRAN (R 4.5.0)
#>    cmdstanr       * 0.9.0    2025-04-27 [1] https://stan-dev.r-universe.dev (R 4.5.0)
#>    coda             0.19-4.1 2024-01-31 [1] CRAN (R 4.5.0)
#>    codetools        0.2-20   2024-03-31 [2] CRAN (R 4.5.0)
#>    curl             6.2.2    2025-03-24 [1] CRAN (R 4.5.0)
#>    data.table       1.17.2   2025-05-12 [1] CRAN (R 4.5.0)
#>    dichromat        2.0-0.1  2022-05-02 [1] CRAN (R 4.5.0)
#>    digest           0.6.37   2024-08-19 [1] CRAN (R 4.5.0)
#>    distributional   0.5.0    2024-09-17 [1] CRAN (R 4.5.0)
#>    dplyr          * 1.1.4    2023-11-17 [1] CRAN (R 4.5.0)
#>    emmeans          1.11.1   2025-05-04 [1] CRAN (R 4.5.0)
#>    estimability     1.5.1    2024-05-12 [1] CRAN (R 4.5.0)
#>    evaluate         1.0.3    2025-01-10 [1] CRAN (R 4.5.0)
#>    farver           2.1.2    2024-05-13 [1] CRAN (R 4.5.0)
#>    fastmap          1.2.0    2024-05-15 [1] CRAN (R 4.5.0)
#>    forcats        * 1.0.0    2023-01-29 [1] CRAN (R 4.5.0)
#>    fs               1.6.6    2025-04-12 [1] CRAN (R 4.5.0)
#>    generics         0.1.4    2025-05-09 [1] CRAN (R 4.5.0)
#>    ggplot2        * 3.5.2    2025-04-09 [1] CRAN (R 4.5.0)
#>    glue             1.8.0    2024-09-30 [1] CRAN (R 4.5.0)
#>    gridExtra        2.3      2017-09-09 [1] CRAN (R 4.5.0)
#>    gtable           0.3.6    2024-10-25 [1] CRAN (R 4.5.0)
#>    hms              1.1.3    2023-03-21 [1] CRAN (R 4.5.0)
#>    htmltools        0.5.8.1  2024-04-04 [1] CRAN (R 4.5.0)
#>    inline           0.3.21   2025-01-09 [1] CRAN (R 4.5.0)
#>    jsonlite         2.0.0    2025-03-27 [1] CRAN (R 4.5.0)
#>    knitr            1.50     2025-03-16 [1] CRAN (R 4.5.0)
#>    labeling         0.4.3    2023-08-29 [1] CRAN (R 4.5.0)
#>    lattice          0.22-7   2025-04-02 [2] CRAN (R 4.5.0)
#>    lifecycle        1.0.4    2023-11-07 [1] CRAN (R 4.5.0)
#>    loo              2.8.0    2024-07-03 [1] CRAN (R 4.5.0)
#>    lubridate      * 1.9.4    2024-12-08 [1] CRAN (R 4.5.0)
#>    magrittr         2.0.3    2022-03-30 [1] CRAN (R 4.5.0)
#>    Matrix           1.7-3    2025-03-11 [2] CRAN (R 4.5.0)
#>    matrixStats      1.5.0    2025-01-07 [1] CRAN (R 4.5.0)
#>    mvtnorm          1.3-3    2025-01-10 [1] CRAN (R 4.5.0)
#>    nlme             3.1-168  2025-03-31 [2] CRAN (R 4.5.0)
#>    pillar           1.10.2   2025-04-05 [1] CRAN (R 4.5.0)
#>    pkgbuild         1.4.7    2025-03-24 [1] CRAN (R 4.5.0)
#>    pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.5.0)
#>    plyr             1.8.9    2023-10-02 [1] CRAN (R 4.5.0)
#>    posterior        1.6.1    2025-02-27 [1] CRAN (R 4.5.0)
#>    processx         3.8.6    2025-02-21 [1] CRAN (R 4.5.0)
#>    ps               1.9.1    2025-04-12 [1] CRAN (R 4.5.0)
#>    purrr          * 1.0.4    2025-02-05 [1] CRAN (R 4.5.0)
#>    QuickJSR         1.7.0    2025-03-31 [1] CRAN (R 4.5.0)
#>    R6               2.6.1    2025-02-15 [1] CRAN (R 4.5.0)
#>    RColorBrewer     1.1-3    2022-04-03 [1] CRAN (R 4.5.0)
#>    Rcpp           * 1.0.14   2025-01-12 [1] CRAN (R 4.5.0)
#>  D RcppParallel     5.1.10   2025-01-24 [1] CRAN (R 4.5.0)
#>    readr          * 2.1.5    2024-01-10 [1] CRAN (R 4.5.0)
#>    reprex           2.1.1    2024-07-06 [1] CRAN (R 4.5.0)
#>    reshape2         1.4.4    2020-04-09 [1] CRAN (R 4.5.0)
#>    rlang            1.1.6    2025-04-11 [1] CRAN (R 4.5.0)
#>    rmarkdown        2.29     2024-11-04 [1] CRAN (R 4.5.0)
#>    rstan            2.32.7   2025-03-10 [1] CRAN (R 4.5.0)
#>    rstantools       2.4.0    2024-01-31 [1] CRAN (R 4.5.0)
#>    rstudioapi       0.17.1   2024-10-22 [1] CRAN (R 4.5.0)
#>    scales           1.4.0    2025-04-24 [1] CRAN (R 4.5.0)
#>    sessioninfo      1.2.3    2025-02-05 [1] CRAN (R 4.5.0)
#>    StanHeaders      2.32.10  2024-07-15 [1] CRAN (R 4.5.0)
#>    stringi          1.8.7    2025-03-27 [1] CRAN (R 4.5.0)
#>    stringr        * 1.5.1    2023-11-14 [1] CRAN (R 4.5.0)
#>    tensorA          0.36.2.1 2023-12-13 [1] CRAN (R 4.5.0)
#>    tibble         * 3.2.1    2023-03-20 [1] CRAN (R 4.5.0)
#>    tidyr          * 1.3.1    2024-01-24 [1] CRAN (R 4.5.0)
#>    tidyselect       1.2.1    2024-03-11 [1] CRAN (R 4.5.0)
#>    tidyverse      * 2.0.0    2023-02-22 [1] CRAN (R 4.5.0)
#>    timechange       0.3.0    2024-01-18 [1] CRAN (R 4.5.0)
#>    tzdb             0.5.0    2025-03-15 [1] CRAN (R 4.5.0)
#>    vctrs            0.6.5    2023-12-01 [1] CRAN (R 4.5.0)
#>    withr            3.0.2    2024-10-28 [1] CRAN (R 4.5.0)
#>    xfun             0.52     2025-04-02 [1] CRAN (R 4.5.0)
#>    xml2             1.3.8    2025-03-14 [1] CRAN (R 4.5.0)
#>    xtable           1.8-4    2019-04-21 [1] CRAN (R 4.5.0)
#>    yaml             2.3.10   2024-07-26 [1] CRAN (R 4.5.0)
#> 
#>  [1] C:/R/library
#>  [2] C:/R/R-4.5.0/library
#> 
#>  * ── Packages attached to the search path.
#>  D ── DLL MD5 mismatch, broken installation.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Thank you once again, this is really helpful!

Spelling things out to see if I have everything right:

In ycens | cens(censoring, yhigh) the value that goes into ycens depends on the type of censoring:

  • for left censoring, it’s an upper bound, e.g. 0.05 for observations in (0, 0.05],
  • for right censoring, it’s a lower bound, e.g. 0.25 for observations in (0.25, 1),
  • for interval censoring, it’s the lower bound of the interval, with yhigh as the upper bound.
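
If it helps, that mapping can be spelled out as a small base-R helper (a sketch only: the function name is mine, and the breaks follow the damage classes 0, (0, .05], (.05, .25], (.25, 1) used here):

```r
# Sketch (hypothetical helper): encode a raw proportion y into the
# (ycens, yhigh, censoring) triple used by ycens | cens(censoring, yhigh).
encode_cens <- function(y) {
  if (y == 0) {
    # exact zero: handled by the zero-inflation part, not censored
    list(ycens = 0, yhigh = 0, censoring = "none")
  } else if (y <= .05) {
    # only an upper bound is known -> left censored; ycens is that bound
    # (yhigh is unused here, but must not be NA or brms drops the row)
    list(ycens = .05, yhigh = .05, censoring = "left")
  } else if (y <= .25) {
    # both bounds known -> interval censored; ycens = lower, yhigh = upper
    list(ycens = .05, yhigh = .25, censoring = "interval")
  } else {
    # only a lower bound is known -> right censored; ycens is that bound
    # (yhigh again just a non-NA placeholder)
    list(ycens = .25, yhigh = 1, censoring = "right")
  }
}
```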

So in fact, either of the two things you did here solves the problem on its own:

This works:

  dcens = d %>% 
    mutate(
      # the lower limits of the intervals
      ylow = case_when(y == 0   ~ 0,
                       y <= .05 ~ 1e-7,
                       y <= .25 ~ .05,
                       y >  .25 ~ .25),
      # the upper limits of the intervals
      yhigh = case_when(y == 0   ~ 0,
                        y <= .05 ~ .05,
                        y <= .25 ~ .25,
                        y >  .25 ~ 1 - 1e-7),
      # the type of censoring (everything non-zero is interval censored)
      censoring = case_when(y == 0   ~ "none",
                            y <= .05 ~ "interval",
                            y <= .25 ~ "interval",
                            y >  .25 ~ "interval"),
      ycens = case_when(
        censoring == "none" ~ y,
        #censoring == "left" ~ yhigh,
        #censoring == "right" ~ ylow,
        TRUE ~ ylow
      ))

because the lower bound of the first interval (1e-7) and the upper bound of the last interval (1 - 1e-7) now lie strictly inside (0, 1), close enough to the edges not to cause trouble.
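
One way to see why: the interval-censored contribution to the likelihood is the beta CDF difference between the two bounds, and with 1e-7 and 1 - 1e-7 that difference is numerically indistinguishable from the corresponding tail probability. A base-R sketch with illustrative shape parameters (a = 1, b = 9, i.e. mu = .1, phi = 10 in the mu/phi parameterisation):

```r
# Sketch: the interval-censored contribution is log(F(upper) - F(lower)),
# where F is the beta CDF. Shape parameters are illustrative only.
a <- 1; b <- 9

# lowest class coded as interval censored on (1e-7, .05]
p_interval <- pbeta(.05, a, b) - pbeta(1e-7, a, b)
# the same observation coded as left censored at .05
p_left <- pbeta(.05, a, b)

# highest class coded as interval censored on (.25, 1 - 1e-7)
p_interval_hi <- pbeta(1 - 1e-7, a, b) - pbeta(.25, a, b)
# the same observation coded as right censored at .25
p_right <- 1 - pbeta(.25, a, b)
```

The two codings agree to within numerical precision, which is why the all-interval variant works in practice.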

This works too (and is probably the more principled way to handle things):

dcens = d %>% 
    mutate(
      # the lower limits of the intervals
      ylow = case_when(y == 0   ~ 0,
                       y <= .05 ~ [any value],
                       y <= .25 ~ .05,
                       y >  .25 ~ .25),
      # the upper limits of the intervals
      yhigh = case_when(y == 0   ~ 0,
                        y <= .05 ~ .05,
                        y <= .25 ~ .25,
                        y >  .25 ~ [any value]),
      # the type of censoring
      censoring = case_when(y == 0   ~ "none",
                            y <= .05 ~ "left",
                            y <= .25 ~ "interval",
                            y >  .25 ~ "right"),
      ycens = case_when(
        censoring == "none" ~ y,
        censoring == "left" ~ yhigh,
        censoring == "right" ~ ylow,
        TRUE ~ ylow
      ))

and here the [any value] placeholders are irrelevant, since those bounds are never used by the model.

In ycens | cens(censoring, yhigh) the value that goes into ycens depends on the type of censoring:

  • for left censoring, it’s an upper bound, e.g. 0.05 for observations in (0, 0.05],
  • for right censoring, it’s a lower bound, e.g. 0.25 for observations in (0.25, 1),
  • for interval censoring, it’s the lower bound of the interval, with yhigh as the upper bound.

Correct. See the explanation of the parameters x and y2 in the brms addition-terms documentation (Additional Response Information), where this is more or less documented.

And you should go with the more principled way: if the data are left or right censored, mark them as such instead of treating them as interval censored and relying on a value close to the theoretical lower bound (for left censoring) or upper bound (for right censoring).
In fact, where I used 'placeholder' values 1e-7 or 1 - 1e-7, it doesn’t matter what value is filled in (as long as it is not NA, otherwise the row will be deleted), because it is never used in the Stan model: left-censored observations only use the upper bound, and right-censored observations only use the lower bound. The brms syntax, while generally very clear, is a bit convoluted when several kinds of censoring occur in the same data.
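
A pitfall worth guarding against, then, is the NA one: rows with NA in any cens() input are silently dropped. A small base-R check (a sketch; the function name is mine and the column names follow the dcens example above) can catch that before fitting:

```r
# Sketch: sanity-check the encoded censoring columns before passing them to brm().
# Column names (ycens, yhigh, censoring) follow the dcens example above.
check_cens <- function(df) {
  stopifnot(
    # NA in any cens() input silently drops the row in brms
    !anyNA(df$ycens), !anyNA(df$yhigh), !anyNA(df$censoring),
    # only the labels brms understands
    all(df$censoring %in% c("none", "left", "right", "interval")),
    # interval-censored rows need a proper interval
    all(df$ycens[df$censoring == "interval"] <
          df$yhigh[df$censoring == "interval"])
  )
  invisible(df)
}
```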