Hello.
I am trying to fit a heteroscedastic ANOVA model using brms. The responses are interval censored, meaning that I only know the upper and lower bound of the values.
The code to simulate the dataset is below. I simulate 5 groups with different means and variances. The real data I’m working with are dilutions, so we usually work with log10. In this simulated dataset, I10-exponentiated the response. Then, I censor the simulated responses using the return_upper_bound() and return_lower_bound() functions defined in the code.
library(brms)
library(ggplot2)
library(dplyr)
library(tidyr)
library(tidybayes)
bounds_sim = c(5, 10, 20, 30, 40, 60, 80, 120, 160, 240, 320, 480, 640, 960, 1280, 1920, 2560, 3840, 5120, 7680, 10240, 15360, 20480, 30720, 40960, 61440)
return_interval = function(x, bounds){
# Returns a vector with the lower and upper bound if the interval corresponding to x.
if(is.unsorted(bounds, strictly = T)) stop("bounds vector should be ordered increasingly.")
if(x < bounds[1]) stop("x is out of bounds. x should be greater than or equal to ", bounds[1], ".")
if(x >= bounds[length(bounds)]) stop("x is out of bounds. x should be less than ", bounds[length(bounds)], ".")
lower_ix = Position(f = function(y) return(y <= x), x = bounds, right = T)
upper_ix = Position(f = function(y) return(y > x), x = bounds, right = F)
return(bounds[lower_ix:upper_ix])
}
return_lower_bound = function(x_vec, bounds){
if(is.unsorted(bounds, strictly = T)) stop("bounds vector should be ordered increasingly.")
n_x = length(x_vec)
out = rep(NA_real_, n_x)
for(i in 1:n_x){
out[i] = return_interval(x_vec[i], bounds)[1]
}
return(out)
}
return_upper_bound = function(x_vec, bounds){
if(is.unsorted(bounds, strictly = T)) stop("bounds vector should be ordered increasingly.")
n_x = length(x_vec)
out = rep(NA_real_, n_x)
for(i in 1:n_x){
out[i] = return_interval(x_vec[i], bounds)[2]
}
return(out)
}
n_subjects_per_group = 50
mus_1 = c(1.9, 2.6, 2.7, 2.6, 2.7)
log_sigmas_1 = c(-1.5, -2.5, -1.00, -2, -1.7)
set.seed(2025)
simulated_data_01 = lapply(1:length(mus_1), function(i){
Y_i = rnorm(n_subjects_per_group, mean = mus_1[i], sd = exp(log_sigmas_1[i]))
out_i = data.frame(
group_id = as.character(i),
Y = Y_i,
response = 10**Y_i
)
return(out_i)
}) %>%
bind_rows() %>%
mutate(
censored = 2,
lower_value = return_lower_bound(response, bounds_sim),
upper_value = return_upper_bound(response, bounds_sim)
)
The first ten rows of the simulated dataset are:
group_id Y response censored lower_value upper_value
1 1 2.038510 109.27217 2 80 120
2 1 1.907953 80.90077 2 80 120
3 1 2.072514 118.17186 2 80 120
4 1 2.183931 152.73223 2 120 160
5 1 1.982776 96.11160 2 80 120
6 1 1.863662 73.05708 2 60 80
7 1 1.988608 97.41092 2 80 120
8 1 1.882152 76.23457 2 60 80
9 1 1.823028 66.53158 2 60 80
10 1 2.056671 113.93867 2 80 120
The simulated data look like this:
simulated_data_01 %>%
ggplot() +
geom_point(aes(x = group_id, y = response),
position = position_jitter(width = 0.2, height = 0, seed = 1)) +
scale_y_log10() +
theme_bw()
And the lower values of the censoring look like this:
simulated_data_01 %>%
ggplot() +
geom_point(aes(x = group_id, y = lower_value),
position = position_jitter(width = 0.2, height = 0, seed = 1)) +
scale_y_log10() +
theme_bw()
I did not add the upper values because it does not add much to the information relevant to the problem I encounter.
When I fit an interval censored model with brms and no bounds on any parameters, everything runs smoothly:
prior_mod_simdata_1 = c(
# Priors for sigma
prior(normal(-1.5, 1), dpar = "sigma"),
# Priors for coefficients
prior(normal(2.5, 3), class = "b")
)
mod_simdata_1 = brm(
formula = bf(
log10(lower_value) | cens(censored, log10(upper_value)) ~ 0 + group_id,
sigma ~ 0 + group_id
),
data = simulated_data_01,
iter = 15000,
warmup = 5000,
# iter = 5000,
# warmup = 2500,
prior = prior_mod_simdata_1,
chains = 6,
cores = 6,
seed = 2024,
backend = "cmdstanr"
)
However, for reasons that don’t make sense in this dataset, if I add a lower bound of -5 to the variances, then the sampling cannot start:
prior_mod_simdata_2 = c(
# Priors for sigma
prior(normal(-1.5, 1), dpar = "sigma", lb = -5),
# Priors for coefficients
prior(normal(2.5, 3), class = "b")
)
# Runs okay
mod_simdata_2_prior_only = brm(
formula = bf(log10(lower_value) | cens(censored, log10(upper_value)) ~ 0 + group_id,
sigma ~ 0 + group_id),
data = simulated_data_01,
iter = 10000,
warmup = 3000,
# iter = 5000,
# warmup = 2500,
prior = prior_mod_simdata_2,
sample_prior = "only",
chains = 6,
cores = 6,
seed = 2024,
backend = "cmdstanr"
)
make_stancode(
formula = bf(log10(lower_value) | cens(censored, log10(upper_value)) ~ 0 + group_id,
sigma ~ 0 + group_id),
data = simulated_data_01,
prior = prior_mod_simdata_2)
# Fails with this error:
# Rejecting initial value:
# Log probability evaluates to log(0), i.e. negative infinity.
# Stan can't start sampling from this initial value.
mod_simdata_2 = brm(
formula = bf(log10(lower_value) | cens(censored, log10(upper_value)) ~ 0 + group_id,
sigma ~ 0 + group_id),
data = simulated_data_01,
iter = 15000,
warmup = 5000,
# iter = 5000,
# warmup = 2500,
prior = prior_mod_simdata_2,
chains = 6,
cores = 6,
seed = 2024,
backend = "cmdstanr",
control = list(adapt_delta = 0.9)
)
I get errors in each chain saying:
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.
These errors appear only when trying to fit the model, and not when I sample from the prior.
However, if I remove the censoring and fit the model with the same lower bound on sigma, then it runs fine:
prior_mod_simdata_3 = c(
# Priors for sigma
prior(normal(-1.5, 1), dpar = "sigma", lb = -5),
# Priors for coefficients
prior(normal(2.5, 3), class = "b")
)
mod_simdata_3 = brm(
formula = bf(log10(lower_value) ~ 0 + group_id,
sigma ~ 0 + group_id),
data = simulated_data_01,
iter = 15000,
warmup = 5000,
# iter = 5000,
# warmup = 2500,
prior = prior_mod_simdata_3,
chains = 6,
cores = 6,
seed = 2024,
backend = "cmdstanr",
control = list(adapt_delta = 0.9)
)
I tried using my own inits, but still it wouldn’t run. And I cannot see why the constrains would be violated with the initial values for the parameters. The lower bound that I’m setting on sigma is not particularly strict.
Does anyone have any idea of why this happens and how I could circumvent it?
I’m using Windows 11, and these versions of R and R packages:
> sessionInfo()
R version 4.4.3 (2025-02-28 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.32.7 StanHeaders_2.32.10 tidybayes_3.0.7 tidyr_1.3.1 dplyr_1.1.4 ggplot2_3.5.2
[7] brms_2.22.0 Rcpp_1.0.14
loaded via a namespace (and not attached):
[1] gtable_0.3.6 tensorA_0.36.2.1 QuickJSR_1.7.0 xfun_0.51 processx_3.8.6
[6] inline_0.3.21 lattice_0.22-6 vctrs_0.6.5 tools_4.4.3 ps_1.9.1
[11] generics_0.1.3 stats4_4.4.3 parallel_4.4.3 sandwich_3.1-1 tibble_3.2.1
[16] cmdstanr_0.9.0 pkgconfig_2.0.3 Matrix_1.7-2 data.table_1.17.0 checkmate_2.3.2
[21] RColorBrewer_1.1-3 distributional_0.5.0 RcppParallel_5.1.10 lifecycle_1.0.4 compiler_4.4.3
[26] farver_2.1.2 stringr_1.5.1 Brobdingnag_1.2-9 codetools_0.2-20 bayesplot_1.11.1
[31] pillar_1.10.2 arrayhelpers_1.1-0 MASS_7.3-64 bridgesampling_1.1-2 abind_1.4-8
[36] multcomp_1.4-28 nlme_3.1-167 posterior_1.6.1 tidyselect_1.2.1 svUnit_1.0.6
[41] mvtnorm_1.3-3 stringi_1.8.4 reshape2_1.4.4 purrr_1.0.4 labeling_0.4.3
[46] splines_4.4.3 grid_4.4.3 cli_3.6.4 magrittr_2.0.3 loo_2.8.0.9000
[51] pkgbuild_1.4.7 survival_3.8-3 TH.data_1.1-3 withr_3.0.2 scales_1.4.0
[56] backports_1.5.0 estimability_1.5.1 emmeans_1.11.0 matrixStats_1.5.0 gridExtra_2.3
[61] zoo_1.8-14 coda_0.19-4.1 evaluate_1.0.3 knitr_1.50 ggdist_3.3.3
[66] rstantools_2.4.0 rlang_1.1.5 xtable_1.8-4 glue_1.8.0 rstudioapi_0.17.1
[71] jsonlite_1.9.1 plyr_1.8.9 R6_2.6.1

