My intuition is that interval-censored observations should carry less information than exact points, i.e. the posterior should sit closer to the prior.
However, when I run an interval-censored regression in brms, I see that the residual standard deviations are consistently underestimated and end up even farther from the prior mean, which is the exact opposite of what I expected.
Example:
library(brms)

# Simulate exact outcomes, then derive interval bounds and a measurement-error SD
N = 100
x = rnorm(N)
y = rnorm(N, 0.5*x)   # true residual SD = 1
yy = y                # copy of y for the measurement-error model
y_se = 0.7            # known measurement-error SD for mi()
y1 = y - 1.5          # lower interval bound
y2 = y + 1.5          # upper interval bound
y_cens = "interval"   # censoring indicator for cens()
d = data.frame(x, y, yy, y_se, y1, y2, y_cens)

# Prior-only fit to see what the default prior on sigma implies
fit_pr = brm(
  bf(y ~ 1), data = d,
  sample_prior = "only", cores = 4
)
# Family Specific Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 2.69 2.73 0.09 9.81 1.00 1946 1275
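The prior mean of ~2.7 in that summary is consistent with brms's default half-Student-t prior on sigma (3 df, scale 2.5 here; this is an assumption about the brms version — `get_prior(bf(y ~ 1), data = d)` shows the actual default). Its mean can be computed directly:

```r
# Mean of a half-Student-t(nu, scale) distribution (nu > 1):
# E|X| = scale * 2 * sqrt(nu/pi) * gamma((nu+1)/2) / ((nu-1) * gamma(nu/2))
half_t_mean = function(nu, scale) {
  scale * 2 * sqrt(nu / pi) * gamma((nu + 1) / 2) / ((nu - 1) * gamma(nu / 2))
}
half_t_mean(3, 2.5)  # about 2.76, close to the sampled Estimate of 2.69
```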
# Joint fit: exact outcome, interval-censored outcome, and
# measurement-error outcome, each regressed on x
fit = brm(
  bf(y ~ x) + bf(y1 | cens(y_cens, y2) ~ x) +
    bf(yy | mi(y_se) ~ x) + set_rescor(FALSE),
  data = d, cores = 4
)
# Family Specific Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma_y 1.06 0.08 0.92 1.22 1.00 5451 2483
# sigma_y1 0.68 0.09 0.52 0.88 1.00 7473 3009
# sigma_yy 0.78 0.10 0.59 1.00 1.00 1293 1411
So the posterior mean residual standard deviations for the censored and measurement-error outcomes are roughly 25–35% smaller than the error-free estimate (sigma_y = 1.06, close to the true value of 1), and only about a quarter of the prior mean (2.69).
I suppose it makes sense mathematically that smaller variances maximize the probability of each observation falling in its given interval, but am I wrong in thinking this is not what one would usually want as a result? What would be the right way of achieving the behavior I described instead, i.e. censored observations pulling the posterior toward the prior rather than away from it?
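For what it's worth, the "smaller sigma maximizes the interval probability" supposition is easy to check in plain R (no brms): for an interval centered on the mean, the mass Φ((u−μ)/σ) − Φ((l−μ)/σ) grows monotonically as σ shrinks, so the interval-censored likelihood on its own favors σ → 0.

```r
# Probability that a N(mu, sigma) draw lands in [l, u];
# the +/- 1.5 bounds mirror the simulated intervals above
interval_prob = function(sigma, mu = 0, l = -1.5, u = 1.5) {
  pnorm(u, mu, sigma) - pnorm(l, mu, sigma)
}

# Shrinking sigma puts ever more mass inside the interval
round(interval_prob(c(2.69, 1.06, 0.68, 0.3)), 3)
```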