Sanity check: behavior of residual variance posterior with interval-censored observations

My intuition is that if I have interval-censored observations, I should be getting less information from them than if I had the exact points, i.e. the posterior should be closer to the prior.

However, when I run an interval-censored regression in brms, the residual standard deviations are consistently underestimated and end up even farther from the prior mean, which is the exact opposite of what I would have expected.

Example:

library(brms)
N = 100
x = rnorm(N)
y = rnorm(N, 0.5 * x)   # true outcome, residual SD = 1
yy = y                  # copy used as the measurement-error response
y_se = 0.7              # known measurement-error SD for mi()
y1 = y - 1.5            # lower bound of the censoring interval
y2 = y + 1.5            # upper bound of the censoring interval
y_cens = "interval"     # censoring indicator for cens()
d = data.frame(x, y, yy, y_se, y1, y2, y_cens)
fit_pr = brm(
  bf(y ~ 1), data = d,
  sample_prior = "only", cores = 4
)
# Family Specific Parameters: 
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     2.69      2.73     0.09     9.81 1.00     1946     1275

fit = brm(
  bf(y ~ x) +                          # exact observations
    bf(y1 | cens(y_cens, y2) ~ x) +    # interval-censored: y1 = lower, y2 = upper bound
    bf(yy | mi(y_se) ~ x) +            # response measured with known SD y_se
    set_rescor(FALSE),                 # no residual correlations between responses
  data = d, cores = 4
)
# Family Specific Parameters: 
#          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma_y      1.06      0.08     0.92     1.22 1.00     5451     2483
# sigma_y1     0.68      0.09     0.52     0.88 1.00     7473     3009
# sigma_yy     0.78      0.10     0.59     1.00 1.00     1293     1411

So we see that the posterior mean residual standard deviations for the censored and measurement-error responses are roughly 25-35% smaller than the error-free estimate and only about a quarter of the prior mean.

I suppose that mathematically it makes sense that smaller variances would maximize the probability of my observations being in the given intervals, but am I wrong in thinking this is not what one would usually want to get as a result? What would be the right way of achieving the behavior I’ve described instead?

I think one way to frame the intuition here is that in your interval-censored and measurement-error cases, you are allowing the regression itself to dictate where within their intervals the points are likely to be, rather than forcing the regression to deal with where the points actually are.
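
Concretely, each interval-censored observation with bounds (l, u) contributes

\begin{align}
P(l < Y < u \mid \mu, \sigma) = \Phi\!\left(\frac{u - \mu}{\sigma}\right) - \Phi\!\left(\frac{l - \mu}{\sigma}\right)
\end{align}

to the likelihood, where mu is the regression mean for that observation and Phi is the standard normal CDF. Whenever the interval straddles mu, this term only grows as sigma shrinks, because a smaller sigma concentrates more of the residual probability mass inside the interval.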

One way to get the behavior you describe would be to implement a “cut”, i.e. not let information from the regression flow back to inform the imputation. In Stan, the most straightforward way to do this is via many repeated model fits over multiply imputed data.
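
For example, a rough sketch of that approach with brms (not from the original post; the uniform-within-interval imputation, the y_imp column, and the number of imputations are all just illustrative choices):

# impute each censored value WITHOUT letting the regression inform the draw,
# e.g. uniformly within its interval, then fit the plain Gaussian model to
# each completed dataset and pool the results with brm_multiple()
n_imp = 20
imputed = lapply(seq_len(n_imp), function(i) {
  di = d
  di$y_imp = runif(nrow(di), min = di$y1, max = di$y2)
  di
})
fit_mi = brm_multiple(y_imp ~ x, data = imputed, cores = 4)

Since the imputation step never sees the regression, the fit cannot pull the imputed points toward its own predictions.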


Thank you! I suppose the regression proper is a bit of a red herring here; I initially included it just to check what would happen to the slopes. But you’re right, this is basically “asking” the distribution what parameter values would maximize the probability of it producing observations that fall within the intervals.

I made a visualization that made the intuition fully click, leaving it here for posterity’s sake:

library(dplyr)
library(tidyr)
library(ggplot2)
# probability that a N(0, s) draw lands in the interval (l, u)
intp = function(l, u, s) pnorm(u, sd = s) - pnorm(l, sd = s)
N = 100
y = rnorm(N)
y1 = y - 2.5   # lower interval bound
y2 = y + 2.5   # upper interval bound
data.frame(y1, y2) %>%
  mutate(sigma = list(seq(0.01, 1.5, 0.01))) %>%  # grid of candidate sigmas for every interval
  unnest(sigma) %>%
  mutate(prob = log(intp(y1, y2, sigma))) %>%     # log-probability of each interval
  group_by(sigma) %>%
  summarise(jointprob = exp(sum(prob))) %>%       # joint probability across all intervals
  ggplot(aes(x = sigma, y = jointprob)) +
  geom_point()

(it’s particularly revealing that sigma = 0 will be the “winner” as long as all the intervals happen to cross the mean)

I suppose that a more adequate model for the situation I had in mind would take each observation to be the midpoint of its interval and model it as drawn from a uniform with known width, whose endpoints depend on an unobserved value drawn from the normal:

\begin{align}
y_{mid} \mid \mu &\sim U(\mu - k, \mu + k) \\
\mu &\sim N(\mu^\star, \sigma)
\end{align}
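
A rough rstan sketch of that model (not from the original post; the priors are arbitrary, k is treated as the known common half-width, and the model is reparameterized in terms of the uniform "error" e = y_mid - mu so that only scalar bounds are needed):

library(rstan)
stan_code = "
data {
  int<lower=1> N;
  vector[N] y_mid;                    // interval midpoints
  real<lower=0> k;                    // known half-width of the intervals
}
parameters {
  real mu_star;
  real<lower=0> sigma;
  vector<lower=-k, upper=k>[N] e;     // y_mid - mu, implicitly uniform on (-k, k)
}
transformed parameters {
  vector[N] mu = y_mid - e;           // latent true values
}
model {
  mu ~ normal(mu_star, sigma);        // mu ~ N(mu_star, sigma)
  // y_mid | mu ~ U(mu - k, mu + k) is encoded by the bounds on e
  mu_star ~ normal(0, 5);             // illustrative priors
  sigma ~ student_t(3, 0, 2.5);
}
"
fit_latent = stan(
  model_code = stan_code,
  data = list(N = nrow(d), y_mid = (d$y1 + d$y2) / 2, k = 1.5)
)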