I am using the resp_cens() function in brms to account for the right-censoring of my data. In our manuscript, we are providing this information:
“When using this function, censored data is treated as missing data that is constrained to fall in the censored range of values (Stan Development Team, 2021).”
A reviewer has asked us to provide statistical/methodological literature showing why this is the best way to handle censored data. Can someone point me towards primary literature?
Thanks!
@Lena_Schaefer hello, I believe the censoring implementation in brms does not in fact behave this way. Instead, the censored observations enter the likelihood through the cumulative distribution function of your response family (for a linear model, the normal CDF). For right-censored observations that means the complementary CDF, i.e. the probability that the response exceeds the censoring threshold. You can confirm this by calling stancode() on your brms model and checking how the censored observations are handled.
The Stan User’s Guide describes both of those methods (section 4.3, “Censored data”).
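As a rough illustration of how the two methods relate, here is a minimal Python sketch for a normal response with right-censoring. It is not the code brms generates; the function names (`normal_lccdf`, `censored_loglik`, `integrate_latent`) and the example values are my own, chosen for illustration. It shows that integrating a latent "missing" value over the censored range gives the same likelihood contribution as the complementary CDF.

```python
import math

def normal_pdf(y, mu, sigma):
    # Density of Normal(mu, sigma) at y.
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_lccdf(c, mu, sigma):
    # log P(Y > c) for Y ~ Normal(mu, sigma), via the complementary error function.
    z = (c - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_loglik(y, censored, mu, sigma):
    # Log-likelihood with right-censored observations integrated out.
    # y[i] is the observed value, or the censoring threshold if censored[i] is True.
    total = 0.0
    for yi, is_censored in zip(y, censored):
        if is_censored:
            total += normal_lccdf(yi, mu, sigma)
        else:
            total += math.log(normal_pdf(yi, mu, sigma))
    return total

def integrate_latent(c, mu, sigma, width=12.0, steps=20_000):
    # "Missing data" view: integrate the density of the latent value over the
    # censored range (c, infinity), truncated far into the upper tail.
    # Midpoint rule; the truncation point is an assumption of this sketch.
    upper = max(c, mu) + width * sigma
    h = (upper - c) / steps
    return h * sum(normal_pdf(c + (k + 0.5) * h, mu, sigma) for k in range(steps))

if __name__ == "__main__":
    mu, sigma, threshold = 0.0, 1.0, 1.5  # hypothetical values
    print("complementary CDF :", math.exp(normal_lccdf(threshold, mu, sigma)))
    print("integrated latent :", integrate_latent(threshold, mu, sigma))
    print("log-likelihood    :", censored_loglik([0.3, -0.8, 1.5], [False, False, True], mu, sigma))
```

The two printed probabilities should agree to within the numerical integration error, which is why both formulations lead to the same inference about the model parameters. They differ mainly in whether the censored values are sampled as extra unknowns.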
I don’t have any primary evidence to hand for this exact question. My suspicion is that treating the censored values as missing, constrained parameters is more reasonable when the unobserved values themselves are of direct interest. Otherwise, integrating them out should be more efficient, especially when a large share of the data is censored.
As for primary literature, what is best to cite depends on your discipline, and on whether you mean the handling of censored data in general or in the Bayesian regression context specifically. There is a lot of discussion of this in the pharmacokinetics literature, for example, but most of it presumes frequentist methods.