I’ve inherited some count data of self-reported behaviors. The response format was limited in that the lowest count one could endorse was 1 and the highest count one could endorse was 6. The way I see it, this implies each 1 is a mixture of 1s and 0s and each 6 is a mixture of 6s and above. I believe these could be considered censored count data and I’d like to model them with the Poisson likelihood.
Here’s a simplified version of the data. The y values are the true counts. The y_cens values are the counts after censoring as described, above.
fit <-
brm(data = s,
family = poisson,
y_cens | cens(censored) ~ 1)
The model output seems to looks good.
print(fit)
Family: poisson
Links: mu = log
Formula: y_cens | cens(censored) ~ 1
Data: s (Number of observations: 1000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.11 0.02 1.07 1.14 1.00 1649 2219
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Does it appear I’ve correctly interpreted how to work with censored count data?
This looks a bit weird I would expect it to be ifelse(y <= 1, "left", ifelse(y >= 6, "right", "none")))
since you would not know which ones and sixes are “real” and which are censored and since values y == 0 were also censored. This might actually be the reason why your estimate is slightly biased upwards from log(3) = 1.098... despite the large data size.
Other than that it looks correct to me. Also testing your intuitions on a small model with simulated data is a good practice!
Thanks for the suggestion, @martinmodrak. I tried your coding method out. Here’s the code and subsequent model.
# add alternative censoring variable
s <-
s %>%
mutate(c2 = ifelse(y <= 1, "left", ifelse(y >= 6, "right", "none")))
# fit the model
fit2 <-
brm(data = s,
family = poisson,
y_cens | cens(c2) ~ 1)
# check the summary
print(fit2)
Family: poisson
Links: mu = log
Formula: y_cens | cens(c2) ~ 1
Data: s (Number of observations: 1000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.13 0.02 1.10 1.17 1.00 1634 2132
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The updated intercept appears more biased relative to the original y data and the idealized lambda = 3.
Sorry for taking long to get back. I was mistaken in my initial evaluation, because brms treats lower bounds as exclusive and upper bounds as inclusive (when censoring), sorry for the confusion. So what I see as correct would be: