I am modeling a positive outcome using a lognormal regression. The outcomes are rounded to integers so I am using the usual interval censoring approach. Here are my outcomes (about 800 observations here):
I am including about a dozen covariates and several random effects but I always get a similar posterior predictive:
For me y \geq 10 is an important threshold so here’s the distribution of proportion y \geq 10 in the replicated datasets and a vertical line for the observed:
Of course I could model the binary outcome y \geq 10 directly but that seems extreme since the threshold, while important is somewhat arbitrary, and I am still interested in the continuous outcome. I was looking forward to being able to make full use of the information through interval censoring.
Note I’ve also tried skew normal (posterior for \alpha was 0\pm .05), gamma, and negative binomial family regressions and the results were very similar. I also tried not including the censoring in the model-- no change. I tried including random effects for a very fine (400 groups for 800 observations) multilevel structure guessing that it might overfit but at least reproduce the tail, but that didn’t work either.
My first thought is that I’m missing the right predictors but I didn’t see any improvement in the posterior predictive as I continue to improve the model (as measured by
bayes_R2) so I’m not sure if that’s right? Is there a more flexible family I should be using? Different priors? Anyone ever encountered this before or otherwise have some ideas about what might help (assuming this is an issue)? I realize this might be difficult to answer without seeing the data, which I can’t share.