Posterior predictive tail thicker than observed

I am modeling a positive outcome using a lognormal regression. The outcomes are rounded to integers, so I am using the usual interval-censoring approach. Here are my outcomes (about 800 observations):

[image: histogram of the observed outcomes]
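For reference, my censoring setup looks roughly like this (a minimal brms-style sketch with placeholder variable and covariate names, assuming each recorded integer y stands for the interval [y - 0.5, y + 0.5]):

```r
library(brms)

# rounding to integers => each recorded y is interval-censored
# (placeholder bounds; adjust if the rounding rule differs)
dat$y_lo <- dat$y - 0.5
dat$y_hi <- dat$y + 0.5
dat$cens_type <- "interval"  # censoring indicator for brms

fit <- brm(
  y_lo | cens(cens_type, y_hi) ~ x1 + x2 + (1 | group),  # placeholder terms
  data = dat, family = lognormal()
)
```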

I am including about a dozen covariates and several random effects, but I always get a similar posterior predictive:

[image: posterior predictive distribution]

For me, y \geq 10 is an important threshold, so here is the distribution of the proportion of y \geq 10 across the replicated datasets, with a vertical line at the observed proportion:

[image: replicated tail proportions with the observed proportion marked]
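That check is just the tail proportion used as a posterior predictive test statistic; computed roughly like this, using bayesplot with the fit sketched above:

```r
library(bayesplot)

yrep <- posterior_predict(fit)           # draws x observations matrix
prop_ge_10 <- function(y) mean(y >= 10)  # tail proportion at the threshold
ppc_stat(dat$y, yrep, stat = prop_ge_10)
```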

Of course, I could model the binary outcome y \geq 10 directly, but that seems extreme: the threshold, while important, is somewhat arbitrary, and I am still interested in the continuous outcome. I was looking forward to making full use of the information through interval censoring.

Note that I’ve also tried skew-normal (the posterior for \alpha was 0 \pm 0.05), gamma, and negative-binomial family regressions, and the results were very similar. I also tried leaving the censoring out of the model; no change. I tried including random effects for a very fine multilevel structure (400 groups for 800 observations), guessing that it might overfit but at least reproduce the tail, but that didn’t work either.

My first thought was that I’m missing the right predictors, but I haven’t seen any improvement in the posterior predictive as I continue to improve the model (as measured by loo and bayes_R2), so I’m not sure that’s it. Is there a more flexible family I should be using? Different priors? Has anyone encountered this before, or otherwise have some ideas about what might help (assuming this is an issue)? I realize this might be difficult to answer without seeing the data, which I can’t share.

You could get heavier tails with a Student-t model: https://mc-stan.org/docs/2_19/functions-reference/student-t-distribution.html

I guess you could estimate the degrees-of-freedom parameter \nu; \nu \to \infty corresponds to a normal distribution.

You might have to put pretty constraining priors on \nu to get it to work well. Gamma functions are involved in the density, and the numerics are nasty there.

You could also try just picking a low \nu (like 4 or something random?) and seeing if the fit changes in any interesting way.
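In brms-style code, the idea might look something like this (an untested sketch reusing the placeholder names from above; the log-transformed bounds are just to parallel the lognormal setup, and gamma(2, 0.1) is one common choice of constraining prior):

```r
# Student-t on the log scale as a heavier-tailed analogue of the lognormal;
# log-transforming the bounds preserves the censoring intervals
dat$ly_lo <- log(dat$y_lo)
dat$ly_hi <- log(dat$y_hi)

# estimate nu under a fairly tight prior
fit_t <- brm(
  ly_lo | cens(cens_type, ly_hi) ~ x1 + x2 + (1 | group),
  data = dat, family = student(),
  prior = prior(gamma(2, 0.1), class = nu)
)

# or fix nu at a low value and see whether the fit changes
fit_t4 <- brm(
  bf(ly_lo | cens(cens_type, ly_hi) ~ x1 + x2 + (1 | group), nu = 4),
  data = dat, family = student()
)
```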

Thanks @bbbales2, but I think you may have misread the question: I was seeking thinner tails. Following advice from @James_Savage, I tried the Gumbel distribution, whose fit did have slightly thinner tails initially but thicker ones farther out, producing unrealistically large values in replications (> 1000, when the max in my data is 100).

Oh whoops, my apologies. I’m not so sure what to do.

How far off from a normal do your predictions look on the log scale?
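E.g., something like this (a sketch, assuming a brms fit and bayesplot):

```r
library(bayesplot)
yrep <- posterior_predict(fit)                   # replicated datasets
ppc_dens_overlay(log(dat$y), log(yrep[1:50, ]))  # observed vs. replicated, log scale
```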