Say I fit a logistic regression model, and I would like to generate a probability for each response. Using posterior_predict, I generate a response (1 or 0) for each observation from 4000 draws (the number of posterior draws in my case).
Can I then take the proportion of 1s across the 4000 draws and use it as my probability measure?
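The proportion-of-1s idea amounts to taking column means of the draws-by-observations matrix that posterior_predict returns. The sketch below does not fit a model. It uses a hypothetical simulated matrix `yrep` in place of posterior_predict(m1), generated from made-up per-observation probabilities `p_true`.

```r
set.seed(1)
S <- 4000  # number of posterior draws
N <- 5     # number of observations
# Hypothetical stand-in for posterior_predict(m1): an S x N matrix of 0/1 draws,
# simulated here from assumed probabilities p_true (one per observation/column)
p_true <- c(0.1, 0.3, 0.5, 0.7, 0.9)
yrep <- matrix(rbinom(S * N, size = 1, prob = rep(p_true, each = S)),
               nrow = S, ncol = N)
# Proportion of 1s in each column: a Monte Carlo estimate of Pr(y_i = 1)
p_hat <- colMeans(yrep)
round(p_hat, 2)
```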
Eventually, I would like to generate an ROC curve and calculate the positive and negative predictive values (PPV and NPV).
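Once there is one probability per observation, PPV, NPV and an ROC curve need only base R. In this sketch the outcomes `y` and probabilities `p` are hypothetical numbers, and `classify_metrics` is a hypothetical helper. In practice `p` might be a posterior mean probability per observation. The ROC curve is traced by sweeping the classification threshold, and the AUC is computed with the trapezoidal rule.

```r
# Hypothetical observed outcomes and predicted probabilities
y <- c(1, 0, 1, 1, 0, 0, 1, 0)
p <- c(0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3)

# Confusion-matrix summaries when predicting 1 whenever p >= threshold
classify_metrics <- function(y, p, threshold = 0.5) {
  pred <- as.integer(p >= threshold)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  tn <- sum(pred == 0 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  c(PPV = tp / (tp + fp), NPV = tn / (tn + fn),
    TPR = tp / (tp + fn), FPR = fp / (fp + tn))
}

m <- classify_metrics(y, p, threshold = 0.5)

# ROC curve: false/true positive rates over decreasing thresholds
thresholds <- c(Inf, sort(unique(p), decreasing = TRUE), -Inf)
roc <- t(sapply(thresholds, function(thr) classify_metrics(y, p, thr)[c("FPR", "TPR")]))
# Area under the ROC curve by the trapezoidal rule
auc <- sum(diff(roc[, "FPR"]) * (head(roc[, "TPR"], -1) + tail(roc[, "TPR"], -1)) / 2)
```

PPV and NPV depend on the chosen threshold (0.5 here), while the ROC curve summarizes all thresholds at once.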
In this particular case, it is better to use
mu <- posterior_linpred(m1, transform = TRUE)
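to get the posterior distribution of the conditional mean, rather than drawing from the posterior predictive distribution and averaging over those 0/1 draws. The two are equivalent in principle (both estimate Pr(y = 1) for each observation), but with a finite number of draws the average of the 0/1 predictions adds extra Monte Carlo noise.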
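To see the noise difference, this sketch simulates a hypothetical matrix `mu` that stands in for posterior_linpred(m1, transform = TRUE). It has draws in rows, observations in columns, and its values come from made-up Beta distributions. The sketch then draws one Bernoulli outcome per draw, as posterior_predict does. Both column means estimate the same probability. The Monte Carlo standard errors assume independent draws.

```r
set.seed(2)
S <- 4000
N <- 3
# Hypothetical stand-in for posterior_linpred(m1, transform = TRUE):
# S x N posterior draws of Pr(y_i = 1), simulated from Beta distributions
mu <- cbind(rbeta(S, 20, 30), rbeta(S, 50, 50), rbeta(S, 40, 10))
# posterior_predict-style draws: one Bernoulli outcome per draw of mu
yrep <- matrix(rbinom(S * N, size = 1, prob = mu), nrow = S, ncol = N)

p_linpred <- colMeans(mu)    # average of the conditional means
p_predict <- colMeans(yrep)  # proportion of 1s
# Monte Carlo standard errors of each estimate (assuming independent draws)
mcse_linpred <- apply(mu, 2, sd) / sqrt(S)
mcse_predict <- apply(yrep, 2, sd) / sqrt(S)
rbind(p_linpred, p_predict, mcse_linpred, mcse_predict)
```

The 0/1 draws have variance E[mu(1 - mu)] + Var(mu), which is always larger than Var(mu). This is why averaging them gives a noisier estimate than averaging mu directly.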
I would use the log predictive density to measure predictive accuracy rather than all of that ROC stuff, but whatever you do with the probabilities, it is better not to use excessively noisy inputs.
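For a Bernoulli outcome, the log predictive density sums over observations the log of the posterior average of p(y_i | theta_s). Here p(y_i | theta_s) is mu_s when y_i = 1 and 1 - mu_s when y_i = 0. This sketch computes only the in-sample version. The outcomes `y` and the simulated matrix `mu` are hypothetical stand-ins for real data and posterior_linpred(m1, transform = TRUE). A leave-one-out estimate, such as the one from the loo package, is a different and more careful quantity.

```r
set.seed(3)
S <- 4000
y <- c(1, 0, 1, 1, 0)  # hypothetical observed outcomes
N <- length(y)
# Hypothetical stand-in for posterior_linpred(m1, transform = TRUE)
mu <- matrix(rbeta(S * N, 8, 4), nrow = S, ncol = N)
# Pointwise log-likelihood draws log p(y_i | theta_s), an S x N matrix
ll <- matrix(dbinom(rep(y, each = S), size = 1, prob = mu, log = TRUE),
             nrow = S, ncol = N)
# Numerically stable log of the mean of exp(x)
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}
lpd_i <- apply(ll, 2, log_mean_exp)  # one value per observation
lpd <- sum(lpd_i)                    # higher (closer to 0) is better
```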
I would love to learn more about how to use the log predictive density to measure accuracy. Could you point me to a good reference?
Thank you! Very interesting.
I had another question about posterior_linpred. I get a probability for each observation for each of the draws. If I use the 2.5th and 97.5th percentiles as lower and upper bounds, would these be 95% prediction intervals or 95% credible intervals?
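Computing those percentiles for each observation amounts to taking column-wise quantiles of the draws matrix. This sketch uses a hypothetical simulated `mu` in place of posterior_linpred(m1, transform = TRUE). The resulting bounds summarize posterior draws of the probability Pr(y_i = 1), not of the 0/1 outcome itself.

```r
set.seed(4)
S <- 4000
# Hypothetical stand-in for posterior_linpred(m1, transform = TRUE):
# S x 3 posterior draws of Pr(y_i = 1)
mu <- cbind(rbeta(S, 20, 30), rbeta(S, 50, 50), rbeta(S, 40, 10))
# 2.5% and 97.5% quantiles for each observation (column)
bounds <- apply(mu, 2, quantile, probs = c(0.025, 0.975))
# bounds is a 2 x N matrix: row 1 = lower bound, row 2 = upper bound
bounds
```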