Using posterior_predict in rstanarm to generate probabilities for each observation in a logistic regression model

Say I fit a logistic regression model:

m1 <- stan_glm(response ~ predictor1 + predictor2, family = "binomial")

I would like to generate a probability for each observation. Using posterior_predict,

preds <- posterior_predict(m1)

I get a 1 or 0 for each observation in each of 4000 draws (the number of posterior draws in my case).
Can I then take the proportion of 1s across the 4000 draws and use it as my probability estimate?

Eventually, I would like to generate an ROC curve and calculate the positive and negative predictive values (PPV and NPV).
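For reference, here is a minimal base-R sketch of those metrics, assuming you already have one probability per observation (for example, the column means of the posterior draws) and the observed 0/1 response. The vectors `p_hat` and `y` below are made up for illustration, and the helper names `ppv_npv`, `roc_points`, and `auc` are hypothetical, not rstanarm functions.

```r
# Hypothetical inputs: p_hat would be one probability per observation from the
# fitted model, y the observed 0/1 response. Both are invented here.
p_hat <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
y     <- c(1,   1,   0,   1,   0,   1,   0,   0)

# PPV and NPV at a chosen classification threshold
# (NaN if no observation falls on one side of the threshold)
ppv_npv <- function(p_hat, y, threshold = 0.5) {
  pred_pos <- p_hat >= threshold
  c(PPV = sum(pred_pos & y == 1) / sum(pred_pos),
    NPV = sum(!pred_pos & y == 0) / sum(!pred_pos))
}

# ROC curve: false and true positive rates at every observed threshold,
# starting from Inf so the curve begins at (0, 0)
roc_points <- function(p_hat, y) {
  thresholds <- sort(unique(c(Inf, p_hat)), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) sum(p_hat >= t & y == 1) / sum(y == 1))
  fpr <- sapply(thresholds, function(t) sum(p_hat >= t & y == 0) / sum(y == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve by the trapezoidal rule
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

ppv_npv(p_hat, y)   # PPV 0.75, NPV 0.75 at threshold 0.5
roc <- roc_points(p_hat, y)
auc(roc)            # 0.8125
```

These use a single point probability per observation; applying the same functions to each posterior draw separately would give a distribution for each metric instead of a single number.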

In this particular case, it is better to use

mu <- posterior_linpred(m1, transform = TRUE)

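to get draws from the posterior distribution of the conditional mean (each observation's probability of a 1), rather than drawing 0/1 outcomes from the posterior predictive distribution and averaging them. The two approaches agree in expectation, but with a finite number of draws the latter adds Bernoulli sampling noise on top of the posterior uncertainty.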
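To illustrate the point about noise, here is a base-R simulation that does not fit a model. The matrix `mu` is a simulated stand-in for what `posterior_linpred(m1, transform = TRUE)` returns (draws by observations); `preds` mimics `posterior_predict()` by drawing one 0/1 outcome per draw. In recent rstanarm releases, `posterior_epred(m1)` returns the same matrix and the `transform` argument is deprecated, so check the documentation for your installed version.

```r
set.seed(1)
n_draws <- 4000
n_obs <- 5

# Simulated posterior draws of P(y = 1) for each observation (draws x obs)
true_p <- c(0.1, 0.3, 0.5, 0.7, 0.9)
mu <- sapply(true_p, function(p) plogis(rnorm(n_draws, qlogis(p), 0.3)))

# Mimic posterior_predict(): one Bernoulli outcome per draw and observation
preds <- matrix(rbinom(length(mu), size = 1, prob = mu),
                nrow = n_draws, ncol = n_obs)

p_from_mu    <- colMeans(mu)     # average of the conditional means
p_from_preds <- colMeans(preds)  # proportion of 1s

# Standard deviation of the extra Bernoulli noise in p_from_preds,
# given these mu draws: sqrt(mean(mu * (1 - mu)) / n_draws)
extra_sd <- sqrt(colMeans(mu * (1 - mu)) / n_draws)

round(rbind(p_from_mu, p_from_preds, extra_sd), 3)
```

Both rows estimate the same probabilities, but `p_from_preds` carries an additional random error of roughly `extra_sd` that `p_from_mu` does not.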

I would use the log predictive density to measure predictive accuracy rather than ROC curves and related metrics, but whatever you do with the probabilities, it is better not to feed in needlessly noisy inputs.
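As a rough sketch of the log predictive density for a Bernoulli model: for each observation, average the likelihood of the observed outcome over the posterior draws, take the log, and sum over observations. The `mu` and `y` below are simulated stand-ins, and `log_mean_exp` is a hypothetical helper. This is the in-sample version; the loo package's `loo()` function, which has a method for rstanarm fits, estimates the out-of-sample counterpart (elpd_loo).

```r
set.seed(2)
n_draws <- 4000
y <- c(1, 0, 1, 1, 0)

# Simulated posterior draws of P(y_i = 1), draws x observations
mu <- sapply(c(0.8, 0.3, 0.6, 0.9, 0.2),
             function(p) plogis(rnorm(n_draws, qlogis(p), 0.3)))

# log p(y_i | theta_s) for every draw s and observation i
Y <- matrix(y, nrow = n_draws, ncol = length(y), byrow = TRUE)
log_lik <- matrix(dbinom(Y, size = 1, prob = mu, log = TRUE), nrow = n_draws)

# Numerically stable log of the mean of exp(x)
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# Pointwise log predictive density, then the total (higher is better)
lpd_i <- apply(log_lik, 2, log_mean_exp)
lpd <- sum(lpd_i)
lpd
```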


Thanks!

I would love to learn more about how to use the log predictive density to measure accuracy. Could you point me to a good reference?

Thank you! Very interesting.

I had another question about posterior_linpred. I get a probability for each observation in each draw. If I use the 2.5th and 97.5th percentiles as lower and upper bounds, would these be 95% prediction intervals or 95% credible intervals?

Thanks!!

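Credible intervals. They describe posterior uncertainty about each observation's probability of a 1 (the conditional mean), not about a new 0/1 outcome.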
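A small base-R illustration of the difference, again with a simulated `mu` standing in for the `posterior_linpred(m1, transform = TRUE)` draws. Quantiles of the probability draws give a credible interval for each probability; the same quantiles of simulated 0/1 outcomes (as `posterior_predict()` would produce) collapse to 0 and 1, which is why a prediction interval is not very informative for a binary response. The name `pred_int` is illustrative only.

```r
set.seed(3)
n_draws <- 4000

# Simulated posterior draws of P(y_i = 1) for three observations
mu <- sapply(c(0.2, 0.5, 0.85),
             function(p) plogis(rnorm(n_draws, qlogis(p), 0.4)))

# 95% credible interval for each probability (rows: 2.5%, 97.5%)
ci <- apply(mu, 2, quantile, probs = c(0.025, 0.975))
ci

# The same quantiles of simulated 0/1 outcomes
preds <- matrix(rbinom(length(mu), size = 1, prob = mu), nrow = n_draws)
pred_int <- apply(preds, 2, quantile, probs = c(0.025, 0.975))
pred_int   # every column is 0 (lower) and 1 (upper)
```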

Thanks!