In addition to what @jsocolar mentioned about LOO, I wanted to add some things, and maybe he can comment if he has used LOO in the way you describe. I haven’t used PSIS-LOO like this on the holdout data. I have used PSIS-LOO as a tool for model evaluation on the data to which the model was fit, not holdout data. To quote Aki Vehtari in his super helfpul CV FAQ, “Cross-validation is a family of techniques that try to estimate how well a model would predict previously unseen data by using fits of the model to a subset of the data to predict the rest of the data.” So, LOO provides a way to estimate how well the model makes predictions, when all you have is the data that you fit your model to. When you have holdout data, then you could use holdout validation as described here https://cran.r-project.org/web/packages/loo/vignettes/loo2-elpd.html#holdout-validation
Of course there are other metrics, of which accuracy is one. You might be interested in this vignette Bayesian Logistic Regression with rstanarm
Here is something that you may want to think about when considering your accuracy estimate - you are rounding your prediction, when your prediction from predict()
is actually the mean of a collection of samples that are either 0 or 1. The predict()
function returns 0 or 1 for each of the number of samples from your model, with probability of inverse logit of the linear predictors part of the model. The Estimate
in brms predict()
is the mean of this collection of 0 or 1 samples. But take a look at the Est.Error
and you will see how sure the model really is of this Estimate
. It is easier to think about this when using fitted()
which returns the point estimate (mean) of the probability distribution for that prediction. In the case of logistic regression, your full expectation for a single row of new data is technically a collection of samples that are sampled from the probability distribution of the expectation, where the expectation is the probability that y
=1. For a concrete example, imagine that your y
is binary 0 or 1, where 0=susceptible and 1=resistant, for a model where you have some genes as predictors and you are trying to predict antibiotic resistance. You fit your model with 4000 total post-warmup samples. Now you have a single row of new data that you feed into the model and use fitted()
for y
. What comes out is 4000 samples that are probabilities for the new data being y=1. Let’s say you summarize those and get a mean for those samples of 0.55 with 95% quantiles of 0.25 and 0.85. The 0.55 number is the Estimate
that brms gives when summary=TRUE
(the default) and is close to the same number from Estimate
from predict()
that you are rounding in your prediction in your code above (if you want to see all the samples yourself, just use fitted(fit, summary=FALSE)
). The 0.55 point prediction would round up to 1, so you would predict 1, or resistant, but in reality, the model really doesn’t know much - it’s more or less a coin toss, but it doesn’t even know if it is a coin toss either, as the probability is so spread out! It just doesn’t know. If you got something like 0.9 (0.85 - 0.95), then the model is reasonably sure that it is resistant. The point is, that the point estimates may not really be a good reflection of what the model thinks the prediction really is. Depending on the application, it could be more helpful to use fitted
to get the expectation and the variability in that. See the example code below to see examples of all of this.
x <- rnorm(600, 0, 1)
a <- -1
b <- 0.1
p <- plogis(a + b*x)
y <- rbinom(600, 1, p)
d <- cbind.data.frame(y,x)
library(brms)
m <- brm(y ~ 1 + x, family=bernoulli, data=d, cores=4, iter=2000, warmup=1000)
m
x <- rnorm(1, 0, 1)
a <- -1
b <- 0.1
p <- plogis(a + b*x)
y <- rbinom(1, 1, p)
nd <- cbind.data.frame(y,x)
f <- fitted(m, newdata=nd, summary=FALSE)
str(f)
mean(f)
sd(f)
quantile(f, c(0.025, 0.975))
fitted(m, newdata=nd, summary=T)
hist(f, breaks=100)
pr <- predict(m, newdata=nd, summary=FALSE)
str(pr)
mean(pr)
sd(pr)
quantile(pr, c(0.025, 0.975))
predict(m, newdata=nd, summary=T)
hist(pr, breaks=100)
I point all this out, because depending on your application for making predictions for new data, it might be more helpful to not discard the uncertainty in your predictions. Continuing with my example of predicting antibiotic resistance, if your model is ‘very sure’, say maybe something like 0.90 (0.85 - 0.95) that the new row of data should be y=1 resistant, but it turns out in the lab that the sample is actually susceptible, then the observation that your model is surprisingly wrong can inform your model building (maybe your model is just really bad at prediction, or maybe you are missing some genes or something else in the model) or inform you as the scientist of something potentially unusual (if your model is generally and historically pretty good).