Testing against holdout data

I am using a train/test split for a binomial three-level model in rstanarm (observations nested in babies, nested in pregnancies). If I were to compare two models, am I correct that the process in rstanarm would be to use:

log_lik(train, newdata = test)

Average the results by column, add them by row, and then compare the two models to each other?

To get it to match up with the elpd_loo estimate, I think you would just do:

mean(rowSums(log_lik(train, newdata = test)) / nrow(model.frame(train)))

Great, this makes a lot of sense. Thank you.

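I think log_sum_exp should be used. The predictive density for each held-out observation is the log of the likelihood averaged over posterior draws, so the averaging has to happen on the probability scale (log-sum-exp over draws, minus the log of the number of draws) rather than by taking the mean of the log-likelihoods.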
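
A minimal sketch of what that could look like, in base R only. It assumes the matrix has posterior draws in rows and held-out observations in columns, which is the shape log_lik() returns. The helper names log_mean_exp and elpd_holdout are made up for illustration and are not part of rstanarm or loo, and the toy matrix just stands in for log_lik(train, newdata = test). The standard error uses the sqrt(n * var(pointwise)) convention, which I believe matches what loo reports, but treat that as an assumption.

```r
# Hypothetical helper: log of the mean of exp(x), computed stably.
log_mean_exp <- function(x) {
  m <- max(x)                      # subtract the max to avoid underflow
  m + log(mean(exp(x - m)))
}

# Hypothetical helper: holdout elpd from an S x N log-likelihood matrix
# (rows = posterior draws, columns = held-out observations).
elpd_holdout <- function(ll) {
  pointwise <- apply(ll, 2, log_mean_exp)  # one value per held-out observation
  n <- length(pointwise)
  c(elpd = sum(pointwise), se = sqrt(n * var(pointwise)))
}

# Toy stand-in for log_lik(train, newdata = test): 4 draws, 3 observations.
ll_toy <- log(matrix(c(0.2, 0.4, 0.6, 0.8,
                       0.5, 0.5, 0.5, 0.5,
                       0.1, 0.3, 0.1, 0.3), nrow = 4))
res <- elpd_holdout(ll_toy)
print(res)
```

With a fitted model you would pass log_lik(train, newdata = test) in place of ll_toy, do the same for the second model on the same test set, and compare the two elpd values. Dividing by the number of held-out observations gives a per-observation figure if you want one.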