Communicating the meaning of ELPD on the original scale of measurement?


I’m working on my understanding of the elpd_loo estimate provided by loo/kfold and came across this paper on information criteria. I’m specifically interested in the section on statistical and practical significance. I have copied the most relevant part below:

For example, consider two models for a survey of n voters in an American election, with one model being completely empty (predicting p = 0.5 for each voter to support either party) and the other correctly assigning probabilities of 0.4 and 0.6 (one way or another) to the voters. Setting aside uncertainties involved in fitting, the expected log predictive probability is log(0.5) = −0.693 per respondent for the first model and 0.6 log(0.6) + 0.4 log(0.4) = −0.673 per respondent for the second model…
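The per-respondent numbers in the quoted example can be reproduced directly. This is a small Python sketch for illustration only (the thread itself works with the R loo package); it simply evaluates the two expected log scores and their difference.

```python
import math

# Empty model: predicts p = 0.5 for every voter, so every log score is log(0.5).
elpd_empty = math.log(0.5)

# Informed model: correctly assigns 0.6 / 0.4. In expectation, a voter falls on
# the side given probability 0.6 with probability 0.6, and on the other side
# with probability 0.4.
elpd_informed = 0.6 * math.log(0.6) + 0.4 * math.log(0.4)

# Expected gain in log score per respondent.
diff_per_respondent = elpd_informed - elpd_empty

print(f"empty model:    {elpd_empty:.3f}")           # -0.693
print(f"informed model: {elpd_informed:.3f}")        # -0.673
print(f"difference:     {diff_per_respondent:.3f}")  # 0.020
```

The difference is only about 0.02 log units per respondent; summed over n voters, the total elpd difference is n times that value.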

If I have a continuous response variable measured across 30 participants, is elpd_loo the average total elpd for each participant across the simulations? If so, to get the average elpd for an individual I would divide by 30. Say that leaves me with a per-participant difference in elpd of 1.5 between two models. Is it possible to express this in terms of the original scale? If I just take exp(1.5) = 4.48, is that at all an estimate of the difference in predictive error between the two models on the original scale?
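For reference when reading the question: elpd_loo as reported by loo is a sum over observations. Each pointwise term is the log of the leave-one-out predictive density of that observation (averaged over posterior draws), and the terms are added up. Dividing an elpd difference by n therefore gives a mean per-observation difference, and exponentiating that gives a ratio of predictive densities rather than an error in the units of the response. The Python sketch below uses made-up pointwise values for 30 participants (the arrays are hypothetical, constructed so the mean difference is exactly 1.5) to show that exp(1.5) ≈ 4.48 equals the geometric mean of the pointwise density ratios.

```python
import math

n = 30  # participants

# Hypothetical pointwise LOO log predictive densities, one per participant.
# In loo these play the role of the pointwise elpd_loo values.
elpd_i_a = [-2.0 + 0.01 * i for i in range(n)]
# Hypothetical model B: worse by 1.0 or 2.0 log units per participant (mean 1.5).
elpd_i_b = [a - (1.0 if i % 2 == 0 else 2.0) for i, a in enumerate(elpd_i_a)]

# elpd_loo is the SUM of the pointwise terms.
elpd_loo_a = sum(elpd_i_a)
elpd_loo_b = sum(elpd_i_b)
elpd_diff = elpd_loo_a - elpd_loo_b   # total difference (45 here)
per_obs = elpd_diff / n               # mean difference per participant (1.5)
ratio = math.exp(per_obs)             # about 4.48: a ratio of densities

# Same number: geometric mean of p_A(y_i | y_-i) / p_B(y_i | y_-i).
# (math.prod needs Python 3.8+.)
pointwise_ratios = [math.exp(a) / math.exp(b) for a, b in zip(elpd_i_a, elpd_i_b)]
geo_mean_ratio = math.prod(pointwise_ratios) ** (1 / n)

print(f"elpd_diff = {elpd_diff:.1f}, per participant = {per_obs:.2f}")
print(f"exp(per-participant diff) = {ratio:.2f}, geometric mean ratio = {geo_mean_ratio:.2f}")
```

Read this way, a per-participant difference of 1.5 means that model A gives each held-out observation, on geometric average, about 4.5 times the predictive density that model B gives. The ratio is dimensionless, so a separate error measure is needed to say anything in the units of the response.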



It would be easier to give a recommendation if you can tell us what your measurement is and what kind of error measure you would like to use. For continuous response variables, it’s common to look at the root mean square error, mean absolute error, mean relative error, or some quantile of the absolute error distribution. The loo package now has E_loo functions which help to compute some of these, but we are still missing examples.
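Once you have one LOO point prediction per observation (for example, LOO predictive means, which the E_loo functions can help compute in R), the first three error measures are plain arithmetic. This is a Python sketch under that assumption; the function name `loo_error_summaries` and the numbers in the usage example are hypothetical.

```python
import math

def loo_error_summaries(y, loo_pred):
    """Return (RMSE, MAE, mean relative error) of LOO point predictions.

    y        -- observed responses
    loo_pred -- one LOO point prediction per observation (hypothetical input,
                e.g. LOO predictive means)
    """
    errors = [yi - pi for yi, pi in zip(y, loo_pred)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Mean relative error divides by |y|, so it is undefined when y == 0
    # (possible on a 0-20 scale); such observations need separate handling.
    mre = sum(abs(e) / abs(yi) for e, yi in zip(errors, y)) / n
    return rmse, mae, mre

# Hypothetical observed values and LOO predictions on a 0-20 scale.
y = [12.0, 7.5, 15.0, 3.0, 9.0, 18.0]
loo_pred = [11.0, 9.0, 14.0, 4.5, 8.0, 16.5]
rmse, mae, mre = loo_error_summaries(y, loo_pred)
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}, mean relative error = {mre:.3f}")
```

RMSE and MAE are in the units of the response, which is what makes them easier to relate to a clinically meaningful difference; mean relative error is a proportion.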


Thanks, and sorry for being so vague. My response variable is a continuous scale ranging from 0 to 20. What I am hoping to find is a measure of predictive error that translates into a statement roughly equivalent to “model x has an estimated out-of-sample predictive error of +/- x.” The goal is to make it easier to determine when a model fit shows a clinically significant improvement. My guess is that RMSE is the most appropriate?


Only if you think it’s interpretable and you know what values of RMSE are good or bad. Some people find MAE easier, and in one case I worked on, the application experts preferred the 90% quantile of absolute error (i.e., in 90% of cases the error is smaller than the given value).
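The 90% quantile of absolute error can be computed from the same LOO point predictions. This Python sketch assumes such predictions are available for two competing models; the function name `abs_error_q90` and all numbers are hypothetical. `statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, which corresponds to R's default quantile type 7.

```python
import statistics

def abs_error_q90(y, loo_pred):
    """90% quantile of the absolute LOO error: in roughly 90% of cases
    the absolute error is smaller than the returned value."""
    abs_err = [abs(yi - pi) for yi, pi in zip(y, loo_pred)]
    # n=10 returns the nine deciles; index 8 is the 90% point.
    return statistics.quantiles(abs_err, n=10, method="inclusive")[8]

# Hypothetical observations and LOO predictions from two models.
y      = [12.0, 7.0, 15.0, 3.0, 9.0, 18.0, 5.0, 11.0, 14.0, 6.0]
pred_a = [11.5, 8.0, 13.8, 3.8, 6.5, 17.7, 6.7, 10.1, 15.1, 3.0]
pred_b = [11.0, 9.0, 13.0, 5.0, 7.0, 16.0, 8.0, 9.0, 16.0, 2.0]

q90_a = abs_error_q90(y, pred_a)
q90_b = abs_error_q90(y, pred_b)
print(f"90% abs. error quantile: model A {q90_a:.2f}, model B {q90_b:.2f}")
```

With only a handful of observations the quantile is determined by the largest few errors, so with 30 participants it will be noisy; it is a summary of the LOO predictions, not a formal comparison between models.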

Assuming the variable preds contains draws from the posterior predictive distribution and log_lik is the same log-likelihood matrix used for elpd, you can compute the LOO RMSE as

# LOO predictive means