Model comparison questions (performance/accuracy evaluation)


I have a question about how to compare two models and evaluate their accuracy or performance. I have a Tweedie model and a zero-inflated gamma model, and I have the Stan code working for both, but I am not sure how to compare their performance. My adviser suggests using the squared error $(y_i - \hat{y}_i)^2$. Is there a function I can apply directly, or do I need to add some lines to my Stan code? Do you have any other suggestions for the evaluation?

Thank you very much in advance. I hope everyone on this forum has a wonderful holiday season.



Look into using the loo package for estimating out-of-sample predictive error and the widely applicable information criterion (WAIC), which I think is the preferred Bayesian approach for model comparison.

Thank you very much for your help, @ScottAlder. I think I would need to add some lines in my stan code to predict the out-of-sample response. Based on the link you sent, they do:

generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta);
  }
}

I think I should do something similar.

I think I get the point of how to estimate the log-likelihood, but I am still not sure how to get the predicted response. Is there a function like ‘predict’ in machine learning, or do I need to code it myself in Stan? Thank you very much. @ScottAlder

You need to record the pointwise log-likelihood in your Stan program so that the loo package in R can approximate leave-one-out cross-validation. Can you post the model blocks in your two Stan programs? I’m guessing that recording the log-likelihood will be more complicated for zero-inflated and Tweedie likelihoods.
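
For the zero-inflated gamma case, a minimal sketch of what recording the pointwise log-likelihood might look like is below. It assumes a hurdle-style likelihood with a constant zero probability and a log-link mean for the positive part; the names `theta`, `b`, `shape`, and `mu` and the priors are made up for illustration and will differ from the actual programs. The Tweedie case is not covered here.

```stan
// Illustrative only: the model structure, priors, and all names are assumptions.
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector<lower=0>[N] y;            // exact zeros allowed
}
parameters {
  real<lower=0, upper=1> theta;    // probability of an exact zero
  vector[K] b;                     // regression coefficients
  real<lower=0> shape;             // gamma shape
}
transformed parameters {
  vector[N] mu = exp(X * b);       // mean of the positive part (log link)
}
model {
  b ~ normal(0, 2);
  shape ~ exponential(1);
  for (n in 1:N) {
    if (y[n] == 0) {
      target += log(theta);
    } else {
      // gamma with rate = shape / mean, so that the mean is mu[n]
      target += log1m(theta) + gamma_lpdf(y[n] | shape, shape / mu[n]);
    }
  }
}
generated quantities {
  vector[N] log_lik;               // one entry per observation, read by loo
  for (n in 1:N) {
    if (y[n] == 0) {
      log_lik[n] = log(theta);
    } else {
      log_lik[n] = log1m(theta) + gamma_lpdf(y[n] | shape, shape / mu[n]);
    }
  }
}
```

The key point is that `log_lik[n]` repeats, observation by observation, exactly the terms added to `target` in the model block, so both branches of the mixture have to be recorded.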

PSIS-LOO, implemented in the loo() function in the loo package, is preferred over WAIC as it has better diagnostics. Neither of these is always preferred. See the videos and case studies at Model assesment, selection and inference after selection

elpd_loo given by the loo package is a better measure than squared error, as it evaluates the whole predictive distribution and not just the mean. Sometimes squared error can also be useful. See the code examples for squared error and R^2 in Model assesment, selection and inference after selection and in Bayesian R2 and LOO-R2
See also the loo package documentation for what you need to add to your Stan code
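
To make the distinction concrete, here is a sketch of the two quantities in standard notation. The symbols $y_{-i}$ (the data without observation $i$) and $\tilde{y}_i$ (a predicted value for observation $i$) are notation chosen here, not taken from the linked pages:

$$
\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{N} \log p(y_i \mid y_{-i}),
\qquad
p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta
$$

$$
\mathrm{MSE}_{\mathrm{loo}} = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - \mathrm{E}[\tilde{y}_i \mid y_{-i}] \bigr)^2
$$

The first scores the full leave-one-out predictive density at each observed value, while the second uses only the predictive mean.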


Thank you very much for your help, Scott. I have successfully convinced my adviser to use the log-likelihood rather than an MSE-like method, and the answer meets our expectations.

Thank you very much for your help, Aki. Based on PSIS-LOO, I get a positive result for my model. I am also trying to apply WAIC, since I think I need more than one criterion in my paper. Does loo provide this function? Do you have any other suggestions?

I should have checked the manual before asking. The loo package provides a waic() function. Now I am more comfortable with my results. Thank you very much for your help, Aki.

No. WAIC estimates the same thing as PSIS-LOO, but fails more easily and lacks a good diagnostic to detect when it fails. If you have already computed PSIS-LOO, there is no additional benefit in reporting WAIC. I recommend instead adding a posterior predictive checking analysis, as that would be complementary to PSIS-LOO (and WAIC). See some examples, e.g. in
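
As a rough sketch of what such a check could look like in Stan, the program below draws a replicated data set in generated quantities and records two test statistics. It assumes an intercept-only hurdle-gamma model purely to keep the example short; `theta`, `shape`, `rate`, `y_rep`, `prop_zero_rep`, and `max_rep` are hypothetical names, and the priors are placeholders. The `y_rep` draws are also one way to obtain the predicted response asked about earlier, since Stan has no built-in `predict`.

```stan
// Illustrative only: an assumed intercept-only hurdle-gamma model.
data {
  int<lower=1> N;
  vector<lower=0>[N] y;            // exact zeros allowed
}
parameters {
  real<lower=0, upper=1> theta;    // probability of an exact zero
  real<lower=0> shape;
  real<lower=0> rate;
}
model {
  shape ~ exponential(1);
  rate ~ exponential(1);
  for (n in 1:N) {
    if (y[n] == 0) {
      target += log(theta);
    } else {
      target += log1m(theta) + gamma_lpdf(y[n] | shape, rate);
    }
  }
}
generated quantities {
  vector[N] y_rep;                 // one replicated data set per posterior draw
  real prop_zero_rep = 0;          // test statistic: share of exact zeros
  real max_rep;                    // test statistic: largest replicated value
  for (n in 1:N) {
    if (bernoulli_rng(theta) == 1) {
      y_rep[n] = 0;
      prop_zero_rep += 1.0 / N;
    } else {
      y_rep[n] = gamma_rng(shape, rate);
    }
  }
  max_rep = max(y_rep);
}
```

The posterior draws of `prop_zero_rep` and `max_rep` can then be compared with the same statistics computed on the observed `y`; if the observed value sits far in the tail of the replicated distribution, the model is missing that feature of the data. The bayesplot package in R has plotting helpers for this kind of comparison.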

Thank you very much for your help, Aki. I have completed my Stan code for the Tweedie model, and the simulation study looks good. But I have problems with the real data: WAIC and loo return unreasonable values. Would you please help me out?

The link is here: NA loo result

Thank you very much for your help. Have a nice holiday season.