Trying to find WAIC for a simple model from Statistical Rethinking

I have been reading Statistical Rethinking and I want to replicate code 10.23 and code 10.24 in Stan. For the moment, I only care about model m10.6. So I have:

    int<lower=1> N;
    int admit[N];
    int applications[N];
    real male[N];
    real a;
    real bm;
    vector[N] p;
    bm ~ normal( 0 , 10 );
    a ~ normal( 0 , 10 );
    for ( i in 1:N ) {
        p[i] = a + bm * male[i];
    admit ~ binomial_logit( applications , p );
generated quantities{
    vector[N] p;
    real dev;
    dev = 0;
    for ( i in 1:N ) {
        p[i] = a + bm * male[i];
    dev = dev + (-2)*binomial_logit_lpmf( admit | applications , p );

I have extracted this code from rethinking::stancode(m10.6). The command I am using is

fit_admit <- stan(file = 'admit.stan', data = d_copy, iter = 5000, chains = 4)

with d_copy as

d_copy <- list( N = 12,
                admit = c(512,  89, 353,  17, 120, 202, 138, 131,  53, 94,  22,  24),
                applications = c(825, 108, 560,  25, 325, 593, 417, 375, 191, 393, 373, 341),
                male = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0))

Since I want WAIC, I use library(loo) and I do the following


and I get this error

Error in UseMethod("waic") : 
  no applicable method for 'waic' applied to an object of class "stanfit"

then I realize that I need loo::extract_log_lik(), according to this, p.6. And this is my problem, it should be easy, but it is not. The document gives an example:

For example, the following is the generated quantities block for computing and saving the
log-likelihood for a linear regression model with N data points, outcome y, predictor matrix X,
coefficients beta, and standard deviation sigma:
vector[N] log_lik; for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | X[n, ] * beta, sigma);

but I cannot translate that to my model m10.6 (because I don’t understand it). The examples that are shown here should help me, but…

Anyway, my question is simple: What is the expression for log_lik in this case?

By the way, if you really want to know, the WAIC score for m10.6 shown in the book is presumably incorrect, it doesn’t matter if you use map() o map2stan().

The log-likelihood is the same as the deviance without the -2 factor, so you could change the Stan code to output log-likelihoods rather than the deviance. Or you can create the matrix of log-likelihoods in R after the fact with

post <-
ll <- sapply(1:d_copy$N, FUN = function(i) {
  dbinom(d_copy$admit[i], size = d_copy$applications[i], 
         prob = plogis(post$a + post$bm * d_copy$male[i]), log = TRUE)

At which point you can call loo::waic(ll) and it will tell you

11 (91.7%) p_waic estimates greater than 0.4. We recommend trying loo instead. 

The loo package recommends against waic because loo is more robust and outputs more diagnostics. If you call loo::loo(ll)), you get

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     4     33.3%   1396      
 (0.5, 0.7]   (ok)       2     16.7%   407       
   (0.7, 1]   (bad)      2     16.7%   16        
   (1, Inf)   (very bad) 4     33.3%   1         
See help('pareto-k-diagnostic') for details.

which basically means the assumptions for calculating these estimates are not satisfied and you should not use them. With only 12 observations, that is not too surprising but it would be fairly easy to estimate your model 12 times each time leaving out one observation in order to do cross-validation.


Thanks! And that function is easy to understand!

David pointed me here. The issue of how to do WAIC/LOO with binomial models is a subtle one. My rethinking code actually splits these 12 observations into Bernoulli 0/1 outcomes and then does the WAIC/LOO calculation. So the results will differ. If you think the right leave-one-out design is by department/gender combination, then the code in this thread look correct. If instead you think each application should be left out, then my code is (at least in that regard) correct.

This would at least explain why it appears to be only the aggregated binomial models in Chapter 10 that yield conflicting results. Assuming that is the case.

I think there may still be subtle numerical issues with how I do my WAIC calculation, and such an issue would more likely show up in small data sets. So there is perhaps something more to figure out.

Like Richard said, there’s not a right way to do this as it depends on whether you care about leaving out observations or groups of observations. Then there’s also sometimes the complication that groups are unbalanced in your data set but you don’t think they’re so unbalanced in the population. Then even if you leave out one group at a time it can still make sense to apply an additional correction for having different numbers of observations in groups. But it’s totally context dependent whether that makes sense to do.

I thought we had put an example of doing something like this somewhere in the loo package doc or vignettes but I can’t find it now. Definitely an example worth having somewhere though!