WAIC for any stanfit object

For a stanfit object fit, I use the following code to calculate WAIC,

   log_lik <- extract(fit)$lp__
   # lppd <- sum(log(colMeans(exp(log_lik))))
   lppd <- sum(log(mean(exp(log_lik))))
   p_waic <- sum(stats::var(log_lik))
   waic <- -2 * lppd + 2 * p_waic

where fit is created from a Stan file written with the target += formulation.
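For reference, the WAIC I am trying to compute is the usual pointwise version (with S posterior draws \theta^{(s)} and n observations y_i):

\text{lppd} = \sum_{i=1}^{n} \log\!\left(\frac{1}{S}\sum_{s=1}^{S} p(y_i \mid \theta^{(s)})\right), \qquad p_{\text{WAIC}} = \sum_{i=1}^{n} \operatorname{Var}_{s}\!\left(\log p(y_i \mid \theta^{(s)})\right), \qquad \text{WAIC} = -2\,(\text{lppd} - p_{\text{WAIC}})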

Is it correct? Or is there any existing function to calculate WAIC for any stanfit object?

Once you have your log_lik matrix, you can call loo::waic(log_lik), and the loo package will do the work for you.
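For example (a minimal sketch, assuming log_lik is an S-by-N matrix with one row per posterior draw and one column per observation):

  # WAIC from the pointwise log-likelihood matrix
  waic_result <- loo::waic(log_lik)
  print(waic_result)

  # the same matrix also works for PSIS-LOO, which the loo authors generally recommend
  loo_result <- loo::loo(log_lik)
  print(loo_result)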

Thank you, I can do that. (Something may be wrong, though, since NA appears.)
> class(fitt)
[1] "stanfit"
attr(,"package")
[1] "rstan"
> log_lik1 <- extract_log_lik(fitt, parameter_name = "A")
> waic1 <- loo::waic(log_lik1)
> waic1

Computed from 889 by 1 log-likelihood matrix

          Estimate SE
elpd_waic      0.5 NA
p_waic         0.0 NA
waic          -1.1 NA

Is the log-likelihood computed in the stan model? If so, then the corresponding parameter name should be passed to extract_log_lik (a name such as A is not particularly informative). If not, you need to compute it in the generated quantities block, with something like this if it's a linear regression model (adapt it for different models and for the variable names you use):

generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(yt[n] | X[n] * beta, sigma);
  }
}
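Assuming the generated quantity is named log_lik as above, the R side would then be something along these lines (a sketch, with fit standing in for your stanfit object):

  # draws-by-observations matrix of pointwise log-likelihoods
  log_lik <- loo::extract_log_lik(fit, parameter_name = "log_lik")
  dim(log_lik)   # should be (number of draws) x N, not 889 x 1

  waic_result <- loo::waic(log_lik)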

Thank you for pointing out my misunderstanding. I was wrong. I understand that I should pass the log-likelihood via the quantity named by parameter_name in the function loo::extract_log_lik(); namely, I should supply \{\log(f(y \mid \theta_i)\pi(\theta_i)) + \text{const};\ i=1,2,\dots,n\}, where \{\theta_1,\theta_2,\dots,\theta_n\} is a collection of posterior MCMC samples, f is the likelihood, \pi is the prior, and y is the data set.

In my model, the data set is a collection of non-negative integers y=\{(H_c,F_c) \in \mathbb{N} \times \mathbb{N};\ c=1,2,\dots,5\}, which is distributed as

H_c \sim \text{Binomial}(p_c(\theta), N), \qquad F_c \sim \text{Poisson}(q_c(\theta))

Thus, in my case, the log-likelihood will be as follows:

\sum_c \text{binomial\_lpmf}(H_c \mid N,\ p_c(\theta)) + \sum_c \text{poisson\_lpmf}(F_c \mid q_c(\theta))\ (+\ \text{const?})

But I think the Stan function sampling() already calculates such a log-likelihood, and we can extract it with extract(fit)$lp__.

Do I need to write code in a generated quantities block to calculate the log-likelihood for each posterior MCMC sample?

Is it wrong to pass lp__ (i.e., extract(fit)$lp__) as the parameter_name in the function extract_log_lik?

lp__ is the log-posterior density up to a constant, and by extracting it you get one value per MCMC sample. But what you want is the log-likelihood of each observation at each MCMC sample, which is why you need to compute it by hand.

I think your log-likelihood formulation is correct; now you need to express the maths in the stan model (all constants will be computed correctly when using the normal_lpdf function and friends).
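For instance, a rough sketch of such a generated quantities block for your binomial/Poisson model could look like this (the names C, N, H, F, p and q are my guesses for the number of groups, the number of binomial trials, the data vectors and the computed probabilities/rates; adapt them to your model):

generated quantities {
  vector[2 * C] log_lik;
  for (c in 1:C) {
    // pointwise log-likelihood of H_c and F_c at the current draw
    log_lik[c] = binomial_lpmf(H[c] | N, p[c]);
    log_lik[C + c] = poisson_lpmf(F[c] | q[c]);
  }
}

Whether each H_c and F_c counts as a separate observation, or each pair (H_c, F_c) is combined into a single log_lik[c] entry, is a modelling choice that affects WAIC/LOO, so pick whichever matches what you would want to predict out of sample.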

I understand that lp__ retains the summed log-likelihood values \{\sum_i L(y_i \mid \theta_k)\}_{k=1,2,\dots}, but the code in the generated quantities block should be written to provide \{L(y_i \mid \theta_j)\}_{i,j=1,2,\dots} for each data point y_i and each MCMC sample \theta_j.

And “computed correctly” means using the target += formulation as follows:

          target += binomial_lpmf( h[n]  |    ,     );
          target += poisson_lpmf(  f[n]  |     );

instead of

                               h[n] ~ binomial(       );
                               f[n] ~ poisson(         );

Thank you for letting me know.
