Aic & bic

Can i theoretically calculate the aic & bic from log_likelihood in the generated quantities? (apart from loo &waic)

# load libraries 
library(rstan)
library(loo)
# Liverpool's Data Premier League scores 2018-2019
liverpool = c(4,2,1,2,2,3,1,0,1,4,1,2,3,1,3,4,3,2,4,5,1,1,4,1,1,3,0,5,0,4,2,2,3,2,2,5,3,2)
length(liverpool)
mean(liverpool)
var(liverpool)


# estimating Gamma parameters given the mean and the var of the data 
estGammaParams <- function(mu, var) {
  alpha <- mu^2 / var 
  beta <-  mu  /  var
  return(params = list(alpha = alpha, beta = beta))
}
# gamma parameters to pass in gamma prior 
estGammaParams(2.25,1.125)




# 1st model of Poisson-Gamma Model 
liv ="data {
  int N;
  int y[N];
}parameters {
  real<lower=0> lambda;
}model {  
  lambda ~ gamma(4.5,2);
  y ~ poisson(lambda);
}
generated quantities {
 vector[N] log_lik;
  for (n in 1:N) {
  log_lik[n] = poisson_lpmf(y[N] | lambda);
}} "

# data as a list ALWAYS 
stan.dat <- list(N = length(liverpool),y = liverpool)

# fit the sampling model to HMC 
fit <- stan(model_code  = liv,
             data       = stan.dat,
             iter       = 2000, 
             chains     = 4,
             seed       = 12345,
             control    = list(adapt_delta=0.9))

# print posterior mean of lambda parameter 
print(fit,pars = "lambda")

# plot parameter 
plot(fit,pars = "lambda")

# extract log_lik from posterior samples 
log_lik_1 <- extract_log_lik(fit,
                             parameter_name = "log_lik",
                             merge_chains = FALSE)

N = length(liverpool)
k = 1
deviance = -2*sum(log_lik_1);deviance
Aic = deviance + 2 * k ;Aic
Bic = deviance + log(N) * k;Bic


r_eff1 <- relative_eff(exp(log_lik_1)) 
loo_1 <- loo(log_lik_1, r_eff = r_eff1, cores = 4)

print(loo_1)
plot(loo_1)

Hi Nikos!

AIC (and BIC) are not Bayesian quantities. Taken from wikipedia:

Suppose that we have a statistical model of some data. Let k be the number of estimated parameters in the model. Let \hat{L} be the maximum value of the likelihood function for the model. Then the AIC value of the model is the following.[1][2]

\text{AIC} = 2k-2\log(\hat{L})

…as you see, AIC relies on the maximum likelihood estimator. From a Bayesian view the maximum of the likelihood function is not that interesting, as we work with the whole posterior distribution. For model assessment you can use WAIC, or better yet LOO – which you already use! :)

What do you want to use AIC or BIC for? Comparisons with MLE methods?

4 Likes

I agree that BIC is not properly Bayesian. AIC can be viewed as an approximation to a Bayesian calculation; see here:
http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf
and our followup paper here:
http://www.stat.columbia.edu/~gelman/research/published/loo_stan.pdf

3 Likes

thank you professor for your reply…so it make sense to evaluate AIC and DIC from the posterior samples?
i am aware of waic and loo criteria, my concern is about the other common criteria of penalised likelihood.

You can do them all and compare. I just follow Aki Vehtari’s advice and think of all these methods (except for BIC) as estimates of out of sample prediction error. I don’t really like BIC for any reason (see various discussions online and in my books for more on this).

once again i am grateful.thank you professor

You need to know the number of parameters k and the MLE estimate for your model to be able to calculate AIC. MLE is just \arg\max_\theta \{P(X|\theta)\}. It’s not obvious to me how you may do that from generated quantities, but you could use the output of your Stan model (presumably sampled using MCMC), to set the starting values and run an optimizing on the same model. That will get you MAP which is just \arg\max_\theta \sum_i{log (P(x_i\mid \theta))} + log(P(\theta)) (given a constant prior) which is \arg\max_\theta \sum_i{log (P(x_i\mid \theta))} + const which is MLE + const.

So then you can either use MAP instead of MLE or correct remove the const to get the real MLE as you wish. Either way I think it should get you what you want.