Aic & bic

Nikos_Matsavelas · September 18, 2019, 1:56pm

Can i theoretically calculate the aic & bic from log_likelihood in the generated quantities? (apart from loo &waic)

# load libraries 
library(rstan)
library(loo)
# Liverpool's Data Premier League scores 2018-2019
liverpool = c(4,2,1,2,2,3,1,0,1,4,1,2,3,1,3,4,3,2,4,5,1,1,4,1,1,3,0,5,0,4,2,2,3,2,2,5,3,2)
length(liverpool)
mean(liverpool)
var(liverpool)


# estimating Gamma parameters given the mean and the var of the data 
estGammaParams <- function(mu, var) {
  alpha <- mu^2 / var 
  beta <-  mu  /  var
  return(params = list(alpha = alpha, beta = beta))
}
# gamma parameters to pass in gamma prior 
estGammaParams(2.25,1.125)




# 1st model of Poisson-Gamma Model 
liv ="data {
  int N;
  int y[N];
}parameters {
  real<lower=0> lambda;
}model {  
  lambda ~ gamma(4.5,2);
  y ~ poisson(lambda);
}
generated quantities {
 vector[N] log_lik;
  for (n in 1:N) {
  log_lik[n] = poisson_lpmf(y[N] | lambda);
}} "

# data as a list ALWAYS 
stan.dat <- list(N = length(liverpool),y = liverpool)

# fit the sampling model to HMC 
fit <- stan(model_code  = liv,
             data       = stan.dat,
             iter       = 2000, 
             chains     = 4,
             seed       = 12345,
             control    = list(adapt_delta=0.9))

# print posterior mean of lambda parameter 
print(fit,pars = "lambda")

# plot parameter 
plot(fit,pars = "lambda")

# extract log_lik from posterior samples 
log_lik_1 <- extract_log_lik(fit,
                             parameter_name = "log_lik",
                             merge_chains = FALSE)

N = length(liverpool)
k = 1
deviance = -2*sum(log_lik_1);deviance
Aic = deviance + 2 * k ;Aic
Bic = deviance + log(N) * k;Bic


r_eff1 <- relative_eff(exp(log_lik_1)) 
loo_1 <- loo(log_lik_1, r_eff = r_eff1, cores = 4)

print(loo_1)
plot(loo_1)

Max_Mantei · September 18, 2019, 2:35pm

Hi Nikos!

AIC (and BIC) are not Bayesian quantities. Taken from wikipedia:

Suppose that we have a statistical model of some data. Let k be the number of estimated parameters in the model. Let \hat{L} be the maximum value of the likelihood function for the model. Then the AIC value of the model is the following.[1][2]

\text{AIC} = 2k-2\log(\hat{L})

…as you see, AIC relies on the maximum likelihood estimator. From a Bayesian view the maximum of the likelihood function is not that interesting, as we work with the whole posterior distribution. For model assessment you can use WAIC, or better yet LOO – which you already use! :)

What do you want to use AIC or BIC for? Comparisons with MLE methods?

andrewgelman · September 19, 2019, 8:08pm

I agree that BIC is not properly Bayesian. AIC can be viewed as an approximation to a Bayesian calculation; see here:
http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf
and our followup paper here:
http://www.stat.columbia.edu/~gelman/research/published/loo_stan.pdf

Nikos_Matsavelas · September 19, 2019, 8:12pm

thank you professor for your reply…so it make sense to evaluate AIC and DIC from the posterior samples?
i am aware of waic and loo criteria, my concern is about the other common criteria of penalised likelihood.

andrewgelman · September 19, 2019, 8:14pm

You can do them all and compare. I just follow Aki Vehtari’s advice and think of all these methods (except for BIC) as estimates of out of sample prediction error. I don’t really like BIC for any reason (see various discussions online and in my books for more on this).

Nikos_Matsavelas · September 19, 2019, 8:15pm

once again i am grateful.thank you professor

emiruz · September 24, 2019, 9:39am

You need to know the number of parameters k and the MLE estimate for your model to be able to calculate AIC. MLE is just \arg\max_\theta \{P(X|\theta)\}. It’s not obvious to me how you may do that from generated quantities, but you could use the output of your Stan model (presumably sampled using MCMC), to set the starting values and run an optimizing on the same model. That will get you MAP which is just \arg\max_\theta \sum_i{log (P(x_i\mid \theta))} + log(P(\theta)) (given a constant prior) which is \arg\max_\theta \sum_i{log (P(x_i\mid \theta))} + const which is MLE + const.

So then you can either use MAP instead of MLE or correct remove the const to get the real MLE as you wish. Either way I think it should get you what you want.

Topic		Replies	Views
LOO - CV and other penalised criteria Modeling loo	5	1180	October 6, 2019
Is there any existing function in Rstan to compute DIC, AIC or BIC ？ RStan loo	3	2776	May 13, 2018
AIC, BIC, DIC in Stan Modeling loo	8	7731	September 4, 2018
AIC, WAIC, LOO-CV for the survival joint model in Stan Algorithms techniques , fitting-issues , performance , loo , math	7	1160	June 20, 2021
Four questions about information criteria/cross-validation and HMC in relation to a manuscript review General	12	1237	March 31, 2020

Aic & bic

Related topics