LOO-CV and other penalised criteria

Having created the vector of pointwise log-likelihoods in the generated quantities block, can one compute the other information criteria as well, apart from the LOO evaluation? Is the code below correct?



logisticLoo = "data {
  int<lower=0> N;             
  int<lower=0> P;             
  matrix[N,P] X;              
  int<lower=0,upper=1> y[N];  
}
parameters {
  vector[P] beta;
  real eta;
}
model {
  beta ~ normal(0, 1);
  y ~ bernoulli_logit(X * beta);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta);}
}"


library(loo)
library(rstan)
library(brms)
library(MASS)

data(birthwt)
head(birthwt)

# low is already coded 0/1, so no factor conversion is needed for the Stan data
X <- model.matrix(~ race + lwt, birthwt)

standata <- list(y =birthwt$low , X = X, N = nrow(X), P = ncol(X))

# Fit model
fit_1 <- stan(model_code = logisticLoo,
              data       = standata,
              control    = list(max_treedepth=15))
print(fit_1)
pars = c("log_lik")
# Extract pointwise log-likelihood and compute AIC,BIC,DIC,LOO
log_lik_1 <- extract_log_lik(fit_1,parameter_name = pars, merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik_1)) 
loo_1 <- loo(log_lik_1, r_eff = r_eff, cores = 2)
print(loo_1)
nrow(log_lik_1)
pars     =  c("log_lik")
loglik1  =  extract_log_lik(fit_1,pars,merge_chains = FALSE)
dev      =  sum(loglik1)
n        =  nrow(loglik1)
K        =  2
dm       =  3
pm       =  dev - mean(loglik1)  
AIC      = -2  * dev + dm*k  
BIC      = -2  * dev + dm*log(n)
DIC      = -2  * dev + 2 * pm
print(c(dev,AIC,BIC,DIC))

compfunc(fit_1,2)


No, you can’t compute AIC and BIC from the posterior sample returned by Stan, as they need the maximum-likelihood value. You could compute DIC, but you have an error in the computation. You should not need to compute any of these unless you are doing research on historically used criteria. For Bayesian models with Bayesian computation, you should use WAIC or LOO, or, even better, cross-validation that takes into account your prediction task or decision task. See the references below.
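
For reference, the WAIC suggested above can be computed from the same pointwise log-likelihood already extracted for loo(); a minimal sketch with the loo package, reusing log_lik_1 from the code earlier in this thread:

# Sketch: WAIC from the pointwise log-likelihood array (iterations x chains x N)
waic_1 <- waic(log_lik_1)
print(waic_1)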

Thanks, professor. I know the error; I found it this morning. I have to take the row sums of the extracted log_lik and then compute the DIC through the maximum a posteriori estimate. I know that AIC and BIC need the maximum likelihood; however, if we take the minimum deviance over the MCMC draws for AIC and BIC, we will never be exactly equal to the correct minimum value, since MCMC is a sampling, and not an optimization, algorithm. Nevertheless, the minimum deviance value obtained from an MCMC output with a large number of generated iterations usually provides sufficiently accurate results. Even more accurate results can be obtained if we use the posterior mean or median obtained from a posterior distribution with a flat prior instead of the maximum-likelihood estimates used in the formal definition of the IC (Raftery, 1966). The form that we use is:
$$IC(m) = D(\hat\theta_m, m) + d_m F$$
where $\hat\theta_m$ is the posterior mean under model $m$, $d_m$ is the number of parameters of model $m$, and $F$ is the regular penalty for AIC and BIC (2 and $\log n$, respectively).
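
A minimal sketch of the corrected computation described above (my reading of the post, not code from it): take the row sums of the extracted pointwise log-likelihood to get the deviance of each draw, approximate the plug-in deviance by the best draw, and apply the penalties; fit_1 and extract_log_lik() are as in the earlier code, and dm = 3 matches the number of columns of X.

# Sketch only: reconstructing the corrected IC computation described above
ll      <- extract_log_lik(fit_1, parameter_name = "log_lik")  # draws x observations matrix
n_obs   <- ncol(ll)                       # number of observations
dm      <- 3                              # number of regression coefficients
dev_s   <- -2 * rowSums(ll)               # deviance of each posterior draw
dev_hat <- min(dev_s)                     # plug-in deviance approximated by the best draw
p_D     <- mean(dev_s) - dev_hat          # effective number of parameters for DIC
AIC     <- dev_hat + 2 * dm
BIC     <- dev_hat + log(n_obs) * dm
DIC     <- dev_hat + 2 * p_D
print(c(dev_hat, AIC, BIC, DIC))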

Not in higher dimensions. See, e.g., Section 1.3 in https://arxiv.org/pdf/1701.02434.pdf

Only for symmetric posterior distributions. In higher dimensional models the posterior is rarely symmetric.

In the social sciences we rarely handle high-dimensional parameter spaces. The purpose of AIC, BIC and DIC is variable selection through this pointwise log-likelihood, and that is the reason I use them.
LOO is FANTASTIC (I use it a lot, and thank you for that, Dr. Vehtari), but it is for checking the predictive accuracy of a model via cross-validation of the dataset into training and test sets. AIC, BIC and DIC answer a specific question:
“What variables must I use in my model?” On the other hand, LOO-CV answers a different question: “Which model has the best predictive accuracy?”
For me there is no argument.

Please read the references I posted. AIC and DIC answer exactly the same question as LOO. AIC is valid only if you do maximum likelihood with a regular (non-singular) model with no strong dependencies between parameters. DIC is valid only if you would make the predictions with the posterior mean instead of integrating over the posterior. If you are using Stan to sample from the posterior, there is no reason why AIC would be justified, and the only justification for DIC would be that you don’t want to be Bayesian. If you really want to use *IC, use WAIC, which is the only one of these with a Bayesian justification.

No, there is no difference between how AIC, DIC, WAIC or LOO would be used to answer these questions. They are all estimating the predictive performance (you can check Akaike’s first information criterion paper, which mentions this, too) and all answer both of these questions in exactly the same way (except that AIC and DIC are not Bayesian). In addition to the theory in A survey of Bayesian predictive methods for model assessment, selection and comparison and Understanding predictive information criteria for Bayesian models | Statistics and Computing saying this, you can also check the experiments in Comparison of Bayesian predictive methods for model selection | Statistics and Computing. When answering either of these questions, you can always replace AIC, DIC and WAIC with LOO and get better results. However, all of these are bad for variable selection if there are many variables, as shown in Comparison of Bayesian predictive methods for model selection | Statistics and Computing. See also my talks and case studies at Model selection tutorials and talks about variable selection.
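
If it helps, here is a minimal sketch of how the same variable-selection question would be answered with LOO instead of *IC; fit_2 is hypothetical (the same Stan model refit with a different design matrix, e.g. X2 <- model.matrix(~ race, birthwt)):

# Sketch only: fit_2 is a hypothetical second model with a different predictor set
log_lik_2 <- extract_log_lik(fit_2, parameter_name = "log_lik", merge_chains = FALSE)
r_eff_2   <- relative_eff(exp(log_lik_2))
loo_2     <- loo(log_lik_2, r_eff = r_eff_2, cores = 2)
loo_compare(loo_1, loo_2)   # elpd difference and its standard error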
