LOO and WAIC deliver exactly the same results?

Hi all,

It would be great if you could have a look at this.

I am dealing with model selection at the moment, so I use the loo package in R to calculate LOOIC and WAIC.
What is puzzling me is that I get exactly the same results for LOOIC and WAIC across all ten models. I know that the two are asymptotically equivalent and can give similar results, but exactly the same values…??

Here are the corresponding outputs for my 10 different models, which I obtain by

library(loo)
log_lik_1 <- extract_log_lik(mod1)  # pointwise log-likelihood matrix from the stanfit
loo_1 <- loo(log_lik_1)
waic_1 <- waic(log_lik_1)
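
The same three calls are repeated for the other models; a compact sketch of one way to do that, assuming the remaining fits are named mod2 through mod10 (only mod1 appears above):

# Repeat the LOO/WAIC computation for all ten fitted models
# (assumes stanfit objects mod1 ... mod10; only mod1 is named above).
fits <- list(mod1, mod2, mod3, mod4, mod5, mod6, mod7, mod8, mod9, mod10)
log_liks  <- lapply(fits, extract_log_lik)
loo_list  <- lapply(log_liks, loo)
waic_list <- lapply(log_liks, waic)

This gives the following summaries: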

            looic  se_looic  elpd_loo  se_elpd_loo  p_loo  se_p_loo
loo_3      4413.9      19.6   -2207.0          9.8    0.4       0.0
loo_4      4417.0      19.4   -2208.5          9.7    0.5       0.0
loo_1      4417.6      19.5   -2208.8          9.7    0.4       0.0
loo_5      4418.3      19.5   -2209.2          9.7    0.5       0.0
loo_6      4418.9      19.4   -2209.4          9.7    0.5       0.0
loo_7      4419.1      19.4   -2209.5          9.7    0.6       0.0
loo_8      4421.5      19.3   -2210.8          9.7    1.0       0.1
loo_2      4421.9      19.3   -2211.0          9.6    0.5       0.0
loo_9      4422.3      19.3   -2211.2          9.6    0.8       0.1
loo_10     4424.2      19.2   -2212.1          9.6    0.7       0.0

             waic  se_waic  elpd_waic  se_elpd_waic  p_waic  se_p_waic
waic_3     4413.9     19.6    -2207.0           9.8     0.4        0.0
waic_4     4417.0     19.4    -2208.5           9.7     0.5        0.0
waic_1     4417.6     19.5    -2208.8           9.7     0.4        0.0
waic_5     4418.3     19.5    -2209.2           9.7     0.5        0.0
waic_6     4418.9     19.4    -2209.4           9.7     0.5        0.0
waic_7     4419.1     19.4    -2209.5           9.7     0.6        0.0
waic_8     4421.5     19.3    -2210.8           9.7     1.0        0.1
waic_2     4421.9     19.3    -2211.0           9.6     0.5        0.0
waic_9     4422.3     19.3    -2211.2           9.6     0.8        0.1
waic_10    4424.2     19.2    -2212.1           9.6     0.7        0.0

And here is my example code (for one of the model specifications), in which I obtain the log likelihood:

data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> L;
  int<lower=1,upper=K> CrR[L];
  vector[N] x1;
  vector[N] x2;
  vector[N] x1x2;
  int<lower=1,upper=L> ll[N];
  int<lower=1,upper=N> indices1[L];
  int<lower=1,upper=N> indices2[L];
}
parameters {
  real b_x1;
  real b_x2;
  real b_x1x2;
  ordered[K-1] c;
}
transformed parameters {
  vector[N] theta[K];  // per-observation category probabilities (ordered logistic)
  vector[K] THETA[L];  // combined category probabilities for each of the L units
  for (n in 1:N) {
    real eta;
    eta = x1[n]*b_x1 + x2[n]*b_x2 + x1x2[n]*b_x1x2;
    theta[1,n] = 1 - (inv_logit(eta - c[1]));
    for (k in 2:(K-1))
      theta[k,n] = (inv_logit(eta - c[k-1])) - (inv_logit(eta - c[k]));
    theta[K,n] = (inv_logit(eta - c[K-1]));
  }
  // combine the observation-level probabilities of observations
  // indices1[l] ... indices2[l] into the category probabilities of unit l
  for (l in 1:L) {
    THETA[l,1] = theta[1,indices1[l]];
    THETA[l,3] = theta[3,indices1[l]];
    for (m in 1:(indices2[l]-indices1[l])) {
      if (m > 0) {
        THETA[l,1] = THETA[l,1] * theta[1,(indices1[l]+m)];
        THETA[l,3] = 1 - ((1-THETA[l,3]) * (1-theta[3,(indices1[l]+m)]));
      }
    }
    THETA[l,2] = 1 - THETA[l,1] - THETA[l,3];
  }
} 
model {
  for (l in 1:L) {
    CrR[l] ~ categorical(THETA[l]); 
  }
}
generated quantities {
  vector[L] log_lik;
  for (l in 1:L) {
    // THETA[l] holds probabilities (as used with categorical() in the model
    // block), so the matching lpmf is categorical_lpmf, not
    // categorical_logit_lpmf, which expects unnormalized log-odds.
    log_lik[l] = categorical_lpmf(CrR[l] | THETA[l]);
  }
}

Additional useful information would be the number of observations. From the loo/waic output we can see that the estimated effective number of parameters is less than 1, so you are hardly learning anything from the data. If the number of observations is high, it is then likely that loo and waic agree to many digits. It looks as if your data might be mostly random noise.
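
One quick check, since the printed tables are rounded to one decimal: compare the pointwise contributions directly. A minimal sketch, assuming the loo_1 and waic_1 objects from your code above (both store a pointwise matrix, as in recent versions of the loo package):

# largest pointwise difference between PSIS-LOO and WAIC elpd contributions
max(abs(loo_1$pointwise[, "elpd_loo"] - waic_1$pointwise[, "elpd_waic"]))

If that number is tiny relative to the rounding, identical printed values are exactly what you would expect.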

Thanks for your quick reply!
Yeah, I was also already a bit worried about the effective number of parameters.
The number of observations is 2152.

With that ratio of p_loo/n (here roughly 0.4/2152 ≈ 0.0002) you're safely in the asymptotic region where loo and waic agree.

You should also check your data and models.

Ok, thank you very much for your help!

Unfortunately, I have a follow-up question on this.
I am sorry if this is a bit of a stupid question, but I am kind of stuck right now.

So, my Pareto k estimates are good for all of my models (k < 0.5), and I actually only have between two and five predictors per model (apart from the two cutpoints for the three categories of my ordered logistic model).
Additionally, I did not specify any priors, so there is no risk that I am putting too much prior information into my parameters. My models converge (Rhat = 1), the interpretation of the results makes sense, and the results are robust across different subsets.

So, in which range should the effective number of parameters be for me to assume a good model fit? For the case with two predictors, between 1 and 2?
For the case with five predictors, between 1 and 5?
And is it never supposed to be below 1?

No specific range, and if you are happy with the predictions there is no problem. There was not enough information in your post to easily infer the predictive performance, which is why I suggested checking the data. Out of curiosity, what is the performance without any covariates?

Here is the output including a model without covariates (loo_nocov):

              looic  se_looic  elpd_loo  se_elpd_loo  p_loo  se_p_loo
loo_3        4413.9      19.6   -2207.0          9.8    0.4       0.0
loo_4        4417.0      19.4   -2208.5          9.7    0.5       0.0
loo_1        4417.6      19.5   -2208.8          9.7    0.4       0.0
loo_5        4418.3      19.5   -2209.2          9.7    0.5       0.0
loo_6        4418.9      19.4   -2209.4          9.7    0.5       0.0
loo_7        4419.1      19.4   -2209.5          9.7    0.6       0.0
loo_8        4421.5      19.3   -2210.8          9.7    1.0       0.1
loo_2        4421.9      19.3   -2211.0          9.6    0.5       0.0
loo_9        4422.3      19.3   -2211.2          9.6    0.8       0.1
loo_10       4424.2      19.2   -2212.1          9.6    0.7       0.0
loo_nocov    4428.3      19.8   -2214.1          9.9    0.2       0.0

Seeing this, the performance does seem to improve with the covariates.

Here is the direct comparison of the best performing model (loo_3) and the model without covariates:

compare(loo_3, loo_nocov)
elpd_diff        se
     -7.2       2.5

What do you think?

It’s better, so the covariates do contain some useful information for prediction.
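
A rough way to read the comparison above (a back-of-the-envelope sketch using a normal approximation for the elpd difference):

elpd_diff <- -7.2   # from compare(loo_3, loo_nocov)
se_diff   <-  2.5
elpd_diff / se_diff  # about -2.9, i.e. the difference is roughly 2.9 standard errors

So the improvement from the covariates is small but a few standard errors away from zero.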

Ok, that’s something :)
Thank you!