Different WAIC for the exact same model

Hi everyone!

When I run the exact same model several times (in a loop with the same seed), I obtain different WAIC values, which seems very strange. I would have expected to get the same WAIC every time.

To obtain the WAIC, I compute the pointwise log-likelihood in the generated quantities block. Then, via the loo package, I extract it and compute the WAIC like this:

log_lik_1 <- extract_log_lik(fit_1, parameter_name = "log_lik")
waic(log_lik_1)$waic
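For context, here is a minimal sketch of the loop I mean, assuming rstan (the file name, data, and number of runs are placeholders):

library(rstan)
library(loo)

waics <- numeric(5)
for (i in 1:5) {
  # same model, same data, same seed on every pass through the loop
  fit_i <- stan(file = "model.stan", data = stan_data, seed = 123)
  ll_i  <- extract_log_lik(fit_i, parameter_name = "log_lik")
  waics[i] <- waic(ll_i)$estimates["waic", "Estimate"]
}
waics  # I would expect all five values to be identical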

I first encountered this issue with a complex model, but it persists even when I simplify it down to a simple nonlinear model.

Does anyone have any ideas that might explain and resolve my issue, please?

Thank you in advance!

I once had an issue like this. It happened because I generated my own initial values and supplied them via the init argument, and those initial values changed from one run to the next. Maybe you have a different issue, but you might check that the initial values stay fixed.
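For instance, a minimal sketch with rstan (the parameter names and values are placeholders):

library(rstan)

set.seed(123)                                  # R-side seed
init_fixed <- list(list(beta = 0, sigma = 1))  # one list per chain; hypothetical names
fit_1 <- stan(file = "model.stan", data = stan_data,
              chains = 1, seed = 123,          # Stan-side seed
              init = init_fixed)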


Thank you for the reply!

I do give my own initial values and supply them via the init argument. I don't introduce any randomness into those values. (I have set a seed for the R part of my code and a seed for the Stan part of my code.)

How can I ensure that they stay fixed as you suggest?

It sounds like you already ensured that they are fixed. In my case, I simply forgot to look at the initial values and wondered why the seed wasn’t working.

Before I ask you to post your code, you should try to pinpoint where the problem is likely to be: can you confirm that the samples themselves remain the same between runs?

WAIC is a direct function of the parameter values and the associated posterior, so the only way it can differ is if the samples themselves are somehow different (or some other bug is triggered by the loop iterations).
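A minimal sketch of such a check, assuming two rstan fits fit_1 and fit_2 from consecutive runs of the loop:

library(rstan)

# Compare the raw posterior draws of the two runs directly
draws_1 <- as.matrix(fit_1)   # iterations x parameters, chains merged
draws_2 <- as.matrix(fit_2)
max(abs(draws_1 - draws_2))   # exactly 0 if the runs are bit-identical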


WAIC can vary even with the same seed if there is residual randomness in your model or data. Make sure every step of the pipeline is deterministic.

Hi everyone!

First things first, I would like to thank you all for the replies!

The issue seems to come from the fact that I'm using within-chain parallelization.
(I don't have randomness in my data. The only element of "randomness" in my model comes from the random effects in the nonlinear mixed-effects model. But the issue persists when I use a model with no randomness at all, a simple nonlinear model.)

In fact, I have one version of the code that doesn't use within-chain parallelization, and when I run it several times, with a seed in the R part and a seed in the Stan part, I obtain the same results.

However, when I run the version that uses within-chain parallelization several times, again with a seed in the R part and a seed in the Stan part, I obtain different WAIC values (and also different LOOIC values, using the loo package).

When I tried to pinpoint the problem, I saw that across runs of the exact same model with the same seed, the first iterations of a given chain are indeed identical. But after some iterations, the values start to differ slightly between runs.
Could it be a kind of rounding error in one iteration that builds up over the following iterations, explaining the difference in the LOO and the WAIC? Or could it be something else?
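Here is a sketch of the kind of comparison I mean (assuming rstan; fit_par_1 and fit_par_2 stand for two runs of the parallelized code):

library(rstan)

# Find the first (post-warmup) iteration where the two runs diverge
d1 <- as.array(fit_par_1)   # iterations x chains x parameters
d2 <- as.array(fit_par_2)
diverged <- apply(abs(d1 - d2) > 0, 1, any)   # TRUE once any parameter differs
which(diverged)[1]                            # first differing iteration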

Thank you in advance!

Well, there's your answer: your code and Stan are working as expected, and only when you add something else do you get odd behavior. However, it is important to know how odd it really is.

I suggest you check the parameter values themselves, since these criteria are functions of them and may deviate further from your reference value.

Again, that is something you can check from the parameter values themselves. If the deviations are small compared to the HMC-proposed jumps, it may indeed be a rounding error of some sort, due to the different parallelization methods; conversely, if they differ considerably, it is likely to be something else. Still, small differences can compound over the long run and make the chains quite different. You can probably tell the former from the latter by whether the magnitude of the differences is small at the beginning and larger later in the chain, as opposed to large throughout.
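A sketch of that check (fit_a and fit_b are placeholder names for two runs of the same model):

library(rstan)

# Track the worst-case discrepancy between two runs at each iteration
da <- as.array(fit_a)   # iterations x chains x parameters
db <- as.array(fit_b)
diff_mag <- apply(abs(da - db), 1, max)
plot(diff_mag, type = "l",
     xlab = "iteration", ylab = "max |difference| across parameters")

A curve that starts near machine precision and grows along the chain would point to compounding rounding differences; a curve that is large from the first iterations would point to something else.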

At the risk of repeating myself, diagnosing the issue from the deviations in WAIC between your different runs will be much harder, since it is probably not trivial to determine the effect of small deviations on the metric, and you only have a single WAIC value for each run, as opposed to several thousand samples.

Thank you for the reply!!!

Indeed, I forgot some details in my description.

I did check the parameter values themselves across iterations, as recommended, and I could already see the issue without needing to go to the WAIC value.
The small differences in my parameters at the beginning increase over the iterations, resulting in quite different final results ("the magnitude of the differences is small in the beginning and larger later").

I do think the issue comes from the within-chain parallelization (using reduce_sum with a partial_sum function), but it is essential for me to use it, as it saves quite some time :)
Still, I would like my code to be reproducible, notably through the use of seeds.

I don’t know why that would happen; it’s probably a question for the developers involved in the parallelization functions. It would be nice to understand it. As long as they are not introducing errors, but only some “randomness” arising from numerical fluctuations, it shouldn’t be an issue.

What you can do is look at the results these functions output; if they differ from the regular, non-parallelized calculations, that would further narrow down the source of the issue and let you check whether the calculations are ultimately correct.
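For example, a sketch of that comparison (fit_serial and fit_parallel are placeholder names for fits from the two code versions):

library(loo)

# Compare pointwise log-likelihoods over the first few post-warmup
# iterations, before the chains have drifted apart
ll_serial   <- extract_log_lik(fit_serial,   parameter_name = "log_lik")
ll_parallel <- extract_log_lik(fit_parallel, parameter_name = "log_lik")
max(abs(ll_serial[1:10, ] - ll_parallel[1:10, ]))

Since floating-point addition is not associative, partial sums accumulated in a different order can legitimately differ in the last few bits; differences of that magnitude would point to reordering rather than an actual error in the calculations.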

If the variation between runs is small, that is, within the Monte Carlo variation due to randomness in sampling, you should not worry. Also, it's better to use LOO than WAIC; see CV-FAQ: How LOO and WAIC are related
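For example, a sketch of the PSIS-LOO computation with the loo package, using the fit object from the original post:

library(loo)

# Keep chains separate so loo can use relative efficiencies
log_lik_1 <- extract_log_lik(fit_1, parameter_name = "log_lik",
                             merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik_1))
loo_1 <- loo(log_lik_1, r_eff = r_eff)
print(loo_1)  # reports elpd_loo, p_loo, looic, and Pareto k diagnostics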