Hello,
I’d like to compare two models that are quite similar to each other, differing only in one interaction term. Specifically, I’d like to say whether one of the models is significantly better or whether they are comparable, with no meaningful difference between them. I have read a few posts about this topic and would like to make sure that I understand it correctly and that my approach is the right one. I’m adding the links below.
I have used Bambi and PyMC to specify and fit the models, and then ArviZ’s compare to get the ELPD, ELPD difference, and corresponding SE values. Here is an example table from my data:
My understanding is the following:
- If the ELPD difference is less than 4, then the models are comparable and thus not significantly different. This is not my case, as the ELPD difference is 29.15.
- If I want to compare them further, I can take the dSE and multiply it by the z-value corresponding to the significance level at which I want to test the difference (as in wgoette’s response). In this case, that would be 8.83 * 1.96 = 17.3, which is less than 29.15, so there is a significant difference between the models at the 5% significance level for a two-tailed test. For a one-tailed test, the multiplier would be 1.645 instead, and a one-tailed test seems to be the only one that makes sense here (is it correct to assume a one-tailed test?).
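To make the arithmetic in the second point concrete, here is a minimal sketch of the check as I understand it. It assumes the ELPD difference is approximately normally distributed, which is an assumption on my part, and the variable and function names are just illustrative. The two numbers are the ELPD difference and dSE quoted above.

```python
from statistics import NormalDist

# Values quoted in the post (from the ArviZ compare output).
elpd_diff = 29.15
dse = 8.83


def z_threshold(alpha: float, two_sided: bool = True) -> float:
    """z-value for a given significance level under a normal approximation."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


# Multipliers for a 5% level: about 1.96 (two-tailed) and 1.645 (one-tailed).
z_two = z_threshold(0.05, two_sided=True)
z_one = z_threshold(0.05, two_sided=False)

# Margins that the ELPD difference has to exceed.
margin_two = z_two * dse
margin_one = z_one * dse

# Equivalent view: how many dSEs the difference is away from zero.
ratio = elpd_diff / dse

print(f"two-tailed margin: {margin_two:.2f}, exceeded: {elpd_diff > margin_two}")
print(f"one-tailed margin: {margin_one:.2f}, exceeded: {elpd_diff > margin_one}")
print(f"elpd_diff / dse:   {ratio:.2f}")
```

Comparing the difference against `z * dse` is the same as checking whether `elpd_diff / dse` is larger than `z`, so only the multiplier changes between the one-tailed and two-tailed versions.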
Is this the way to go?
Considering that the function also provides weight, is it also possible to use that? From the docs:
weight: Relative weight for each model. This can be loosely interpreted as the probability of each model (among the compared model) given the data. By default the uncertainty in the weights estimation is considered using Bayesian bootstrap
Would it be possible to claim that the models are comparable/significantly different based on their probability? If yes, what would be the threshold?
Lastly, what really puzzles me: why can’t I simply look at the distributions of the two ELPDs and check whether they overlap? For instance, whether the mean of one of them falls within the SE interval of the other (which would be the case here)?
Thank you for your help and tips.
Sources: