Please excuse the long post. My question is, in essence: what, and how much, would you report about an effect that the loo package doesn’t provide evidence for in the first place? Ideally it should be enough to convince a sceptical reviewer who holds the opposing view dear.
I am working to submit my first paper based purely on Bayesian statistics. The data is experimental, with 200 measurements for each of 23 participants. For this example, let’s say that I am testing whether displaying a word in red or blue letters influences how accurately participants recall that word (binary outcome). Assume that both views (influence or not) are equally likely a priori.
Previously, I would have fitted 2 models with lme4 and, unless the more complex model had an AIC at least 2 points lower, concluded that the additional factor had no effect on the outcome. This would also typically be an accepted approach in papers. Now I am more confused as to what is appropriate.
My first intuition is to use the loo package and the compare_models function to compare two models, one containing the predictor and one without it, then divide “elpd_diff” by “se” and say that any ratio lower than 2 is insufficient evidence for the effect. The actual results are:
elpd_diff  se
-0.5       0.8
The ratio is -0.61, indicating that the full model is 0.61 standard errors worse than the model without the predictor. Personally, this would be enough for me to conclude that the burden of showing that the predictor meaningfully influences the outcome has shifted to those who still believe so. However, I can already hear the classic ‘Absence of evidence is not evidence of absence’ being hurled back at me. ‘0.61 standard errors worse’ simply doesn’t sound as solid as ‘a lower AIC, which can be interpreted as the better model’. It may be a good thing to be less absolute, but it may not help with sceptical reviewers. Even less so if the full model were, say, 1.5 standard errors better but still below the somewhat arbitrary cutoff of 2.
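To make the arithmetic concrete, here is a minimal Python sketch of the ratio described above, using the rounded values as printed. Note that the rounded numbers give -0.625 rather than the -0.61 quoted, presumably because the quoted figure came from unrounded output (that is my assumption). The 95% interval uses a normal approximation for elpd_diff, which is also an assumption and not something the loo output itself guarantees.

```python
# Values as printed in the post (rounded to one decimal place).
elpd_diff = -0.5
se = 0.8

# Ratio of the elpd difference to its standard error.
ratio = elpd_diff / se

# Rough 95% interval for elpd_diff, ASSUMING a normal approximation.
lower = elpd_diff - 1.96 * se
upper = elpd_diff + 1.96 * se

print(f"elpd_diff / se = {ratio:.3f}")
print(f"approx. 95% interval for elpd_diff: [{lower:.2f}, {upper:.2f}]")
```

Reporting the interval alongside the ratio may be easier to defend than a bare cutoff, since it shows that differences in either direction remain compatible with the data.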
I have been considering other options such as Savage-Dickey Bayes factors, which in this case do show moderate ‘evidence for the null’ (BF01 of ~5.5). However, this is highly influenced by the priors, and I think it may be better evidence than the loo comparison only in appearance.
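The prior sensitivity can be illustrated with a small Python sketch of the Savage-Dickey density ratio, BF01 = posterior density at 0 divided by prior density at 0. Everything numeric here is hypothetical: the posterior is approximated by a normal distribution with made-up mean and SD, the prior is a zero-centred normal, and the posterior is held fixed while the prior SD changes (a simplification that is only reasonable when the data dominate the prior). The function names are mine, not from any package.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def savage_dickey_bf01(post_mean, post_sd, prior_sd, prior_mean=0.0):
    """BF01 as posterior density at 0 over prior density at 0.

    ASSUMES normal prior and a normal approximation to the posterior.
    """
    return normal_pdf(0.0, post_mean, post_sd) / normal_pdf(0.0, prior_mean, prior_sd)

# HYPOTHETICAL posterior summary for the colour effect (not from the post).
post_mean, post_sd = 0.02, 0.10

# The same posterior yields very different BF01 under different prior widths.
results = {}
for prior_sd in (0.5, 1.0, 2.5):
    results[prior_sd] = savage_dickey_bf01(post_mean, post_sd, prior_sd)
    print(f"prior sd = {prior_sd}: BF01 = {results[prior_sd]:.2f}")
```

Under these assumptions BF01 grows in proportion to the prior SD, so a sensitivity table over a few defensible priors is probably more convincing to a reviewer than a single number.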
Another option is ROPE, the region of practical equivalence. Here I find that 95% of the posterior lies inside a region of 3.3% around 0. Is this better evidence, or does it just muddy the waters?
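For reference, the ROPE summary boils down to the share of posterior draws that fall inside a chosen interval. Below is a minimal Python sketch with simulated draws; the draws, the seed, and the ROPE bounds are all hypothetical stand-ins, and in practice the draws would come from the fitted model. `rope_proportion` is an illustrative helper, not a function from any package.

```python
import random

def rope_proportion(draws, low, high):
    """Fraction of posterior draws inside the closed interval [low, high]."""
    inside = sum(1 for d in draws if low <= d <= high)
    return inside / len(draws)

# HYPOTHETICAL posterior draws for the effect; replace with real draws.
random.seed(1)
draws = [random.gauss(0.0, 0.015) for _ in range(4000)]

# HYPOTHETICAL ROPE bounds; these should be justified on substantive grounds.
rope_low, rope_high = -0.033, 0.033

share = rope_proportion(draws, rope_low, rope_high)
print(f"share of posterior inside ROPE: {share:.3f}")
```

Because the result depends entirely on the bounds, it helps to state how they were chosen before looking at the posterior.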
I would really appreciate your input on this one.