Hi everyone,
I have another question about model interpretation. I have built my final models and thought that I understood how to interpret them, but after playing around a bit I am confused again.
My model uses a beta likelihood with a logit link:

```r
EXAM25 ~ 1 + Algorithm + LOC + (1|Project) + (1|Language)
```

Algorithm has 2 levels and LOC is continuous.
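Written out, the model is roughly the following. This assumes brms's default mean/precision parameterization of the Beta family (mean $\mu$, precision $\phi$) and Linespots as the reference level, which matches the `b_AlgorithmBugspots` coefficient used in the contrast code; $\text{Bugspots}_i$ is a 0/1 indicator.

$$
\begin{aligned}
\text{EXAM25}_i &\sim \operatorname{Beta}\big(\mu_i \phi,\ (1-\mu_i)\phi\big) \\
\operatorname{logit}(\mu_i) &= \beta_0 + \beta_{\text{Bugspots}}\,\text{Bugspots}_i + \beta_{\text{LOC}}\,\text{LOC}_i + u_{\text{Project}[i]} + v_{\text{Language}[i]} \\
u_j &\sim \mathcal{N}(0, \sigma_{\text{Project}}^2), \qquad v_k \sim \mathcal{N}(0, \sigma_{\text{Language}}^2)
\end{aligned}
$$

In this notation, the `posterior_samples` contrast is $\operatorname{logit}^{-1}(\beta_0) - \operatorname{logit}^{-1}(\beta_0 + \beta_{\text{Bugspots}})$ computed per draw, i.e. at $\text{LOC} = 0$ with $u = v = 0$. `posterior_predict`, by contrast, uses each row's estimated project and language effects and then adds a draw from the Beta likelihood.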
`mcmc_areas` gives the following, which shows a clear difference between the two algorithms on the logit scale:
`marginal_effects` shows the following picture, in which the difference looks much less certain, although Linespots still seems to have lower EXAM25 than Bugspots:
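A minimal base-R sketch with made-up draws (all values and variable names are hypothetical, not taken from the fitted model) of why per-level intervals can overlap even when the per-draw difference is clearly on one side of zero. As far as I know, `marginal_effects` by default plots each level's expected value separately, with group-level effects excluded (`re_formula = NA`) and LOC at its mean. Both levels share the same intercept draw, so the intercept's uncertainty widens each level's interval but largely cancels in the difference.

```r
# Minimal sketch with made-up posterior draws (hypothetical values, not from the fitted model).
set.seed(1)
n_draws <- 4000

# Hypothetical draws: the intercept is quite uncertain, the algorithm effect much less so.
b_intercept <- rnorm(n_draws, mean = 0, sd = 1)
b_bugspots  <- rnorm(n_draws, mean = 0.5, sd = 0.1)

# Expected EXAM25 per level and per draw on the outcome scale;
# both levels use the same intercept draw.
mu_linespots <- plogis(b_intercept)
mu_bugspots  <- plogis(b_intercept + b_bugspots)

# Separate 95% intervals per level (roughly what a per-level plot shows) ...
ci_linespots <- quantile(mu_linespots, c(0.025, 0.975))
ci_bugspots  <- quantile(mu_bugspots, c(0.025, 0.975))

# ... versus the 95% interval of the per-draw difference.
contrast    <- mu_linespots - mu_bugspots
ci_contrast <- quantile(contrast, c(0.025, 0.975))

print(round(rbind(Linespots = ci_linespots, Bugspots = ci_bugspots,
                  difference = ci_contrast), 3))
```

With these hypothetical values the two per-level intervals overlap heavily, while the 95% interval of the difference lies entirely below zero.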
I have seen two ways to calculate the contrasts: one based on the `posterior_samples` function and one based on `posterior_predict`.
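The `posterior_samples`-based one looks like this (for the mean LOC):

```r
library(brms)

# Posterior draws of the model parameters, one row per draw
post <- posterior_samples(model)

# Per-draw difference in expected EXAM25 (Linespots minus Bugspots) on the outcome
# scale, at LOC = 0 (the mean LOC if LOC is centered) and group-level effects at 0
contrast <- inv_logit_scaled(post$b_Intercept) -
  inv_logit_scaled(post$b_Intercept + post$b_AlgorithmBugspots)
```

and gives this: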
Again, there is a clear difference between the two algorithms, this time on the outcome scale.
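The `posterior_predict`-based one looks like this (I have a full factorial design, so both subsets are identical apart from the results and Algorithm):

```r
# Posterior predictive draws (draws x rows) for the Linespots and Bugspots rows
l <- posterior_predict(model, newdata = subset(data, Algorithm == "Linespots"))
b <- posterior_predict(model, newdata = subset(data, Algorithm == "Bugspots"))

# Element-wise difference; assumes the rows of both subsets are matched
# (same Project, Language and LOC in the same order)
contrast <- l - b
```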
However, this contrast looks very different from the previous ones:
Now I am wondering what is going on here. Is this because the `posterior_samples` contrast only looks at mean LOC and at the average project and language (group-level effects at 0), while the `posterior_predict` contrast aggregates across all LOC, project and language values? Or am I doing something else wrong?
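A minimal base-R sketch with made-up draws (all parameter values are hypothetical; LOC and Language are left out for brevity) of one more source of spread besides the group-level effects: `posterior_predict` draws new observations from the Beta likelihood, so each predictive draw carries observation-level noise governed by the precision `phi`. The expected-value contrast contains none of that noise.

```r
set.seed(2)
n_draws <- 4000

# Hypothetical posterior draws (illustrative values only)
b_intercept <- rnorm(n_draws, mean = -1, sd = 0.3)
b_bugspots  <- rnorm(n_draws, mean = 0.5, sd = 0.1)
phi         <- rgamma(n_draws, shape = 50, rate = 5)  # Beta precision, around 10
sd_project  <- 0.8  # hypothetical group-level SD, held fixed for simplicity

# Hypothetical project effect shared by a matched Linespots/Bugspots pair of rows
u_project <- rnorm(n_draws, mean = 0, sd = sd_project)

# 1) Contrast of expected values with group-level effects set to zero
contrast_mean <- plogis(b_intercept) - plogis(b_intercept + b_bugspots)

# 2) Contrast of single predictive draws, one per algorithm, for the same project
mu_l <- plogis(b_intercept + u_project)
mu_b <- plogis(b_intercept + b_bugspots + u_project)
y_l  <- rbeta(n_draws, mu_l * phi, (1 - mu_l) * phi)
y_b  <- rbeta(n_draws, mu_b * phi, (1 - mu_b) * phi)
contrast_pred <- y_l - y_b

summary_table <- rbind(
  expected   = quantile(contrast_mean, c(0.025, 0.5, 0.975)),
  predictive = quantile(contrast_pred, c(0.025, 0.5, 0.975))
)
print(round(summary_table, 3))
```

In this sketch both contrasts are centered below zero, but the predictive one is several times wider and its interval crosses zero. With the real `l - b` matrix (draws x rows), plotting all cells mixes in this full observation-level noise; averaging each draw across rows, e.g. `rowMeans(l - b)`, gives one average difference per draw instead.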
I assume that, as always, there is no “right” way to do this and it depends on what exactly I want to show. However, I am not sure what I should conclude from this.
Would it be fair to say that Linespots has lower EXAM25 at mean LOC (and I guess I could just check this over a range of LOC values), with differences between projects and languages skewing the results?
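One way to check a range of LOC is to repeat the expected-value contrast on a grid of LOC values. A minimal base-R sketch with made-up draws, assuming LOC is standardized (so 0 is mean LOC); the coefficient values and the grid are hypothetical. Because the inverse logit is nonlinear, the outcome-scale difference can change with LOC even though the model has no Algorithm × LOC interaction.

```r
set.seed(3)
n_draws <- 4000

# Hypothetical draws, including a LOC slope (LOC assumed standardized, 0 = mean LOC)
b_intercept <- rnorm(n_draws, mean = -1, sd = 0.3)
b_bugspots  <- rnorm(n_draws, mean = 0.5, sd = 0.1)
b_loc       <- rnorm(n_draws, mean = 0.4, sd = 0.1)

loc_grid <- seq(-2, 2, by = 1)  # hypothetical standardized LOC values to evaluate

# One column per LOC value: per-draw contrast Linespots - Bugspots on the
# outcome scale, with group-level effects set to zero
contrast_by_loc <- sapply(loc_grid, function(loc) {
  eta_l <- b_intercept + b_loc * loc
  plogis(eta_l) - plogis(eta_l + b_bugspots)
})
colnames(contrast_by_loc) <- paste0("LOC_z=", loc_grid)

# Posterior 2.5%, 50% and 97.5% quantiles of the contrast at each LOC value
print(round(apply(contrast_by_loc, 2, quantile, probs = c(0.025, 0.5, 0.975)), 3))
```

With the real model, the simulated vectors would be replaced by the corresponding columns of `posterior_samples(model)` (for example `post$b_LOC`, assuming that is how the LOC coefficient is named).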