Hi everyone,

I have another question about model interpretation. I have my final models built and thought that I understood how to interpret them, but I played around a little bit and now I am confused again.

The model I have is this, using a beta likelihood with a logit link:

```
EXAM25 ~ 1 + Algorithm + LOC + (1|Project) + (1|Language)
```

Algorithm has 2 levels and LOC is continuous.
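
For reference, the fit itself looks roughly like this (a sketch; priors and sampler settings are omitted, and I am assuming the data frame is called `data`):

```
library(brms)

# Beta likelihood with a logit link for the mean;
# priors and sampler settings omitted for brevity
model = brm(
  EXAM25 ~ 1 + Algorithm + LOC + (1 | Project) + (1 | Language),
  data = data,
  family = Beta(link = "logit")
)
```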

`mcmc_areas` gives this on the logit scale, showing a clear difference between the two algorithms:

`marginal_effects` shows this picture, which looks a lot less certain about the difference, although Linespots seems to have a lower EXAM25 than Bugspots:
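
The plot comes from a call along these lines (sketch):

```
# Conditional means of EXAM25 by Algorithm, on the outcome scale
marginal_effects(model, effects = "Algorithm")
```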

Now, there are two ways to calculate the contrasts that I have seen: one based on the `posterior_samples` function and one based on the `posterior_predict` function.

The `posterior_samples`-based one looks like this (at the mean LOC):

```
post = posterior_samples(model)
contrast = inv_logit_scaled(post$b_Intercept) -
  inv_logit_scaled(post$b_Intercept + post$b_AlgorithmBugspots)
```

and looks like this:

Again, there is a clear difference between the two algorithms on the outcome scale.
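
To put a number on it, I can summarize the contrast like this (sketch):

```
# Posterior median and 95% interval of the parameter-based contrast,
# plus the posterior probability that the contrast is negative
quantile(contrast, probs = c(0.025, 0.5, 0.975))
mean(contrast < 0)
```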

The `posterior_predict` one looks like this (I have a full factorial design, so both subsets look the same apart from the outcome and Algorithm columns):

```
l = posterior_predict(model, newdata = subset(data, Algorithm == "Linespots"))
b = posterior_predict(model, newdata = subset(data, Algorithm == "Bugspots"))
contrast = l - b
```

This contrast, however, looks very different from the ones before:
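
Since `contrast` here is a draws-by-observations matrix, one way to summarize it is the per-draw average over the whole design (a sketch, assuming the rows of the two subsets line up):

```
# Average the cellwise contrast over all observations within each draw
contrast_avg = rowMeans(contrast)
quantile(contrast_avg, probs = c(0.025, 0.5, 0.975))
```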

Now I am wondering what is going on here. Is this because the `posterior_samples` contrast only looks at the mean LOC and at the average project and language (i.e., group-level effects of 0), while the `posterior_predict` contrast aggregates across all LOC, project, and language values? Or am I doing something else wrong?
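
One check I could do is compute the expectation-scale contrast at the mean LOC with the group-level effects switched off via `re_formula = NA`, which should be comparable to the parameter-based contrast (a sketch; `posterior_epred`, or `fitted()` in older brms versions, gives the posterior of the expected value rather than predictions of new observations):

```
# Expectation-scale contrast at mean LOC, excluding
# the Project and Language group-level effects
nd = data.frame(Algorithm = c("Linespots", "Bugspots"),
                LOC = mean(data$LOC))
epred = posterior_epred(model, newdata = nd, re_formula = NA)
contrast_epred = epred[, 1] - epred[, 2]
quantile(contrast_epred, probs = c(0.025, 0.5, 0.975))
```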

I assume there is no single “right” way to do this and that, as always, it depends on what exactly I want to show. However, I am not sure what I should conclude from this now.

Would it be fair to say that Linespots has a lower EXAM25 at the mean LOC (and I guess I could also test over a range of LOC values), with differences between projects and languages skewing the aggregated results?
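
For the range-of-LOC idea, something like this (sketch; the expectation-scale contrast over a grid of LOC values, again without the group-level effects):

```
# Contrast across a grid of LOC values, population-level effects only
loc_grid = seq(min(data$LOC), max(data$LOC), length.out = 20)
nd_l = data.frame(Algorithm = "Linespots", LOC = loc_grid)
nd_b = data.frame(Algorithm = "Bugspots", LOC = loc_grid)
ep_l = posterior_epred(model, newdata = nd_l, re_formula = NA)
ep_b = posterior_epred(model, newdata = nd_b, re_formula = NA)
contrast_by_loc = ep_l - ep_b  # draws x length(loc_grid)
apply(contrast_by_loc, 2, quantile, probs = c(0.025, 0.5, 0.975))
```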