I have a model that predicts fault prediction performance (the EXAM score) from the weighting function, the lines of code (LOC) the project has at that revision, and a varying intercept for each project:
(I found it a lot easier to work with the 0 + ... notation, and the models didn't seem to differ in loo performance.)
    formula = EXAM ~ 0 + Weight + LOC + (1|Project),
    family = Beta(),
    prior = c(
      prior(normal(0, 1), class = Intercept),
      prior(normal(0, 0.5), class = b),
      prior(cauchy(0, 0.5), class = sd),
      prior(gamma(0.1, 0.1), class = phi)
    )
There are 3 different weighting functions and 23 projects. LOC is centered and scaled to unit standard deviation.
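For reference, a minimal sketch of the standardization step in base R. The LOC values are made up for illustration and `loc_raw` / `loc_std` are hypothetical names, not columns from my data.

```r
# Hypothetical LOC values for a handful of revisions (made up for illustration)
loc_raw <- c(1200, 5400, 830, 15000, 7600)

# scale() subtracts the mean and divides by the sample standard deviation
loc_std <- as.numeric(scale(loc_raw))

mean(loc_std)  # numerically zero
sd(loc_std)    # one
```

With this scaling, LOC = 0 corresponds to a revision of average size, and one unit corresponds to one standard deviation of LOC.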
The question I want to answer is which weighting function produces the lowest (best) EXAM score, and how large the differences in performance between the three weighting functions are.
Here is what I have so far:
With stanplot(m3.1, type="areas", pars="b_Weight") I get this plot for the coefficients:
It seems pretty clear from this plot that there is almost no difference between the three weighting functions.
I can also use hypothesis from brms to check if two coefficients are the same, but that again is on the logit scale, so maybe the difference on the output scale is different?
- In case the coefficients are “the same” on the logit scale, is that sufficient to declare that there is no relevant difference between them or do I always have to analyze on the outcome scale?
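To make the logit-versus-outcome question concrete, here is a small sketch in base R. It uses `plogis()`, which is the plain inverse logit (the same transformation `inv_logit_scaled()` applies with its default bounds of 0 and 1). The baseline values and the logit difference of 0.2 are assumptions chosen for illustration, not estimates from my model.

```r
# The same difference on the logit scale, evaluated at different baselines
logit_diff <- 0.2
baseline   <- c(-3, -1, 0, 1, 3)  # hypothetical values of the rest of the linear predictor

# Difference on the outcome (0-1) scale at each baseline
outcome_diff <- plogis(baseline + logit_diff) - plogis(baseline)
round(outcome_diff, 3)

# A logit difference of exactly zero maps to an outcome difference of exactly zero
plogis(1 + 0) - plogis(1)
```

Because the inverse logit is strictly increasing, the sign of a difference is the same on both scales for a fixed baseline; what changes with the baseline is the size of the difference, which is largest near a linear predictor of 0 and shrinks towards the extremes.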
But for the sake of better understanding, let’s assume that there is a difference and I would like to put numbers on it and maybe make a recommendation based on it.
I can calculate the difference between two weighting functions from the posterior sample like this:
inv_logit_scaled(post$w1) - inv_logit_scaled(post$w2)
I could then do that for all three pairs of weighting functions, add 95% interval lines, and plot them like this, e.g.:
We can read from this that the median (dotted line) difference between W2 and W1 is positive, but the 95% interval comfortably includes 0. So if one had to choose, W1 might be the better choice, but ultimately it doesn't really matter.
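A minimal sketch of how the three pairwise contrasts could be summarised numerically. The data frame `post` here is a simulated stand-in with made-up coefficient values, not my actual posterior; with a real fit it would hold the draws of the three Weight coefficients.

```r
set.seed(1)
n_draws <- 4000

# Simulated stand-in for posterior draws of the three Weight coefficients.
# Drawn independently here; real posterior draws are typically correlated.
post <- data.frame(
  w1 = rnorm(n_draws, -2.00, 0.10),
  w2 = rnorm(n_draws, -1.95, 0.10),
  w3 = rnorm(n_draws, -1.98, 0.10)
)

# Expected EXAM on the outcome scale at LOC = 0 and a project offset of 0
mu <- as.data.frame(lapply(post, plogis))

# Pairwise differences, computed draw by draw
diffs <- data.frame(
  w2_minus_w1 = mu$w2 - mu$w1,
  w3_minus_w1 = mu$w3 - mu$w1,
  w3_minus_w2 = mu$w3 - mu$w2
)

# Median and 95% interval for each contrast (one row per contrast)
summ <- t(sapply(diffs, quantile, probs = c(0.025, 0.5, 0.975)))
summ

# Share of draws in which each difference is positive
colMeans(diffs > 0)
```

Computing the difference within each draw, rather than subtracting two separate summaries, keeps whatever correlation exists between the coefficients in the real posterior.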
Up to this point, I have ignored LOC and the varying intercepts for the 23 different projects. I am unsure how to handle them.
As differences on the outcome scale are not a linear function of differences on the logit scale, I have to include the entire model when calculating the differences, that is, for each project and over the whole range of LOC.
This seems like a lot of work and very hard to summarize in a way that a reader will understand easily.
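One way this could be organised is a grid of LOC values and project offsets, with the difference summarised in each cell. The sketch below uses simulated draws and made-up offsets; `b_w1`, `b_w2`, `b_loc`, and the offset values are hypothetical stand-ins for the posterior draws and project intercepts of the real model.

```r
set.seed(2)
n_draws <- 2000

# Simulated stand-ins for posterior draws (not real results)
b_w1  <- rnorm(n_draws, -2.00, 0.10)
b_w2  <- rnorm(n_draws, -1.95, 0.10)
b_loc <- rnorm(n_draws,  0.30, 0.05)

loc_grid <- seq(-2, 2, by = 1)  # standardized LOC
offsets  <- c(-1, 0, 1)         # hypothetical project offsets on the logit scale

grid <- expand.grid(loc = loc_grid, offset = offsets)

# W2 minus W1 on the outcome scale for one LOC value and one project offset
summarise_diff <- function(loc, offset) {
  eta_shared <- b_loc * loc + offset  # part of the predictor shared by both weights
  d <- plogis(b_w2 + eta_shared) - plogis(b_w1 + eta_shared)
  quantile(d, c(0.025, 0.5, 0.975), names = FALSE)
}

res <- t(mapply(summarise_diff, grid$loc, grid$offset))
colnames(res) <- c("lower", "median", "upper")
cbind(grid, res)
```

The resulting table has one row per combination, which could be plotted as the median difference against LOC with one line or panel per project offset.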
I found this thread that tackles a similar problem, but it lacks the varying intercepts, so I am not sure how to apply it to my case.
Do I take the mean of all project intercepts or is that mean maybe just 0 by design?
How do I include the varying project intercepts in the “around the mean” analysis of differences in performance?
How would one go about calculating and presenting the differences away from the mean?
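To illustrate the question about the project intercepts: as far as I understand, the (1|Project) term models the project offsets as normally distributed with mean zero and an estimated standard deviation, so an offset of 0 stands for a "typical" project, even though the estimated offsets of the 23 projects need not average to exactly zero. The sketch below contrasts the difference at offset 0 with the difference averaged over the population of projects. All draws are simulated and the names are hypothetical.

```r
set.seed(3)
n_draws <- 2000

# Simulated stand-ins for posterior draws (not real results)
b_w1       <- rnorm(n_draws, -2.00, 0.10)
b_w2       <- rnorm(n_draws, -1.95, 0.10)
sd_project <- abs(rnorm(n_draws, 0.80, 0.10))  # group-level standard deviation

# (a) Difference for a "typical" project: offset fixed at 0, LOC at its mean
diff_typical <- plogis(b_w2) - plogis(b_w1)

# (b) Difference averaged over the population of projects:
#     for each draw, simulate offsets from normal(0, sd_project) and average
n_proj <- 200
diff_marginal <- vapply(seq_len(n_draws), function(i) {
  u <- rnorm(n_proj, 0, sd_project[i])
  mean(plogis(b_w2[i] + u) - plogis(b_w1[i] + u))
}, numeric(1))

rbind(
  typical  = quantile(diff_typical,  c(0.025, 0.5, 0.975)),
  marginal = quantile(diff_marginal, c(0.025, 0.5, 0.975))
)
```

The two summaries generally differ because the inverse logit is nonlinear: averaging over offsets on the outcome scale is not the same as plugging in the average offset. In brms, `fitted()` with `re_formula = NA` gives predictions with the group-level effects left out, which corresponds to case (a).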