Hello Everyone,
As this is my first post, I hope I am doing a good job asking this question.
I am currently building models for my thesis on fault prediction, and after some work I am now wondering whether a Beta likelihood is the right choice.
Both outcome variables I have are proportions, so they are between 0 and 1. Both are metrics from fault prediction/localization that represent proportions of code.
Because the outcomes are bounded between 0 and 1 and could be interpreted as probabilities, I went with a Beta likelihood.
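To make this choice concrete, here is a minimal sketch in base R of the mean/precision form of the Beta distribution that brms uses for family = Beta(), together with a check that the outcome lies strictly inside (0, 1), which the Beta density requires. The helper names and the example vector are hypothetical and not taken from my data.

```r
# Hypothetical helper: convert mean (mu) and precision (phi) into the
# two shape parameters of the standard Beta distribution.
beta_shapes <- function(mu, phi) {
  stopifnot(mu > 0, mu < 1, phi > 0)
  list(shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

# Hypothetical helper: count values a plain Beta likelihood cannot handle.
count_boundary <- function(y) {
  c(zeros = sum(y == 0), ones = sum(y == 1), outside = sum(y < 0 | y > 1))
}

# Made-up example proportions (not real EXAM scores).
y_example <- c(0.02, 0.10, 0.35, 0.00, 0.80, 1.00)
print(count_boundary(y_example))

# With mu = 0.2 and phi = 10 the implied distribution is Beta(2, 8).
s <- beta_shapes(mu = 0.2, phi = 10)
print(s)
```

If any exact zeros or ones show up, a plain Beta likelihood does not apply as is; brms also provides zero_one_inflated_beta() for that situation.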
I played around with projpred to get a sense of what predictors to use and then iterated between prior sensitivity analysis and model building. I would build some models, compare them with loo_compare (from brms) and then try to find sensible priors by plotting them with different values.
After arriving at what I thought was the best model I could come up with, I discovered the pp_check function from bayesplot and found that the posterior predictions did not fit the data well. I then tried a Gaussian likelihood with mostly the same priors.
It outperformed the Beta model by a lot.
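One thing a better fit score does not show by itself is whether the Gaussian model predicts impossible values. Here is a rough sketch in base R, using a made-up mean and standard deviation (these are assumptions for illustration, not estimates from my model), of how much predictive mass a normal distribution can place outside [0, 1]:

```r
# Hypothetical values for illustration only: a small mean proportion
# with moderate spread, as could happen for a metric close to 0.
mu_hyp    <- 0.05
sigma_hyp <- 0.10

# Exact probability mass below 0 and above 1 under Normal(mu, sigma).
p_below <- pnorm(0, mean = mu_hyp, sd = sigma_hyp)
p_above <- pnorm(1, mean = mu_hyp, sd = sigma_hyp, lower.tail = FALSE)

cat("mass below 0:", round(p_below, 3), "\n")
cat("mass above 1:", signif(p_above, 3), "\n")
```

Whether this matters for my data depends on where the fitted mean and sigma actually end up, which this sketch does not tell me.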
Since then I have figured out that I can improve the loo performance of the Beta model a lot by adjusting the priors. But I am now wondering whether the Beta was the right choice, and whether adjusting prior values like this is “allowed” as long as the models sample fine.
I have put the shortened code for the first Gaussian and Beta models below in case it helps to understand what I did.
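To see what the Beta model's priors imply before any data are involved, here is a minimal prior predictive sketch in base R. It assumes the default logit link for the mean, uses only the Intercept and phi priors from my Beta model, and ignores the predictors and group-level terms, so it is a simplification and not the full model.

```r
set.seed(1)
n_draws <- 10000

# Priors copied from the Beta model: the Intercept lives on the logit
# scale, phi is the precision of the Beta distribution.
intercept <- rnorm(n_draws, mean = 0, sd = 10)
phi       <- rgamma(n_draws, shape = 10, rate = 10)

# Back-transform to the mean of the proportion; clamp away from exact
# 0 and 1 so the Beta shape parameters stay strictly positive.
mu <- pmin(pmax(plogis(intercept), 1e-12), 1 - 1e-12)

# Simulate outcomes from Beta(mu * phi, (1 - mu) * phi).
y_sim <- rbeta(n_draws, shape1 = mu * phi, shape2 = (1 - mu) * phi)

# Share of prior draws whose mean is pushed to the extremes.
extreme_mu <- mean(mu < 0.01 | mu > 0.99)
cat("share of prior means below 0.01 or above 0.99:", round(extreme_mu, 2), "\n")
cat("prior mean of phi:", round(mean(phi), 2), "\n")
print(quantile(y_sim, c(0.05, 0.25, 0.5, 0.75, 0.95)))
```

A normal(0, 10) prior on the logit scale puts most of the mass on means very close to 0 or 1, and gamma(10, 10) keeps phi near 1, which implies very dispersed outcomes. For the full model, brms can sample from the priors alone when brm() is called with sample_prior = "only".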
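library(brms)

# Beta model: logit link for the mean, so Intercept and b priors are on the logit scale.
m.beta <- brm(
  formula = EXAM ~ 1 + Weighting + LOC + Origin + (1 | Project) + (1 | Domain) + (1 | Language),
  data = ls.df,
  family = Beta(),
  prior = c(
    prior(normal(0, 10), class = Intercept),
    prior(normal(0, 0.05), class = b),
    # brms bounds sd at zero, so this acts as a half-Cauchy prior.
    prior(cauchy(0, 0.05), class = sd),
    # phi is the precision parameter of the Beta distribution.
    prior(gamma(10, 10), class = phi)
  )
)

# Gaussian model: identity link, so all priors are on the scale of EXAM itself.
m.gauss <- brm(
  formula = EXAM ~ 1 + Weighting + LOC + Origin + (1 | Project) + (1 | Domain) + (1 | Language),
  data = ls.df,
  family = gaussian(),
  prior = c(
    prior(normal(0, 1), class = Intercept),
    prior(normal(0, 0.05), class = b),
    # sigma and sd are bounded at zero, so these act as half-Cauchy priors.
    prior(cauchy(0, 0.05), class = sigma),
    prior(cauchy(0, 0.05), class = sd)
  )
)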