I am modeling the effect of a fixed factor group, a positive numeric covariate base, and a random factor site on a positive numeric (i.e. not a count) outcome. Here is the deidentified data
exemplarData.RData (3.5 KB)
The outcome is very skewed.
I saw online that gamma and lognormal regression are quite good for modeling positively skewed numeric data. In this excellent notebook Aki Vehtari demonstrates that when the outcome is a count, you can compare the performance of gaussian models to betabinomial via loo-cv. I was wondering if the same applies to Gaussian, gamma and lognormal regression: if you can compare these models with different famillies and link functions via loo-cv.
Here is the code. Note that gamma regression requires all values to be above zero so I simple added 0.000001 to the zeroes. Not sure if this is kosher (let me know).
library(brms)
load(file = "exemplarData.RData") # exemplarData
# gamma needs values above 0
exemplarData %<>%
mutate(study = case_when(study == 0 ~ 0.000000000001,
TRUE ~ study))
################################
########### Gaussian ###########
################################
# run model
brm(formula = study ~ group + base + (1|site),
data = exemplarData,
family = gaussian,
save_pars = save_pars(all=TRUE),
seed = 1234,
refresh = 0) -> fit_gauss
# add criterion to the object so we can compare models using cross-validation later
fit_gauss <- add_criterion(x = fit_gauss,
criterion="loo",
save_psis = TRUE,
moment_match = T,
reloo = T)
# loo, with refitting.
loo(fit_gauss,
reloo = TRUE,
save_psis = TRUE) -> loo_gauss
##############################
########### Gamma ############
##############################
# run model
brm(formula = study ~ group + base + (1|site),
data = exemplarData,
family = Gamma(link = "log"),
save_pars = save_pars(all=TRUE),
seed = 1234,
refresh = 0) -> fit_gamma
# add criterion
fit_gamma <- add_criterion(x = fit_gamma,
criterion="loo",
save_psis = TRUE,
moment_match = T,
reloo = T)
# loo with refitting
loo(fit_gamma,
reloo = TRUE,
save_psis = TRUE) -> loo_gamma
##############################
########### lognormal ########
##############################
# run model
brm(formula = study ~ group + base + (1|site),
data = exemplarData,
family = lognormal(),
save_pars = save_pars(all=TRUE),
seed = 1234,
refresh = 0) -> fit_lognormal
# add criterion
fit_lognormal <- add_criterion(x = fit_lognormal,
criterion="loo",
save_psis = TRUE,
moment_match = T,
reloo = T)
# loo with refitting.
loo(fit_lognormal,
reloo = TRUE,
save_psis = TRUE) -> loo_lognormal
# now compare the three models
loo_compare(loo_gauss, loo_gamma, loo_lognormal)
# output
# elpd_diff se_diff
# fit_gamma 0.0 0.0
# fit_gauss -98.4 39.1
# fit_lognormal -232.5 13.2
The output suggests the gamma is the superior model. Which brings me back to my question: can one use loo-cv to compare Gaussian to gamma to lognormal regression?
Help much appreciated as always.
