Comparing Gaussian, gamma, and lognormal versions of regression model

I am modeling the effect of a fixed factor group, a positive numeric covariate base, and a random factor site on a positive numeric (i.e. not a count) outcome. Here is the deidentified data

exemplarData.RData (3.5 KB)

The outcome is very skewed.

I saw online that gamma and lognormal regression are quite good for modeling positively skewed numeric data. In this excellent notebook Aki Vehtari demonstrates that when the outcome is a count, you can compare the performance of gaussian models to betabinomial via loo-cv. I was wondering if the same applies to Gaussian, gamma and lognormal regression: if you can compare these models with different famillies and link functions via loo-cv.

Here is the code. Note that gamma regression requires all values to be above zero so I simple added 0.000001 to the zeroes. Not sure if this is kosher (let me know).

library(brms)

load(file = "exemplarData.RData") # exemplarData

# gamma needs values above 0

exemplarData %<>%
  mutate(study = case_when(study == 0 ~ 0.000000000001,
                           TRUE ~ study))

################################
########### Gaussian ###########
################################


# run model
brm(formula = study ~ group + base + (1|site),
    data = exemplarData,
    family = gaussian,
    save_pars = save_pars(all=TRUE),
    seed = 1234,
    refresh = 0) -> fit_gauss

# add criterion to the object so we can compare models using cross-validation later
fit_gauss <- add_criterion(x = fit_gauss,
                           criterion="loo",
                           save_psis = TRUE,
                           moment_match = T,
                           reloo = T)

# loo, with refitting.
loo(fit_gauss,
    reloo = TRUE,
    save_psis = TRUE) -> loo_gauss

##############################
########### Gamma ############
##############################

# run model
brm(formula = study ~ group + base + (1|site),
    data = exemplarData, 
    family = Gamma(link = "log"),
    save_pars = save_pars(all=TRUE),
    seed = 1234,
    refresh = 0) -> fit_gamma

# add criterion
fit_gamma <- add_criterion(x = fit_gamma,
                           criterion="loo",
                           save_psis = TRUE,
                           moment_match = T,
                           reloo = T)

# loo with refitting
loo(fit_gamma,
    reloo = TRUE,
    save_psis = TRUE) -> loo_gamma

##############################
########### lognormal ########
##############################

# run model
brm(formula = study ~ group + base + (1|site),
    data = exemplarData,
    family = lognormal(),
    save_pars = save_pars(all=TRUE),
    seed = 1234,
    refresh = 0) -> fit_lognormal

# add criterion 
fit_lognormal <- add_criterion(x = fit_lognormal,
                               criterion="loo",
                               save_psis = TRUE,
                               moment_match = T,
                               reloo = T)

# loo with refitting. 
loo(fit_lognormal,
    reloo = TRUE,
    save_psis = TRUE) -> loo_lognormal

# now compare the three models
loo_compare(loo_gauss, loo_gamma, loo_lognormal)

# output
#            elpd_diff   se_diff
# fit_gamma        0.0       0.0 
# fit_gauss      -98.4      39.1 
# fit_lognormal -232.5      13.2

The output suggests the gamma is the superior model. Which brings me back to my question: can one use loo-cv to compare Gaussian to gamma to lognormal regression?

Help much appreciated as always.

Generally, as long as you are modeling y directly, not a transformation (e.g., \sqrt{y} or \log y) yes you can directly compare predictive densities (Cross-validation FAQ).

brms may automatically check this for you but I’m not familiar enough with the package to say for certain.