I have been working on a project analysing routinely collected outcomes data from clients receiving opioid agonist treatment (methadone, buprenorphine) in Australian public drug and alcohol clinics. I am examining the effect of level of amphetamine use (in days used) in the 28 days prior to starting treatment on substance use and quality of life in the first year of treatment. I have been using brms, following the workflow developed by @avehtari (for which I am eternally grateful, see here), and I have been posting on the forum quite a lot (here and here and here and here).
In that workflow Professor Vehtari explains how you can compare a Gaussian regression, where the outcome is treated as numeric, to a binomial or beta-binomial regression, where the outcome is treated as a (discrete) bounded count, using LOO-CV, as long as the width of the interval around each possible count is 1 (because then the posterior predictive probability and the posterior predictive density are equivalent). If you compare two models where the outcome is expressed in different functional forms, numeric vs. count, the loo_compare() function issues a warning:
Warning message:
Not all models have the same y variable. ('yhash' attributes do not match)
But Professor Vehtari explains why, in this special case, this warning can be ignored and you can legitimately make inferences about which model explains the data better (obviously with lots of posterior and prior predictive checking).
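If I have understood the argument correctly, the key point is that for an integer-valued outcome the continuous model's predictive density at an observed value approximates the predictive probability of the unit-width interval around it (my notation, not Professor Vehtari's):

$$
\Pr(y = k) \;\approx\; \int_{k-\frac{1}{2}}^{\,k+\frac{1}{2}} p(\tilde y \mid \text{data}) \, d\tilde y \;\approx\; p(k \mid \text{data}) \times 1,
$$

so the log predictive densities from the continuous model and the log predictive probabilities from the discrete model end up on a comparable scale, which is what makes the elpd comparison meaningful.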
Using the same dataset I have been analysing quality-of-life outcomes, examining whether amphetamine use at the start of treatment affects the trajectory of quality of life over the first year of treatment. This time the outcome is a score from 0-10, where a higher score indicates better psychological health. Responses are integers (i.e. people cannot indicate fractions between whole numbers). I was wondering whether, using the same logic as laid out in Professor Vehtari's notebook, one could compare a Gaussian model, where the outcome is treated as a number, to a cumulative link model, where the outcome is treated as an ordinal categorical variable (with 1 added to each score so that the response scale runs from 1-11 rather than 0-10)?
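For concreteness, the two model specifications look roughly like this (a minimal sketch: the data frame and column names, e.g. `psych_score`, `amph_days`, `time`, `client_id`, are placeholders, and the actual formula in my models includes more terms):

```r
library(brms)

# Gaussian model: outcome treated as numeric
fit_gaussian_psych <- brm(
  psych_score ~ amph_days * time + (1 | client_id),
  family = gaussian(),
  data = d
)

# Cumulative-link (ordinal) model: outcome shifted from 0-10 to 1-11
# and treated as an ordered categorical variable
d$psych_ord <- d$psych_score + 1
fit_ordinal_psych <- brm(
  psych_ord ~ amph_days * time + (1 | client_id),
  family = cumulative("probit"),
  data = d
)

# Compare approximate out-of-sample predictive performance
loo_compare(loo(fit_gaussian_psych), loo(fit_ordinal_psych))
```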
I ran the two models in brms, compared them using loo_compare(), and got the following output.
elpd_diff se_diff
fit_ordinal_psych 0.0 0.0
fit_gaussian_psych -103.4 13.1
This was accompanied by the same warning I listed above, that the models do not have the same y variable. It looks like the ordinal model performs better on this metric than the Gaussian.
Can I ignore the warning?