Comparing different functional forms of the same outcome variable using LOO-CV: Follow-up questions

llewmills · September 10, 2024, 7:57pm

I have been going through @avehtari’s excellent notebook https://users.aalto.fi/~ave/casestudies/Nabiximols/nabiximols.html, which he created in response to my question on the Stan Forums (https://discourse.mc-stan.org/t/comparing-models-with-different-functional-forms-of-the-same-y-variable-using-loo-compare/34161/14). There I asked about comparing continuous vs discrete observation models for my data. The study estimates the difference in the frequency of illicit cannabis use (measured as self-reported days of use in the previous 28 days) between nabiximols (a THC agonist) and placebo over a 12-week study period. I analysed the data with a hierarchical model in brms, once treating the outcome as a numeric variable and once as binomial, and wanted to compare the two models using LOO-CV.

In section 4 of the notebook he says that although we “…can’t compare probabilities and densities directly…we can discretize the density to get probabilities. As the outcomes are integers (0,1,2,…,28), we can compute probabilities for intervals ((−0.5,0.5),(0.5,1.5),(1.5,2.5),…,(27.5,28.5))” and then “…integrate the density over the interval. For example, integrate predictive density from 11.5 to 12.5 to get a probability that 11.5<cu<12.5” (where cu is the outcome variable, the number of days of illicit cannabis use in the previous 28 days).
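To make the interval idea concrete, here is a minimal Python sketch (Python is used only for illustration; the thread itself works in R/brms). It assumes a normal predictive distribution for one observation with hypothetical mean `mu = 10` and sd `sigma = 6`, and computes the probability of each integer outcome as the difference of normal CDF values at the interval endpoints.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_prob(k, mu, sigma):
    """Probability that a Normal(mu, sigma) draw falls in (k - 0.5, k + 0.5)."""
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

# Hypothetical predictive mean and sd for one observation (illustrative only).
mu, sigma = 10.0, 6.0

# Probability that 11.5 < cu < 12.5, i.e. the probability assigned to cu = 12.
p12 = interval_prob(12, mu, sigma)

# Probabilities for every possible outcome 0, 1, ..., 28.
probs = [interval_prob(k, mu, sigma) for k in range(29)]

print(f"P(11.5 < cu < 12.5) = {p12:.4f}")
print(f"total over 0..28    = {sum(probs):.4f}")
```

Because a normal distribution also puts some mass below -0.5 and above 28.5, these 29 probabilities sum to less than 1; that leftover mass is one of the ways a normal model misfits a bounded count.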

Now the crucial part. He says “Now the probability of each interval is approximated by the height times the width of a bar. The height is the density in the middle of the interval and width of the bar is 1, and thus the probability value is the same as the density value! In this case this is simple as the counts are integers and the distance between counts is 1”.
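The “height times width” argument can be checked numerically. A minimal Python sketch, again with a hypothetical predictive mean `mu`, sd `sigma` and observed count `k`, compares the exact interval probability with the density evaluated at the interval midpoint multiplied by a width of 1:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 10.0, 6.0  # hypothetical predictive mean and sd
k = 12                 # hypothetical observed count

# Exact probability of the width-1 interval around k.
exact = normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

# Midpoint rule: height at the interval centre times width 1.
midpoint = normal_pdf(k, mu, sigma) * 1.0

print(f"exact interval probability: {exact:.5f}")
print(f"density at midpoint:        {midpoint:.5f}")
```

The two agree closely when `sigma` is large relative to the interval width. If the predictive sd were around 1 or smaller, the midpoint approximation would become noticeably worse, and the exact CDF difference would be the safer choice.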

So, what I need help with first is understanding this procedure conceptually. When you use the loo_compare() function in brms to compare the models, it throws the following warning message:

Warning message:
Not all models have the same y variable. ('yhash' attributes do not match)

From what I gather, the procedure described above (discretizing the predictive density by integrating it over intervals 1 unit wide, thereby turning a probability density into a probability) is what tells us whether comparing the two models via LOO-CV is valid, and thus whether we can safely ignore the warning thrown by loo_compare(). Is this correct?
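A minimal Python sketch of why the two models end up on a common scale, assuming hypothetical counts and plug-in parameter values instead of posterior draws. The binomial model contributes log probabilities, the normal model contributes log densities at the observed integers, and because each integer sits in an interval of width 1, the log density is approximately the log probability of that interval. The exact interval log probability is computed alongside for comparison.

```python
import math
import statistics

N_TRIALS = 28  # days in the recall window

def binom_logpmf(k, n, p):
    """log P(Y = k) for Y ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def normal_logpdf(x, mu, sigma):
    """log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_log_interval(k, mu, sigma):
    """log P(k - 0.5 < X < k + 0.5) for X ~ Normal(mu, sigma)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return math.log(cdf(k + 0.5) - cdf(k - 0.5))

days = [0, 3, 12, 28, 20, 7]  # hypothetical observed counts

# Plug-in parameters (stand-ins for posterior draws).
p_hat = statistics.mean(days) / N_TRIALS
mu_hat, sd_hat = statistics.mean(days), statistics.stdev(days)

lp_binom = [binom_logpmf(y, N_TRIALS, p_hat) for y in days]            # log probability
lp_normal = [normal_logpdf(y, mu_hat, sd_hat) for y in days]           # log density
lp_normal_exact = [normal_log_interval(y, mu_hat, sd_hat) for y in days]  # log probability

print("sum log p, binomial:        ", round(sum(lp_binom), 3))
print("sum log density, normal:    ", round(sum(lp_normal), 3))
print("sum log interval p, normal: ", round(sum(lp_normal_exact), 3))
```

This is not LOO itself: loo would use leave-one-out predictive distributions built from posterior draws. The sketch only shows why the pointwise values from the two observation models can be summed and compared once the outcome is on the integer count scale.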

Second, from the reasoning in the notebook it seems to me that you could discretize any count variable (i.e. any outcome that takes integer values) in the same way, using intervals 1 unit wide so that the probability value and the density value are approximately equal. If so, most (all?) count variables should allow a comparison of continuous/Gaussian and discrete/binomial models via LOO-CV. Is this correct?

Third, could you replace the comparison of these two models via LOO (e.g. with the loo_compare() function) with a comparison via K-fold CV?

avehtari · September 11, 2024, 12:44pm

1. Yes.

2. Yes, as long as you don’t scale or otherwise transform the counts. I have a draft of another case study which shows how to do the discretization if transformations are used (e.g. using a normal distribution for the square root of the counts).

3. Yes.
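A rough Python sketch of the K-fold alternative, under simplifying assumptions: hypothetical counts, a hypothetical helper `kfold_elpd`, and plug-in point estimates fitted on the training folds in place of refitting the full Bayesian models (which is what `brms::kfold()` would do). The point is that each held-out observation is scored by both models on the same integer count scale, just as in LOO.

```python
import math
import random
import statistics

N_TRIALS = 28  # days in the recall window

def binom_logpmf(k, n, p):
    """log P(Y = k) for Y ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def normal_logpdf(x, mu, sigma):
    """log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def kfold_elpd(y, k=5, seed=1):
    """Hypothetical helper: summed held-out log scores for two models.

    Plug-in estimates from the training folds stand in for the
    posterior predictive distribution of a refitted Bayesian model.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # each index lands in exactly one fold
    elpd = {"binomial": 0.0, "normal": 0.0}
    for held_out in folds:
        held = set(held_out)
        train = [y[i] for i in idx if i not in held]
        # Clip p away from 0 and 1 so the log pmf stays finite.
        p = min(max(statistics.mean(train) / N_TRIALS, 1e-6), 1 - 1e-6)
        mu, sd = statistics.mean(train), statistics.stdev(train)
        for i in held_out:
            elpd["binomial"] += binom_logpmf(y[i], N_TRIALS, p)
            # Width-1 intervals: the log density stands in for the log probability.
            elpd["normal"] += normal_logpdf(y[i], mu, sd)
    return elpd

days = [0, 3, 12, 28, 20, 7, 0, 15, 28, 5, 1, 9]  # hypothetical counts
print(kfold_elpd(days))
```

Setting `k=len(days)` turns the same loop into exact leave-one-out with plug-in estimates, which is a quick way to see that LOO and K-fold are two versions of the same held-out scoring.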

llewmills · September 11, 2024, 6:14pm

Thank you so much again, Aki. There is a section in the notebook where you do compare a Gaussian model of the scaled outcome with a Gaussian model of the unscaled outcome, but not with the binomial model. I’m keen to see the analysis you mentioned.

avehtari · September 17, 2024, 5:55pm

There is nothing special about adding the binomial model to the comparison there, as long as you use either the non-scaled normal model or the scaled one with a Jacobian adjustment.
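A minimal Python sketch of the Jacobian adjustment, assuming the outcome was standardized as `z = (y - m) / s` and that a normal model for `z` has hypothetical parameters `mu_z` and `sigma_z`. Since `y = m + s*z`, the density of `y` is the density of `z` divided by `s`, so each pointwise log density gets `-log(s)` added. After that it lives on the count scale, where width-1 intervals make it comparable with binomial log probabilities.

```python
import math
import statistics

def normal_logpdf(x, mu, sigma):
    """log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

days = [0, 3, 12, 28, 20, 7]  # hypothetical counts
m, s = statistics.mean(days), statistics.stdev(days)
scaled = [(y - m) / s for y in days]  # standardized outcome

# Hypothetical parameters of a normal model fitted to the scaled outcome.
mu_z, sigma_z = 0.1, 0.9

# Log densities on the scaled scale: not comparable with a binomial log pmf.
lpd_scaled = [normal_logpdf(z, mu_z, sigma_z) for z in scaled]

# Jacobian adjustment: |dz/dy| = 1/s, so log p_y(y) = log p_z(z) - log(s).
lpd_counts = [lp - math.log(s) for lp in lpd_scaled]

for y, a, b in zip(days, lpd_scaled, lpd_counts):
    print(f"y={y:2d}  scaled-scale {a:7.3f}  count-scale {b:7.3f}")
```

The adjusted values match a normal model written directly on the count scale with mean `m + s*mu_z` and sd `s*sigma_z`, which is a convenient sanity check.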

There is now a version of the Worldcup case study which illustrates rounding models, as well as a more complicated discretization of a sqrt-normal model, and compares them with discrete count models.
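One way to discretize a sqrt-normal model, sketched in Python under assumptions that may differ from what the case study does. Treat `x = sqrt(count)` as Normal(`mu`, `sigma`) with hypothetical parameters, assign count `k >= 1` the probability that `x` falls between `sqrt(k - 0.5)` and `sqrt(k + 0.5)`, and lump everything below `sqrt(0.5)` (including negative `x`) into `k = 0`. The change-of-variables density `p_x(sqrt(k)) / (2*sqrt(k))` is shown alongside as the cheaper width-1 approximation for `k >= 1`.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def sqrt_normal_prob(k, mu, sigma):
    """P(count = k) when sqrt(count) ~ Normal(mu, sigma), rounding on the count scale.

    Assumption: all mass below sqrt(0.5), including negative x, goes to k = 0.
    """
    upper = normal_cdf(math.sqrt(k + 0.5), mu, sigma)
    if k == 0:
        return upper
    return upper - normal_cdf(math.sqrt(k - 0.5), mu, sigma)

def sqrt_normal_density(k, mu, sigma):
    """Density of count = x**2 at k > 0, ignoring the mass at x < 0."""
    return normal_pdf(math.sqrt(k), mu, sigma) / (2.0 * math.sqrt(k))

mu, sigma = 3.0, 1.0  # hypothetical parameters on the sqrt scale
for k in (1, 4, 9, 16):
    print(k, round(sqrt_normal_prob(k, mu, sigma), 5),
          round(sqrt_normal_density(k, mu, sigma), 5))
```

Because adjacent interval edges cancel, the discretized probabilities sum to 1 over all counts, whereas the density values only approximate them.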
