I have been going through @avehtari’s excellent notebook https://users.aalto.fi/~ave/casestudies/Nabiximols/nabiximols.html, which he created in response to my question on the Stan Forums https://discourse.mc-stan.org/t/comparing-models-with-different-functional-forms-of-the-same-y-variable-using-loo-compare/34161/14. There I asked about comparing continuous vs discrete observation models for my data. The study estimates the difference in frequency of illicit cannabis use (self-reported days of use over the previous 28 days) between nabiximols (a THC agonist) and placebo over a 12-week period. I analysed the data as a hierarchical model in brms, once treating the outcome as a numeric variable and once as a binomial, and wanted to compare the models using LOO-CV.
In section 4 of the notebook he says that although we “…can’t compare probabilities and densities directly…we can discretize the density to get probabilities. As the outcomes are integers (0,1,2,…,28), we can compute probabilities for intervals ((−0.5,0.5),(0.5,1.5),(1.5,2.5),…,(27.5,28.5))” and then “…integrate the density over the interval. For example, integrate predictive density from 11.5 to 12.5 to get a probability that 11.5<cu<12.5” (where cu is the outcome variable, days of illicit cannabis use in the previous 28 days).
Now the crucial part. He says “Now the probability of each interval is approximated by the height times the width of a bar. The height is the density in the middle of the interval and width of the bar is 1, and thus the probability value is the same as the density value! In this case this is simple as the counts are integers and the distance between counts is 1”
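To check my understanding of the "height times width" step, here is a minimal Python sketch. It assumes a hypothetical Normal(10, 4) predictive distribution for one observation (not fitted to my data) and compares the exact probability of each interval (k − 0.5, k + 0.5), obtained by integrating the density via the CDF, with the density at k multiplied by a width of 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_prob(mu, sigma, lo, hi):
    """Exact P(lo < y < hi): the density integrated over the interval."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# Hypothetical predictive distribution for a single observation.
mu, sigma = 10.0, 4.0

# For each possible count 0..28: exact interval probability vs height * width.
rows = []
for k in range(29):
    exact = interval_prob(mu, sigma, k - 0.5, k + 0.5)
    approx = normal_pdf(k, mu, sigma) * 1.0  # density at the midpoint times width 1
    rows.append((k, exact, approx))

for k, exact, approx in rows[10:15]:
    print(f"y={k:2d}  integral={exact:.5f}  density*width={approx:.5f}")
```

The two columns agree closely, which is the point of the quote. Because a Normal is unbounded, a little probability mass falls below −0.5 or above 28.5, so the 29 interval probabilities sum to slightly less than 1.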
First, I need help understanding this procedure conceptually. When I use the loo_compare() function in brms to compare the models, it throws the following warning message:
Warning message:
Not all models have the same y variable. ('yhash' attributes do not match)
From what I gather, the procedure described above (discretizing the predictive density by integrating it over intervals 1 unit wide, thus turning a probability density into a probability) allows us to assess whether comparing the two models via LOO-CV is valid, and therefore whether we can ignore the warning thrown by the loo_compare() function. Is this correct?
Second, from the reasoning in the notebook it seems to me that you could discretize any count variable (i.e. one whose outcome is an integer) in the same way, by creating intervals 1 unit wide, so that the probability value and the density value are (approximately) equal. If so, most (all?) count variables should allow continuous/Gaussian and discrete/binomial models to be compared via LOO-CV. Is this correct?
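To make the second question concrete, here is a minimal sketch of putting both models on the same log scale. The observations and parameter values are hypothetical plug-in numbers, not posterior draws, so this is not LOO. The `width` argument is my own illustration of the midpoint approximation: log P ≈ log density + log(width). With integer counts the width is 1 and the correction vanishes. For an outcome recorded on a different grid (e.g. half-days) it would not.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def binomial_logpmf(k, n, p):
    """Log probability of k successes in n trials with success probability p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)

def discretized_logprob(y, mu, sigma, width=1.0):
    """Approximate log P(y - width/2 < Y < y + width/2) as log(density * width).

    Hypothetical helper: with integer counts width = 1, so log(width) = 0
    and the log density is used directly as a log probability.
    """
    return normal_logpdf(y, mu, sigma) + math.log(width)

# Hypothetical observed days of use (out of 28) and hypothetical parameters.
y_obs = [0, 3, 12, 20, 28]
n_days = 28
mu, sigma = 12.0, 8.0   # continuous (Gaussian) model
p = 12.0 / n_days       # discrete (binomial) model

gauss_scores = [discretized_logprob(y, mu, sigma) for y in y_obs]
binom_scores = [binomial_logpmf(y, n_days, p) for y in y_obs]

print("Gaussian (discretized) log score:", round(sum(gauss_scores), 3))
print("Binomial log score:              ", round(sum(binom_scores), 3))
```

Both sums are now log probabilities of the same integer outcomes, which is what makes the comparison meaningful, at least under the midpoint approximation.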
Third, could you replace the comparison of these two models via LOO (e.g. with the loo_compare() function) with a comparison via k-fold CV?
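For reference, this is roughly what I have in mind by k-fold CV, as a toy Python sketch. Both models are fitted with simple plug-in estimates on the training folds (sample mean/sd for the Gaussian, mean/28 for the binomial probability) rather than a full Bayesian posterior predictive as brms's kfold() would use. The data and fold assignment are hypothetical. The held-out scores rely on the same width-1 discretization as above, so k-fold CV would not by itself remove the density-vs-probability question.

```python
import math

def normal_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def binomial_logpmf(k, n, p):
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)

def kfold_elpd(y, n_trials, k_folds):
    """Summed held-out log scores for toy Gaussian and binomial plug-in models.

    Folds are assigned deterministically (index modulo k_folds).
    Returns (gaussian_sum, binomial_sum, number_of_scored_points).
    """
    folds = [i % k_folds for i in range(len(y))]
    gauss, binom = [], []
    for f in range(k_folds):
        train = [v for v, g in zip(y, folds) if g != f]
        test = [v for v, g in zip(y, folds) if g == f]
        # Fit both toy models on the training folds only.
        mu = sum(train) / len(train)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in train) / (len(train) - 1))
        p = min(max(mu / n_trials, 1e-6), 1 - 1e-6)  # keep p strictly inside (0, 1)
        # Score the held-out fold; integer counts => width 1, log density ~ log prob.
        gauss += [normal_logpdf(v, mu, sigma) for v in test]
        binom += [binomial_logpmf(v, n_trials, p) for v in test]
    return sum(gauss), sum(binom), len(gauss)

y_obs = [0, 2, 5, 28, 14, 0, 7, 21, 3, 10, 28, 1]  # hypothetical data
elpd_gauss, elpd_binom, n_scored = kfold_elpd(y_obs, n_trials=28, k_folds=4)
print(f"Gaussian elpd_kfold: {elpd_gauss:.2f}")
print(f"Binomial elpd_kfold: {elpd_binom:.2f}")
print("difference (Gaussian - binomial):", round(elpd_gauss - elpd_binom, 2))
```

Every observation is scored exactly once, on a model that never saw it, and the two sums are compared on the same log-probability scale as the LOO estimates would be.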