Can this method credibly show that ordinal data is not equidistant?

Hi all. I am in a field where ratings on, for instance, a scale from 1 to 5 are often averaged and treated like metric data. I have shared articles by Kruschke on the problems of treating ordinal data as metric, but generally, my colleagues don’t seem too impressed by them.

In addition to simply pointing out that we cannot be sure that the categories are equidistant, I would therefore like to demonstrate this in actual experimental data. I think I have come up with a method for doing so, but is it bulletproof enough, potentially for a publication?

As an example, I use the wine tasting data set from the R package ordinal. My argument is: if a model with flexible thresholds performs at least 2 SEs better (as measured by LOO elpd) than a model with equidistant thresholds, we can credibly show that the categories are not equidistant and that it would be an error to treat the data as metric.

The full code can be found here.

I define the two models in brms:
library(brms)
data(wine, package = "ordinal")
wineF <- brm(rating ~ temp + (1 | judge), data = wine, family = cumulative(threshold = "flexible"))
wineE <- brm(rating ~ temp + (1 | judge), data = wine, family = cumulative(threshold = "equidistant"))
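
To make the difference between the two threshold structures concrete, here is a minimal base-R sketch of how a cumulative logit model turns thresholds into category probabilities. The threshold values, the spacing `delta`, and the helper `cat_probs` are hypothetical and chosen only for illustration; they are not estimates from the wine data.

```r
# Category probabilities under a cumulative logit model:
# P(Y <= k) = plogis(tau_k - eta), so P(Y = k) is the difference
# between neighbouring cumulative probabilities.
cat_probs <- function(thresholds, eta = 0) {
  cum <- c(plogis(thresholds - eta), 1)
  diff(c(0, cum))
}

# Hypothetical cut points for a 5-point scale (4 thresholds).
tau_flexible <- c(-2.0, -0.3, 0.2, 2.5)   # every cut point is free

tau_1 <- -2.0                             # hypothetical first cut point
delta <- 1.5                              # hypothetical constant spacing
tau_equidistant <- tau_1 + (0:3) * delta  # tau_k = tau_1 + (k - 1) * delta

p_flex <- cat_probs(tau_flexible)
p_equi <- cat_probs(tau_equidistant)
print(round(rbind(flexible = p_flex, equidistant = p_equi), 3))
```

With equidistant thresholds only the first cut point and the spacing are free, whereas the flexible version has one free parameter per cut point.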

When visually comparing the models, it seems like the flexible model better captures the data:

However, when I run loo, I cannot confirm this; if anything, I should prefer the simpler model:

        elpd_diff  se_diff
  wineE  0.0       0.0
  wineF -0.6       1.7

That is, of course, fine here, as the data set is rather small and a test should be able to support either side. However, when I test some data sets from my field, the flexible models are 2-5 se_diff better than the equidistant ones.
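
The 2 SE criterion applied to the wine comparison above can be written out explicitly. This is only a sketch: the numbers are copied from the table, `exceeds_k_se` is a hypothetical helper, and with fitted brms models the two values would presumably be read from the `elpd_diff` and `se_diff` columns that `loo_compare()` returns.

```r
# elpd of the flexible model minus elpd of the equidistant model,
# copied from the comparison table above (wineF relative to wineE).
elpd_diff <- -0.6
se_diff <- 1.7

# Hypothetical helper: TRUE only if the flexible model is better
# by more than k standard errors of the difference.
exceeds_k_se <- function(elpd_diff, se_diff, k = 2) {
  elpd_diff > k * se_diff
}

ratio <- elpd_diff / se_diff
flexible_clearly_better <- exceeds_k_se(elpd_diff, se_diff)

cat("difference in units of se_diff:", round(ratio, 2), "\n")
cat("flexible better by more than 2 SE:", flexible_clearly_better, "\n")
```

For the wine data the criterion is not met, which matches the reading of the table: the two models are indistinguishable on this data set.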

Is this test enough to credibly show that the data is not equidistant and that taking averages will probably misrepresent the data? If not, I would be very happy to hear what you would do differently.

Thanks for reading.


Sorry about not getting to this sooner. This isn’t my area but I can at least bump this up to the top so folks can see it.

Bumping this topic as I still think it is relevant.

This is enough to show that the flexible threshold model performs better than the equidistant model in terms of estimated predictive performance on held-out data, but no more and no less. The conceptual question to ask yourself is “what is the difference between non-equidistant data and equidistant data combined with a nonlinear response to temp?” If for your purposes there is no conceptual difference, then this is sufficient to make your point.

Food for thought: an organism’s weight generally scales as its height raised to some power greater than one. That means that it’s a very bad idea to average the weights if the goal is to understand something about the average height. But on the other hand there are plenty of applications where it might be useful to average the weight (for example if you want to estimate how many organisms you can load onto an elevator).
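
A small numerical sketch of the weight and height example, assuming a hypothetical exponent of 3 and three made-up heights (neither comes from real data):

```r
# Hypothetical heights and scaling exponent, for illustration only.
heights <- c(1.0, 1.5, 2.0)
p <- 3
weights <- heights^p

mean_height <- mean(heights)

# Averaging weights and transforming back overstates the average height.
height_from_mean_weight <- mean(weights)^(1 / p)

# For the elevator question the average weight is exactly what is needed:
# total load = number of organisms * mean weight.
total_load <- length(weights) * mean(weights)

cat("mean height:", mean_height, "\n")
cat("height implied by mean weight:", round(height_from_mean_weight, 3), "\n")
cat("total load:", total_load, "\n")
```

Whether the average is misleading therefore depends on which quantity the average is meant to inform.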
