Zero-one inflated beta as "censored" beta?

#1

This is an intellectual musing; not one I am currently in need of.

I finally went through @matti’s excellent tutorial on zero-one-inflated Beta models for VAS scales: https://vuorre.netlify.com/post/2019/02/18/analyze-analog-scale-ratings-with-zero-one-inflated-beta-models/

It left me wondering whether the rate of extreme ratings (0s and 1s) could be thought of as a “catch-all more extreme than this threshold” category, just like the extreme categories in ordinal regression. I find it likely that participants map the VAS response options to a narrow region of the Beta distribution, e.g., [0.2, 0.7], with everything more extreme than that “rounded in” to the nearest response option (here, [0, 0.2[ becomes 0).

That way, maybe the model could be parameterized as the Beta parameters plus thresholds, i.e.:

  • precision,
  • mean [0, 1],
  • lower threshold [0, upper],
  • upper threshold [lower, 1].
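To make the idea concrete, here is a quick simulation sketch of what such data would look like (Python; the function name, the mean/precision parameterization of the Beta, and the specific threshold values are all just illustrative assumptions on my part):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_rounded_beta(n, mean, precision, lower, upper, rng):
    """Latent responses follow a Beta with the given mean and precision
    (alpha = mean * precision, beta = (1 - mean) * precision); anything
    below `lower` is recorded as 0, anything above `upper` as 1."""
    latent = rng.beta(mean * precision, (1 - mean) * precision, size=n)
    observed = latent.copy()
    observed[latent < lower] = 0.0
    observed[latent > upper] = 1.0
    return observed

ratings = simulate_rounded_beta(10_000, mean=0.6, precision=5.0,
                                lower=0.2, upper=0.7, rng=rng)
print("P(0):", (ratings == 0).mean(), "P(1):", (ratings == 1).mean())
```

Crucially, the rates of 0s and 1s here fall directly out of the Beta parameters, rather than being governed by a separate inflation process.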

Since the Beta parameters are generative of ALL the data in this model, they should be estimated with greater precision, and perhaps even greater validity. As I understand it, the ZOIB “discards” the 0s and 1s to a different process, but I think that a whole bunch of 1s should actually be taken as evidence that the Beta mean is higher.
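One way to see the contrast with the ZOIB: under this idea, 0s and 1s would contribute Beta tail probabilities to the likelihood, so they still pull on the Beta mean. A rough sketch (hypothetical; this is not an existing brms family, and the function and parameter names are mine):

```python
from scipy import stats

def censored_beta_loglik(y, mean, precision, lower, upper):
    """Log-likelihood under the 'rounding-in' idea: interior responses
    use the Beta density; 0s and 1s contribute tail probabilities of
    the SAME Beta, so extreme responses still inform the mean."""
    a, b = mean * precision, (1 - mean) * precision
    ll = 0.0
    for yi in y:
        if yi == 0:
            ll += stats.beta.logcdf(lower, a, b)  # P(latent < lower)
        elif yi == 1:
            ll += stats.beta.logsf(upper, a, b)   # P(latent > upper)
        else:
            ll += stats.beta.logpdf(yi, a, b)
    return ll

# A bunch of 1s makes a high Beta mean much more likely than a low one:
data = [1, 1, 1, 1, 0.6]
print(censored_beta_loglik(data, 0.7, 5, 0.1, 0.9))
print(censored_beta_loglik(data, 0.3, 5, 0.1, 0.9))
```

In a ZOIB, by contrast, those four 1s would only move the inflation parameters, not the Beta mean.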

So I’m wondering (1) whether this is even possible to model, and (2) whether there is already a way to model it. If so, this would definitely be worth a tutorial, and perhaps a paper! It could be generalized to much more than the Beta.

1 Like

#2

I have had similar thoughts about the extremes on [0, 1] scales, i.e., that we should not model them as entirely separate processes, or should at least model them as being affected by the same mean parameter.

My workaround in a current project, where responses were on a visual analog scale between 0 and 100 (with a lot of 0s and 100s and multimodality), is to simply model them as ordinal with 20 categories, assuming that subjects can’t properly differentiate within a 5-point interval (out of a total of 100 points) anyway. This is surely not the most principled way of handling this (and will not apply to all kinds of beta-distributed responses), though.
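The binning described above can be sketched as follows (assuming integer responses on 0–100 and 20 equal-width categories; the example values are made up, and folding 100 into the top category is one arbitrary but common choice):

```python
import numpy as np

responses = np.array([0, 3, 17, 50, 96, 100])  # hypothetical VAS data

# 20 ordinal categories of width 5: category 1 = [0, 5), ...,
# category 20 = [95, 100]; the endpoint 100 is folded into category 20.
categories = np.minimum(responses // 5, 19) + 1
print(categories.tolist())  # [1, 1, 4, 11, 20, 20]
```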

Another option is to use the overall mean parameterization of zero-inflated and hurdle models (https://github.com/paul-buerkner/brms/issues/641) which should generalize to zero-one-inflated beta models as well.

Anyway, I would be very interested in hearing other ideas about this issue.

1 Like

#3

I’m glad someone is bringing this topic up. I’ve been working on simulations and analyses of large datasets of visual analog scale (VAS) ratings with the aim of evaluating different models of these types of data.

My conclusion so far is similar to what Paul is suggesting: while the ZOIB clearly “fits” the data better than a Gaussian model, in practice we may nevertheless usually opt for the Gaussian because it doesn’t separate the edge responses into a different process. (I am happy to share what I have so far if anyone is interested.) One way to think of this is in terms of statistical power (I know…): it is possible to construct situations where the ZOIB wins, but in many realistic scenarios a Gaussian model will have greater power simply because all responses count toward the parameters of interest. A normal model is also much easier to interpret, which matters in my opinion.

However, I’m not greatly in favor of Jonas’s suggestion of modeling the data with thresholds for the extreme categories. The data I’m familiar with don’t seem to favor that interpretation: for example, participants tend to be more careful when selecting values near the edges than near the middle of the scale (i.e., the distance between a .97 and a .98 rating is more meaningful than that between .57 and .58).

Paul’s approach seems more useful, but it also has the limitations he points out. For example, the ratings may have a resolution of 1,000 points on the scale, and that’s just too many intercepts. (And I would prefer not to cut things up post hoc to reduce the number of categories, although this is common practice and usually seems to work OK.)

Having said all that, the “marginal” model approach (https://github.com/paul-buerkner/brms/issues/641) looks very interesting, especially if it works for zero and one inflation.

1 Like

#4

Power will only be a sensible measure if the model does not exceed the nominal Type I error rate, and I am not sure how the normal model behaves in that respect when applied to data from a ZOIB process. I think the “marginal” approach indeed has some potential, and it is worth investigating all the alternatives in more detail once we have the marginal approach up and running in brms.

0 Likes

#5

Thanks for your thoughts! I really appreciate your work on this and see the points you’re making. I’ll await the upcoming work and revisit it later.

0 Likes

#6

Have you thought about using the beta-binomial model? This would involve rescaling the values to integers from 0 to M, where M is the maximum score (M then also determines the resolution at which you think the VAS can measure).

The advantage of the beta-binomial model is that extreme responses (0, M) do not lead to infinite likelihoods, as they do under the beta distribution. The disadvantage is that it is much slower to fit than a normal model. I did some simulations to figure out whether it matters if one uses a beta-binomial or a normal model, and it looked as if the beta-binomial model was only clearly better when the responses were very skewed.
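The endpoint behavior is easy to illustrate with scipy’s `betabinom` (a sketch; the mean/precision values are arbitrary, and the mean/precision-to-shape conversion is just one convenient parameterization):

```python
from scipy import stats

M = 100             # scale resolution after rescaling to integers 0..M
mu, phi = 0.9, 5.0  # hypothetical mean and precision of the latent beta
a, b = mu * phi, (1 - mu) * phi

# Unlike a beta likelihood, the beta-binomial assigns positive
# probability mass to the endpoints 0 and M, so extreme responses
# pose no problem.
p0 = stats.betabinom.pmf(0, M, a, b)
pM = stats.betabinom.pmf(M, M, a, b)
print(p0, pM)
```

With a latent beta skewed toward the top of the scale, a substantial share of the mass lands exactly on M, mirroring the pile-ups of maximal VAS ratings.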
Here are the simulations (the topic of this document is modeling sum-scores, which can also be seen as a result of a beta-binomial process): http://htmlpreview.github.io/?https://github.com/gbiele/sumscores/blob/master/AnalyseSumscores.html

0 Likes

#7

Hi Guido, thanks for the suggestion. I had not considered the beta-binomial before, but I think it can make sense and I will certainly take a look at it. (And thanks for sharing the simulations!)

0 Likes