Multi-modal distribution (bounded 0-100)

Dallak · June 7, 2022, 1:49pm

Dear all,
Thanks for your help in advance!
I am working on speech data: I record subjects saying a couple of sentences, then measure acoustic parameters such as speech duration, speech intensity, etc. In the following case I measured voicing percentage (bounded 0-100). This parameter is the percentage of frames in a segment that are voiced; a segment at 100% is completely voiced, and a segment at 0% is completely unvoiced.

I tried modelling this using default priors and gaussian family as in:

p_voiced_1 <- brm(
  percentvoiced_f ~ position*voicing*target_vowel+poa+
    (1| Filename) +
    (1| word),
  data = fric_,
  sample_prior = TRUE,
  family = gaussian(),
  cores = 8,
  iter = 1000,
  control = list(adapt_delta = 0.999, max_treedepth = 15),
  seed = 1432)

This produces the following plot.

[Figure: Rplot118]

I also tried other families such as zero_one_inflated_beta() and binomial(), but they still did not work.
Could you please instruct me on what other families or approaches I should try next?
Thanks!

StaffanBetner · June 8, 2022, 10:58am

It seems to me like you have fractions where the denominator is either 1, 2 or 4? If you have the denominator available, you could use a logistic regression with family set to binomial, or beta-binomial (not sure whether it is available in brms now?). Something like numerator | trials(denominator) ~ ....

The normal distribution is not really suitable for bounded (and discrete) data.
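The binomial suggestion above can be sketched roughly as follows. This is only an illustration under assumptions: the columns n_voiced and n_frames, the toy values, and the helper names are hypothetical and do not come from the original data set, and the formula is reduced to a single predictor to keep it short.

```r
# Hypothetical segment-level data: n_voiced voiced frames out of n_frames.
# Column names and values are made up for illustration.
toy <- data.frame(
  Filename = rep(c("spk1", "spk2"), each = 4),
  voicing  = rep(c("voiced", "unvoiced"), times = 4),
  n_voiced = c(4, 1, 2, 0, 3, 1, 4, 0),
  n_frames = c(4, 4, 2, 2, 4, 4, 4, 1)
)

# The percentage can be computed from the counts, but not the other way round.
toy$percentvoiced_f <- 100 * toy$n_voiced / toy$n_frames

# Wrapped in a function so that nothing is compiled or fitted until it is called.
fit_voicing_binomial <- function(data) {
  brms::brm(
    n_voiced | trials(n_frames) ~ voicing + (1 | Filename),
    data = data,
    family = binomial(),
    seed = 1432
  )
}

# Coefficients are on the logit scale; plogis() maps a linear predictor
# back to a probability, here expressed as a percentage.
logit_to_percent <- function(eta) 100 * plogis(eta)
```

Calling fit_voicing_binomial(toy) would fit the model. The full predictor structure from the original post (position*voicing*target_vowel+poa plus the word intercept) could be substituted into the formula, and estimates can be reported on the 0-100 scale through logit_to_percent().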

Dallak · June 9, 2022, 12:44am

Thanks for this @StaffanBetner!
The research question I am trying to answer can’t be answered by logistic regression. So… other solutions are appreciated.

Many thanks again!

StaffanBetner · June 9, 2022, 8:03am

Why not?

Dallak · June 9, 2022, 9:51pm

My response variable is not binary and I am not looking for a binary answer. It is continuous from 0 to 100, and I hypothesize that the voicing percentage percentvoiced_f varies with some of the predictors investigated. More specifically, the predictor voicing has two levels, voiced and unvoiced. For the voiced category, I predict a higher percentage of voicing: not 100%, but something above 80%. For the unvoiced category, I hypothesize a lower voicing percentage, around 20% or so. The same is true for the other predictors in the model; they are all categorical with different levels. In addition, this model is part of a series in which I just run regular linear regressions, which is common in my field, so I want to be consistent. I hope this makes it clear.
Thank you again for your time!

simonbrauer · June 21, 2022, 12:45am

It sounds like your raw data are binary at the frame level and binomial at the segment level. So you could either analyze the data at the frame level as Bernoulli distributed (with logistic regression) or at the segment level as binomial distributed (with a logit link). Of course, you lose this information when you transform the trials to a percentage. But you should be able to model the data as described by @StaffanBetner.
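A small sketch of the frame-level versus segment-level distinction, using invented frames (the data frame, its column names, and the values are assumptions for illustration only):

```r
# Hypothetical frame-level data: one row per frame, voiced = 1 or 0.
frames <- data.frame(
  segment = c("s1", "s1", "s1", "s1", "s2", "s2", "s3"),
  voiced  = c(1, 1, 1, 0, 0, 1, 1)
)

# Aggregate to the segment level: successes and number of trials.
n_voiced <- tapply(frames$voiced, frames$segment, sum)
n_frames <- tapply(frames$voiced, frames$segment, length)

segments <- data.frame(
  segment  = names(n_voiced),
  n_voiced = as.vector(n_voiced),
  n_frames = as.vector(n_frames)
)

# The percentage keeps the ratio but discards the number of frames, so a
# 1-frame segment and a 40-frame segment at 100% become indistinguishable.
segments$percent_voiced <- 100 * segments$n_voiced / segments$n_frames
```

Segments with very few frames can only take a handful of percentage values (with 4 frames: 0, 25, 50, 75, 100), which is one way a bounded percentage ends up looking multi-modal.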

By “regular linear regressions” I assume you mean a gaussian distribution with an identity link? If that is the case, I’m not sure how you could achieve a more compatible posterior predictive check.
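To illustrate why a Gaussian model has trouble reproducing this kind of outcome, here is a rough base R simulation. The data are simulated, not the data from this thread, and the check uses a plain lm() fit with plug-in estimates rather than a brms posterior predictive check.

```r
set.seed(1432)

# Simulated bounded outcome with mass piled up at both ends (values invented).
y <- c(rep(0, 40), rep(100, 40), runif(20, min = 0, max = 100))

# Intercept-only Gaussian fit.
fit   <- lm(y ~ 1)
mu    <- unname(coef(fit)[1])
sigma <- summary(fit)$sigma

# Draw replicated outcomes from the fitted normal distribution.
draws <- rnorm(10000, mean = mu, sd = sigma)

# Share of replicated values that fall outside the possible range.
frac_outside <- mean(draws < 0 | draws > 100)
```

A substantial share of the replicated values lands below 0 or above 100, and the single normal bump sits in the middle where the observed data are sparse, so the predictive distribution cannot match the two peaks at the bounds.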

Dallak · June 24, 2022, 11:20am

Many thanks all for this. I will give it a go and see how it goes.

saudiwin · June 25, 2022, 8:42pm

You can try my package ordbetareg, available on CRAN.
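A minimal sketch of what that might look like, assuming the outcome is first rescaled to the 0-1 interval. The toy data, the prop_voiced column, and the wrapper function are hypothetical, and the formula is reduced to one predictor; check the package documentation for the current arguments before relying on this.

```r
# Hypothetical data on the original 0-100 scale (values invented).
toy <- data.frame(
  Filename        = rep(c("spk1", "spk2"), each = 4),
  voicing         = rep(c("voiced", "unvoiced"), times = 4),
  percentvoiced_f = c(100, 25, 100, 0, 75, 25, 100, 0)
)

# Rescale to a proportion; exact 0s and 1s are kept as they are.
toy$prop_voiced <- toy$percentvoiced_f / 100

# Wrapped in a function so that nothing is compiled or fitted until it is called.
fit_voicing_ordbeta <- function(data) {
  ordbetareg::ordbetareg(
    formula = prop_voiced ~ voicing + (1 | Filename),
    data = data
  )
}
```

Calling fit_voicing_ordbeta(toy) would fit the model. Unlike a plain beta regression, this approach is meant to handle observations exactly at the bounds alongside the continuous values in between.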
