Help choosing the appropriate family

Dallak · May 24, 2022, 11:25pm

Dear all,

I’m working on speech data where I measure some acoustic parameters in milliseconds, decibels, etc.

In the following model I am trying to model the harmonic-to-noise ratio of some speech categories using the default priors.

The data have a bimodal shape and I’m accounting for that in the model. The response variable ranges from -6 to 57 dB.

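# Gaussian model of the response column m1 with varying effects by Filename and by word
m1 <- brm(m1 ~ position * voicing * target_vowel + poa +
            (position * voicing * target_vowel + poa | Filename) +
            (position | word),
          data = fric_,
          family = gaussian(),
          cores = 8,  # number of cores used to run chains in parallel
          control = list(adapt_delta = 0.999, max_treedepth = 15),
          seed = 1432)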

Rplot05

As the plot shows, the model does not capture the shape of the distribution. It also returns the following warning messages:


Warning message:
  In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
Warning messages:
  1: In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug
2: In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

I followed the suggestion in the warning message and specified chains = 1. The model returns the following warnings (and plot).


Warning messages:
  1: The largest R-hat is 1.18, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

Rplot06

Next, I increased the number of chains from 1 to 2, which produced the following warnings:


Warning message:
  In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
Warning messages:
  1: In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
2: The largest R-hat is 1.07, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

Could anybody help with this? I’m new to Bayesian analysis and not sure what is going on.
Thank you in advance!

scholz · May 30, 2022, 1:25pm

I think the first step might be to start with a simpler model and then build up from there. My guess is that the number of interactions and varying slopes may be more than your data can identify.

Try to find a model that converges and then add complexity to it iteratively. Then you can see which part of the model breaks convergence and how the estimates change depending on what you add.
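
One way to follow this advice is sketched below, assuming the same data frame `fric_` and response column `m1` as in the original call. The formula names (`f_base`, `f_slopes`, `f_full`) and the helper `fit_step()` are hypothetical, and the order in which terms are added is only one reasonable choice.

```r
# Formulas of increasing complexity, from varying intercepts only
# up to the full model from the original post.
f_base   <- m1 ~ position + voicing + target_vowel + poa +
  (1 | Filename) + (1 | word)
f_slopes <- m1 ~ position + voicing + target_vowel + poa +
  (position | Filename) + (position | word)
f_full   <- m1 ~ position * voicing * target_vowel + poa +
  (position * voicing * target_vowel + poa | Filename) + (position | word)

# Hypothetical helper: fit one step with otherwise identical settings.
fit_step <- function(formula, data) {
  brms::brm(formula, data = data, family = gaussian(),
            chains = 4, cores = 4, seed = 1432)
}

# Fit each step in turn and inspect the warnings before moving on, e.g.:
# fit_base   <- fit_step(f_base, fric_)
# fit_slopes <- fit_step(f_slopes, fric_)
```

If `fit_base` samples cleanly but `fit_slopes` does not, the varying slopes are the likely source of trouble, which narrows down where to simplify or add priors.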

Another thought is to work on the exponentiated (linear) scale. If my memory serves me well, dB values are on a log scale. Transforming the outcome would allow the use of continuous positive families that might be a better canonical fit than a normal.
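
As a small illustration of that transformation, assuming the HNR values follow the usual power-ratio convention, dB = 10 * log10(ratio). The vector `hnr_db` is made-up data and `db_to_ratio()` is a hypothetical helper.

```r
# Convert dB (10 * log10 of a power ratio) back to the linear ratio.
db_to_ratio <- function(db) 10^(db / 10)

hnr_db <- c(-6, 0, 10, 20, 57)   # made-up values spanning the reported range
hnr_ratio <- db_to_ratio(hnr_db)

# Every ratio is strictly positive, even for negative dB values.
all(hnr_ratio > 0)
```

Note that a lognormal model of the ratio is a normal model of its logarithm, which is the dB value up to a constant factor, so under this transformation it is the gamma or Weibull families that would actually change the assumed shape.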

Dallak · June 3, 2022, 1:08pm

Thank you for this @scholz!
Could you please elaborate on what you mean by “… allow the use of continuous positive families that might be a better canonical fit than a normal”? Can you give some examples of continuous positive families? I want to give it a try.

franzsf · June 3, 2022, 1:34pm

Lognormal, gamma, and Weibull might be some examples. Can you transform your data (potentially using a different reference level?) so that all your dB measurements are positive, i.e. strictly greater than zero?
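
A sketch of the reference-level idea: changing the dB reference adds a constant to every value, so a shift slightly larger than the most negative observation makes all values positive. The vector `hnr_db`, the helper `shift_db()`, and the 1 dB margin are illustrative assumptions, not values from this thread.

```r
# Shift dB values so that the minimum sits `margin` dB above zero.
shift_db <- function(db, margin = 1) {
  offset <- margin - min(db, na.rm = TRUE)
  structure(db + offset, offset = offset)  # keep the offset for back-transforming
}

hnr_db <- c(-6, 3.2, 18.5, 41, 57)  # made-up values in the reported range
hnr_pos <- shift_db(hnr_db)

min(hnr_pos)             # 1
attr(hnr_pos, "offset")  # 7
```

The shifted values could then serve as the response with `family = lognormal()` or `family = weibull()`. Because the choice of offset is arbitrary, it is worth checking that conclusions do not hinge on it.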

Dallak · June 3, 2022, 1:40pm

Thanks @franzsf!
I’ll give it a try following your suggestion.

Dallak · June 4, 2022, 3:05pm

Dear all, thanks for your help and patience!

Here is the output of lognormal().

Rplot07

and here is the output of weibull()

Rplot08

What do you think of these? Is the fit good enough to carry on with further analysis, or do you have other suggestions?

I also tried the gamma family, but it threw some errors:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: gamma_lpdf: Inverse scale parameter[3] is -1.51435, but must be > 0!  (in 'model1f31e600ff2_604c73f91ec3daeb524665b471d1fbf5' at line 108)
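
One possible cause of this error, offered as a guess since the exact call is not shown: if the family was given as `Gamma()`, R's default link for that family is the inverse link, under which the linear predictor can imply a negative mean and hence a negative rate ("inverse scale") parameter. A log link keeps the mean positive. The commented call below reuses names from the original post together with a hypothetical strictly positive response column `hnr_pos`.

```r
# stats::Gamma() defaults to the inverse link; a log link keeps the mean > 0.
fam_default <- Gamma()
fam_log     <- Gamma(link = "log")

fam_default$link  # "inverse"
fam_log$link      # "log"

# Hypothetical refit (requires brms and a strictly positive response column):
# m_gamma <- brms::brm(hnr_pos ~ position * voicing * target_vowel + poa +
#                        (position | word),
#                      data = fric_, family = Gamma(link = "log"),
#                      cores = 8, seed = 1432)
```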

Thank you again!

franzsf · June 4, 2022, 5:46pm

You may also find the “stat” and “intervals” pp_checks helpful, along with model comparisons like loo.

Edit: you can also use ggplot modifiers with the pp_check, like “+ scale_x_log10” on the density plot.
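
These checks could look roughly like the following, assuming two fitted brms models. The function name `check_fits()` and its arguments `fit_a` and `fit_b` are hypothetical.

```r
# Posterior predictive checks and a LOO comparison for two fitted brms models.
check_fits <- function(fit_a, fit_b) {
  list(
    # Does the model reproduce a summary statistic of the observed data?
    stat = brms::pp_check(fit_a, type = "stat", stat = "median"),
    # Observed points against posterior predictive intervals.
    intervals = brms::pp_check(fit_a, type = "intervals"),
    # Density overlay with a log10 x-axis (needs a positive response).
    dens_log = brms::pp_check(fit_a, type = "dens_overlay") +
      ggplot2::scale_x_log10(),
    # Approximate leave-one-out cross-validation comparison.
    loo = brms::loo(fit_a, fit_b)
  )
}
```

A call such as `check_fits(m_lognormal, m_weibull)` (model names hypothetical) returns the plots and the comparison in one list. The `loo` comparison is only meaningful when both models were fitted to the same response values on the same scale.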

Dallak · June 7, 2022, 12:09am

Thank you all for your help!
