Help choosing the appropriate family

Dear all,

I’m working on speech data where I measure acoustic parameters in milliseconds, decibels, etc.

In the following model I am trying to model harmonic-to-noise ratio of some speech categories using the default priors.

The data have a bimodal shape and I’m accounting for that in the model. The response variable ranges from -6 to 57 dB.

m1 <- brm(m1 ~ position*voicing*target_vowel + poa +
            (position*voicing*target_vowel + poa | Filename) +
            (position | word),
          data = fric_,
          family = gaussian(),
          cores = 8,
          control = list(adapt_delta = 0.999, max_treedepth = 15),
          seed = 1432)

Rplot05

As the plot shows, the model does not capture the shape of the distribution. It also returns the following warning messages:


Warning message:
  In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
Warning messages:
  1: In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug
2: In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess 

I followed the suggestion in the warning message and specified chains = 1. The model returns the following warnings (and plot).


Warning messages:
  1: The largest R-hat is 1.18, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess 


Rplot06

Next, I increased the number of chains from 1 to 2, which produced the following warnings:


Warning message:
  In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
Warning messages:
  1: In validityMethod(object) :
  The following variables have undefined values:  cor_1[1],The following variables have undefined values:  cor_1[2],The following variables have undefined values:  cor_1[3],The following variables have undefined values:  cor_1[4],The following variables have undefined values:  cor_1[5],The following variables have undefined values:  cor_1[6],The following variables have undefined values:  cor_1[7],The following variables have undefined values:  cor_1[8],The following variables have undefined values:  cor_1[9],The following variables have undefined values:  cor_1[10],The following variables have undefined values:  cor_1[11],The following variables have undefined values:  cor_1[12],The following variables have undefined values:  cor_1[13],The following variables have undefined values:  cor_1[14],The following variables have undefined values:  cor_1[15],The following variables have undefined values:  cor_1[16],The following variables have undefined values:  cor_1[17],The following variables [... truncated]
2: The largest R-hat is 1.07, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess 



Could anybody help with this? I’m new to Bayesian analysis and not sure what is going on.
Thank you in advance!

I think the first step might be to start with a simpler model and then build up from there. My guess would be that the number of interactions and varying slopes you have might be more than your data can identify.

Try to find a model that converges and then add complexity to it iteratively. Then you can see which part of the model breaks convergence and how the estimates change depending on what you add.
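
As a rough sketch of that build-up, the sequence could look like the following. The variable names come from the original call; the intermediate steps are my own assumption about a sensible order, not something tested on these data.

```r
# Candidate formulas, ordered from simplest to the original full model.
# The intermediate steps are illustrative assumptions, not a prescribed path.
formulas <- list(
  intercepts  = m1 ~ position + voicing + target_vowel + poa +
    (1 | Filename) + (1 | word),
  interaction = m1 ~ position * voicing * target_vowel + poa +
    (1 | Filename) + (1 | word),
  slopes      = m1 ~ position * voicing * target_vowel + poa +
    (position | Filename) + (position | word),
  full        = m1 ~ position * voicing * target_vowel + poa +
    (position * voicing * target_vowel + poa | Filename) + (position | word)
)

# Fit them one at a time and stop at the first step that breaks convergence.
# This part needs brms and the data frame fric_ from the original post:
# fits <- lapply(formulas, function(f) {
#   brms::brm(f, data = fric_, cores = 4, seed = 1432)
# })
```

Checking the warnings after each step should show which added term the data can no longer support.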

Another thought is to work on the exp scale. If my memory serves me well, dB values are on a log scale. Transforming the outcome would allow the use of continuous positive families that might be a better canonical fit than a normal.
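
A minimal sketch of that transform, assuming the harmonic-to-noise ratio is reported as 10 * log10 of a power ratio (the usual dB convention, but worth checking against the software that produced the measurements). The function name `db_to_ratio` and the example values are made up for illustration.

```r
# Assumption: HNR in dB = 10 * log10(harmonic power / noise power),
# so the inverse transform recovers the strictly positive power ratio.
db_to_ratio <- function(db) 10^(db / 10)

hnr_db    <- c(-6, 0, 10, 57)   # made-up values spanning the reported range
hnr_ratio <- db_to_ratio(hnr_db)
```

Every back-transformed value is positive, including those from negative dB. One caveat: a lognormal model of the ratio is equivalent to a Gaussian model of the dB values, so the transform mainly changes things for families such as gamma or Weibull.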

Thank you for this @scholz!
Could you please elaborate on what you mean by “… allow the use of continuous positive families that might be a better canonical fit than a normal”? Can you give some examples of continuous positive families? I want to give it a try.

Lognormal, gamma, and Weibull might be some examples. Can you transform your data (potentially using a different reference level?) so that all your dB measurements are strictly positive, i.e. not including zero?
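
One way to do that, sketched under the assumption that a constant offset is acceptable for the analysis (adding a constant in dB amounts to changing the reference level). The helper `shift_db_positive` and the example values are hypothetical.

```r
# Shift dB values by a constant so the minimum lands at a chosen positive floor.
shift_db_positive <- function(db, floor_db = 1) {
  stopifnot(floor_db > 0)
  db - min(db, na.rm = TRUE) + floor_db
}

hnr_db      <- c(-6, 0, 12.5, 57)   # made-up values spanning the reported range
hnr_shifted <- shift_db_positive(hnr_db)
```

Because the offset here depends on the sample minimum, a fixed offset chosen in advance may be easier to report and to reuse on new data.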

Thanks @franzsf!
I’ll give it a try, following your suggestion.

Dear all, thanks for your help and patience!

Here is the output of lognormal().

Rplot07

and here is the output of weibull()

Rplot08

What do you think of these fits? Are they good enough to carry on with further analysis, or do you have other suggestions?

I also tried the gamma family, but it threw some errors:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: gamma_lpdf: Inverse scale parameter[3] is -1.51435, but must be > 0!  (in 'model1f31e600ff2_604c73f91ec3daeb524665b471d1fbf5' at line 108)

Thank you again!
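
A guess at the cause, since the gamma call itself is not shown: if the model was specified with `family = Gamma()`, base R's default for that family is the inverse link, and as far as I know brms keeps the link of the family object it is given. Under an inverse link the linear predictor can imply a negative mean, which would match the "Inverse scale parameter ... must be > 0" message. The sketch below only inspects the links; the refit in the comments uses a simplified formula for illustration.

```r
# Base R's Gamma() family defaults to the inverse link;
# a log link keeps the implied mean positive.
default_link <- stats::Gamma()$link              # "inverse"
log_link     <- stats::Gamma(link = "log")$link  # "log"

# Hypothetical refit with a log link (needs brms and the data from the post):
# m_gamma <- brms::brm(m1 ~ position * voicing * target_vowel + poa +
#                        (position | Filename) + (position | word),
#                      data = fric_, family = Gamma(link = "log"),
#                      cores = 4, seed = 1432)
```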

You may also find the “stat” and “intervals” pp_checks helpful, along with model comparisons like loo.

Edit: you can also use ggplot modifiers with the pp_check, like “+ scale_x_log10” on the density plot.
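
A sketch of how those checks might be wrapped up, assuming brms and ggplot2 are installed and that the brms version is recent enough to accept `ndraws` (older versions used `nsamples`). The function names `run_pp_checks` and `compare_fits` are hypothetical helpers, not part of brms.

```r
# `fit` is any fitted brmsfit object, e.g. the lognormal or Weibull model.
run_pp_checks <- function(fit, ndraws = 100) {
  list(
    density     = brms::pp_check(fit, type = "dens_overlay", ndraws = ndraws),
    # Same density overlay on a log10 x-axis (positive responses only).
    density_log = brms::pp_check(fit, type = "dens_overlay", ndraws = ndraws) +
      ggplot2::scale_x_log10(),
    stat_mean   = brms::pp_check(fit, type = "stat", stat = "mean"),
    intervals   = brms::pp_check(fit, type = "intervals", ndraws = ndraws)
  )
}

# Compare two fitted models by approximate leave-one-out cross-validation.
compare_fits <- function(fit_a, fit_b) {
  brms::loo_compare(brms::loo(fit_a), brms::loo(fit_b))
}
```

The loo comparison is only meaningful between models fitted to the same response on the same scale, for example the lognormal and Weibull fits on the same shifted data.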

Thank you all for your help!