Dear all,
I just want to make sure that I fully understand my model. I ran the following model to test how the two levels voicing
capture differences in the response variable. The response variable is standardised and I’m using contr.sum
coding. I’m using the package emmeans
for this to contrast this.
m1 <- brm(intensity_offset.y_z ~ position*voicing*target_vowel+poa+rep+
(1+position*voicing*target_vowel+poa|File.name)+
(1+position|word),
data= dat
...)
Here is the model summary
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.26 0.23 -0.72 0.20 1.00 5098 9537
voicing1 0.28 0.07 0.14 0.42 1.00 21990 19843
target_vowel1 0.04 0.09 -0.14 0.22 1.00 23031 21668
target_vowel2 -0.17 0.09 -0.35 0.01 1.00 21877 19885
poa1 0.13 0.09 -0.05 0.30 1.00 23706 20775
poa2 -0.09 0.10 -0.30 0.12 1.00 23112 20642
rep -0.03 0.03 -0.08 0.02 1.00 59711 25886
voicing1:target_vowel1 0.14 0.08 -0.03 0.31 1.00 23040 20713
voicing1:target_vowel2 -0.01 0.08 -0.18 0.16 1.00 24001 21381
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.43 0.01 0.40 0.46 1.00 25981 25172
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
I used the following code to extract the estimate for each level of voicing using emmeans
.
r3 <- emmeans(m1, "voicing")
plot(rope(r3))
This code produces the following estimates and plot.
voicing emmean lower.HPD upper.HPD
voiced 0.0212 -0.431 0.4853
voiceless -0.5393 -1.035 -0.0397
Point estimate displayed: median
HPD interval probability: 0.95
Q1- What inference can I make from these estimates and plot?
The possible answer that I have and want to check whether it is valid is:
a- the estimates and plot show that voiceless category exhibited a decreased effect compared to the voiced category.
b- The difference between the estimates of both category is not reliable due to the substantial overlap of the 95% quantile credible (or HDI) for both distributions. This means differences in the response variable x is not robust captured by the voicing predictor. (Is this a valid conclusion based on the estimate and plot?)
However, when I used the following code to plot the difference/contrast only between both voicing categories using emmeans
:
r <- emmeans(m1, ~ voicing)
pairs(r) -> r1
plot(rope(r1))
this code gives me the following estimate and plot.
contrast estimate lower.HPD upper.HPD
voiced - voiceless 0.563 0.287 0.836
Point estimate displayed: median
HPD interval probability: 0.95
As is seen, the difference between both voicing categories seems far from the rope and the 95% HDI of the posterior distribution falls outside the rope.
Q- Doesn’t this plot contradict the second conclusion I drew from the first plot and estimate (i.e., The difference between the estimates of both category is not reliable due to the substantial overlap of the 95% quantile credible (or HDI) for both distributions)?
I’m not sure which estimate and plot makes the correct inference with regard to my goal which looks at whether differences in the response variable are robustly captured by the two levels of the voicing
predictor. Am I missing something? Does each estimate and plot provide a different aspect of the model?
Your input is greatly appreciated!