Understanding Dirichlet Regression output for compositional Data

LonelyHunter2 · February 13, 2024, 1:47am

Hello everyone!
I’m fairly new to Stan (I discovered R a few months ago).
I’ve collected data that includes a compositional variable: one of my variables is divided into 4 sublevels that sum to 1 (or 100%) [S1 + S2 + S3 + S4 = 1]. They are compositional because if one goes down, at least one other must go up.

To analyze this compositional data, I’ve been using a Dirichlet regression as suggested by Douma and Weedon (2019) [https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13234].

My previous models were simple to understand. Today, however, I’m dealing with interactions and I’m lost.

In short, I’m doing what could be seen as a 2x2x3 design:
My first factor is Feet - FA, FT (within-subject)
My second factor is FB - NoFB, GoodFB, ErroneousFB (within-subject)
My third factor is Realized - oui, non (between-subject)

My dependent variable, with 4 levels, is Frequency Band (Med, Low, VL, UL).

My main question is whether there is a shift in frequency under different feedback conditions. Based on the other variables collected during the same project, I expect to see an FB:Realized interaction.

After doing my Dirichlet Regression, I don’t understand how to interpret the results of the interaction.

Here is my model (obtained with brms).


fit <- brm(cbind(Med, Low, VL, UL) ~ Feet * Realized * FB + (1 | Participant),
           data = WaveletML, family = dirichlet())

 Family: dirichlet 
  Links: muLow = logit; muVL = logit; muUL = logit; phi = identity 
Formula: cbind(Med, Low, VL, UL) ~ Feet * Realized * FB + (1 | Participant) 
   Data: WaveletML (Number of observations: 747) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~Participant (Number of levels: 25) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(muLow_Intercept)     0.06      0.04     0.00     0.14 1.00     1061     1508
sd(muVL_Intercept)      0.22      0.05     0.14     0.32 1.00     1599     2537
sd(muUL_Intercept)      0.34      0.06     0.25     0.48 1.01     1230     2216

Population-Level Effects: 
                                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muLow_Intercept                       0.87      0.10     0.68     1.06 1.00     1092     2069
muVL_Intercept                        0.51      0.11     0.28     0.73 1.00     1226     2181
muUL_Intercept                        1.07      0.12     0.83     1.33 1.01      830     1526
muLow_FeetFT                          0.27      0.14    -0.00     0.54 1.00      927     1731
muLow_Realizedoui                    -0.03      0.17    -0.35     0.31 1.00      894     1872
muLow_FBGoodFB                        0.09      0.14    -0.17     0.37 1.00      996     1814
muLow_FBNoFB                         -0.06      0.13    -0.32     0.21 1.00     1169     2071
muLow_FeetFT:Realizedoui             -0.10      0.23    -0.56     0.35 1.00      851     1693
muLow_FeetFT:FBGoodFB                -0.08      0.20    -0.46     0.30 1.00      889     1579
muLow_FeetFT:FBNoFB                  -0.08      0.19    -0.47     0.29 1.00     1029     1855
muLow_Realizedoui:FBGoodFB           -0.34      0.22    -0.77     0.09 1.00      856     1635
muLow_Realizedoui:FBNoFB             -0.15      0.22    -0.59     0.27 1.00      989     2200
muLow_FeetFT:Realizedoui:FBGoodFB     0.41      0.32    -0.22     1.02 1.00      858     1521
muLow_FeetFT:Realizedoui:FBNoFB       0.37      0.32    -0.26     1.01 1.00      978     1863
muVL_FeetFT                           0.71      0.14     0.43     0.98 1.00     1002     1947
muVL_Realizedoui                      0.00      0.19    -0.35     0.38 1.00     1005     2002
muVL_FBGoodFB                         0.12      0.14    -0.16     0.40 1.00     1086     1813
muVL_FBNoFB                          -0.16      0.14    -0.44     0.11 1.00     1210     2370
muVL_FeetFT:Realizedoui              -0.30      0.24    -0.78     0.17 1.00      892     1802
muVL_FeetFT:FBGoodFB                 -0.23      0.20    -0.61     0.17 1.00     1001     2020
muVL_FeetFT:FBNoFB                    0.06      0.20    -0.32     0.45 1.00     1173     1969
muVL_Realizedoui:FBGoodFB            -0.86      0.24    -1.34    -0.41 1.00     1012     1730
muVL_Realizedoui:FBNoFB              -0.30      0.23    -0.75     0.15 1.00     1073     1847
muVL_FeetFT:Realizedoui:FBGoodFB      0.68      0.33     0.02     1.30 1.00      969     1712
muVL_FeetFT:Realizedoui:FBNoFB        0.62      0.33    -0.03     1.26 1.00     1021     1942
muUL_FeetFT                           0.13      0.14    -0.14     0.40 1.00      860     1541
muUL_Realizedoui                      0.18      0.21    -0.21     0.61 1.00      762     1648
muUL_FBGoodFB                        -0.00      0.13    -0.25     0.25 1.00      949     1812
muUL_FBNoFB                          -0.12      0.13    -0.37     0.14 1.00     1105     2358
muUL_FeetFT:Realizedoui              -0.04      0.22    -0.48     0.39 1.00      841     1652
muUL_FeetFT:FBGoodFB                 -0.11      0.19    -0.49     0.27 1.00      913     1704
muUL_FeetFT:FBNoFB                    0.30      0.19    -0.08     0.67 1.00     1006     1986
muUL_Realizedoui:FBGoodFB            -1.07      0.22    -1.49    -0.64 1.00      909     1651
muUL_Realizedoui:FBNoFB              -0.51      0.21    -0.93    -0.12 1.00      951     2264
muUL_FeetFT:Realizedoui:FBGoodFB      0.59      0.32    -0.04     1.20 1.00      915     1720
muUL_FeetFT:Realizedoui:FBNoFB        0.51      0.31    -0.09     1.12 1.00     1025     2002

Family Specific Parameters: 
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi    11.34      0.33    10.71    11.99 1.00     5405     3008

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

From the results, I see significant intercepts and a few significant slopes, as suggested by 95% CrIs that do not include zero.

                                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muLow_Intercept                       0.87      0.10     0.68     1.06 1.00     1092     2069
muVL_Intercept                        0.51      0.11     0.28     0.73 1.00     1226     2181
muUL_Intercept                        1.07      0.12     0.83     1.33 1.01      830     1526
muVL_FeetFT                           0.71      0.14     0.43     0.98 1.00     1002     1947
muVL_Realizedoui:FBGoodFB            -0.86      0.24    -1.34    -0.41 1.00     1012     1730
muVL_FeetFT:Realizedoui:FBGoodFB      0.68      0.33     0.02     1.30 1.00      969     1712
muUL_Realizedoui:FBGoodFB            -1.07      0.22    -1.49    -0.64 1.00      909     1651
muUL_Realizedoui:FBNoFB              -0.51      0.21    -0.93    -0.12 1.00      951     2264

From my understanding of this output, each slope/intercept is compared to a reference value. In this case, the reference is the ErroneousFB-FA-non condition and the Med band. Thus, when I have a significant slope like:

muUL_Realizedoui:FBGoodFB            -1.07      0.22    -1.49    -0.64 1.00      909     1651

it means that the slope for that specific condition is different from the slope for the reference.
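
To make that reading concrete, here is a minimal base-R sketch that turns the posterior means from the summary above into expected band proportions for two cells. It assumes the usual brms Dirichlet setup, where Med is the reference band with its linear predictor fixed at 0 and the other bands are mapped through a softmax. The `softmax` helper and variable names are only illustrative. Plugging in posterior means and ignoring the participant effects gives a rough approximation, not the posterior of the proportions.

```r
# Posterior means copied from the summary table (population-level effects only).
# Med is the reference band, so its linear predictor is fixed at 0.
softmax <- function(eta) exp(eta) / sum(exp(eta))

# Reference cell: Feet = FA, Realized = non, FB = ErroneousFB
eta_ref <- c(Med = 0, Low = 0.87, VL = 0.51, UL = 1.07)

# Cell Feet = FA, Realized = oui, FB = GoodFB:
# intercept + Realizedoui + FBGoodFB + Realizedoui:FBGoodFB
eta_oui_good <- c(
  Med = 0,
  Low = 0.87 + (-0.03) + 0.09 + (-0.34),
  VL  = 0.51 +   0.00  + 0.12 + (-0.86),
  UL  = 1.07 +   0.18  + 0.00 + (-1.07)
)

p_ref <- softmax(eta_ref)
p_oui_good <- softmax(eta_oui_good)

# Approximate expected proportions per band in each cell
print(round(rbind(ref = p_ref, oui_good = p_oui_good), 2))
```

Read this way, the -1.07 coefficient is a change in log(UL / Med) for that cell, so its effect on the UL share itself also depends on what happens to the other bands.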

However, although this is interesting, I feel that some comparisons are missing, like those from a post-hoc or simple-effects analysis through emmeans.

Here is a graphical representation of the mean contributions and their 95% CI

We see through the figure that there is an interaction between FB:Realized. However, the Dirichlet regression only looks at specific predictors. For instance, I don’t know if there is an effect of GoodFB compared to NoFB (as it is always compared to ErroneousFB - the reference).

So what am I missing? Do I need to do an additional step/analysis? How can I interpret my results (and provide evidence of significant differences)? Is there a kind of post-hoc for the Dirichlet regression?
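
One possible way to get the missing comparisons (for example GoodFB vs NoFB) is to compute them directly from posterior draws of the expected proportions instead of reading single coefficients. The sketch below assumes `fit` is the brmsfit object from the model above, that the factor levels are spelled as in the summary, and that `posterior_epred()` returns a draws x rows x bands array for this family. The function name `contrast_fb` and its arguments are hypothetical.

```r
# Sketch only: `fit` is assumed to be the fitted brms Dirichlet model.
contrast_fb <- function(fit, feet = "FA", realized = "oui",
                        fb_a = "GoodFB", fb_b = "NoFB") {
  # One row per feedback condition, other factors held fixed
  newdata <- data.frame(
    Feet = feet,
    Realized = realized,
    FB = c(fb_a, fb_b)
  )
  # Expected proportions per draw; re_formula = NA drops participant effects
  ep <- brms::posterior_epred(fit, newdata = newdata, re_formula = NA)
  # Difference fb_a - fb_b for every draw and band (draws x bands)
  diff_draws <- ep[, 1, ] - ep[, 2, ]
  # Posterior mean and 95% interval of the difference, one row per band
  t(apply(diff_draws, 2, function(d) {
    c(mean = mean(d), quantile(d, c(0.025, 0.975)))
  }))
}
```

Calling `contrast_fb(fit, realized = "oui")` and `contrast_fb(fit, realized = "non")` would give the GoodFB vs NoFB difference within each group. Subtracting the two sets of draws inside the function instead would give a difference of differences, which is one way to express the FB:Realized interaction on the proportion scale.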

Any help would be much appreciated!
Thanks!

Bob_Carpenter · February 21, 2024, 10:06pm

Hi, @LonelyHunter2 and sorry this hasn’t gotten answered sooner. I think the forums are overwhelmed with brms questions, whereas only a handful of the Stan devs (not including me) can answer brms questions.

I don’t know what brms does by default, but Dirichlet regression is basically the same as other multivariate GLMs (like multilogit) in terms of how to interpret coefficients. The only real difference is that the output is compact in the sense of living in the bounded subset of simplexes and that the simplex structure induces correlations on answers.
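
For reference, the multilogit-style structure can be written out as follows. This is a sketch of the parameterization that the summary above appears to use (Med as the reference band, a single precision parameter phi), not something confirmed against the brms source.

```latex
\begin{aligned}
(y_{i,\mathrm{Med}}, y_{i,\mathrm{Low}}, y_{i,\mathrm{VL}}, y_{i,\mathrm{UL}})
  &\sim \mathrm{Dirichlet}\bigl(\phi\,\mu_{i,\mathrm{Med}}, \ldots, \phi\,\mu_{i,\mathrm{UL}}\bigr) \\
\mu_{i,k} &= \frac{\exp(\eta_{i,k})}{\sum_{j} \exp(\eta_{i,j})},
  \qquad \eta_{i,\mathrm{Med}} = 0 \\
\log \frac{\mu_{i,k}}{\mu_{i,\mathrm{Med}}} &= \eta_{i,k}
\end{aligned}
```

Under this form, each coefficient in the `muLow`, `muVL`, and `muUL` blocks is a change in the log-ratio of that band's expected share to the Med share, just as in a multinomial logit.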

Is brms encouraging people to look at these things like frequentists? What you’ll find in the world of Bayes is that covariates and their corresponding coefficients can be useful predictively even if not “significant”. The nice part about Bayesian analyses is that they integrate over the posterior uncertainty, so that uncertainly estimated quantities tend to wash out.

I have no idea what all the significance indications mean. This is going to be some kind of multiple testing problem and then if you want to be a frequentist, you’ll have to contend with the post-selection inference problem (the fact that you’re no longer fitting the same model once you use the data to select “significant” coefficients).

Setting aside all the significance calculations, you can fit the model with and without the interaction term and see what it looks like predictively using cross-validation or in-sample using posterior predictive checks. The problem is that this becomes a combinatorial mess in the face of lots of predictors.
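
A rough sketch of that comparison in brms might look like the following. It assumes the data frame has the columns used in the original model and that the only change between the two models is dropping every term involving FB:Realized (which also removes the three-way interaction). The function name `compare_fb_realized` is hypothetical.

```r
# Sketch only: fits the model with and without the FB:Realized terms and
# compares them with approximate leave-one-out cross-validation.
compare_fb_realized <- function(data) {
  f_full <- brms::bf(
    cbind(Med, Low, VL, UL) ~ Feet * Realized * FB + (1 | Participant)
  )
  # Same model without FB:Realized and without Feet:Realized:FB
  f_reduced <- brms::bf(
    cbind(Med, Low, VL, UL) ~ Feet * Realized + Feet * FB + (1 | Participant)
  )
  fit_full <- brms::brm(f_full, data = data, family = brms::dirichlet())
  fit_reduced <- brms::brm(f_reduced, data = data, family = brms::dirichlet())
  # Passing both fits to loo() reports their difference in expected
  # log predictive density
  brms::loo(fit_full, fit_reduced)
}
```

With `WaveletML` as the data, `compare_fb_realized(WaveletML)` would refit both models, so expect it to take roughly twice as long as the original fit.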

P.S. To get more general error terms with correlation, you could move from Dirichlet regression to a multivariate logistic normal. I doubt that would be in brms, but you could code it in Stan.
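
To illustrate what the logistic normal works with, here is a small base-R sketch of the additive log-ratio (alr) transform and its inverse, using the first column (Med) as the reference. Under a logistic normal model, the rows of `alr(y)` are treated as multivariate normal with a full covariance matrix, which is where the extra correlation structure comes from. The function names and the two example compositions are made up for illustration.

```r
# Additive log-ratio transform: log of each part relative to the first part
alr <- function(y) {
  y <- as.matrix(y)
  log(y[, -1, drop = FALSE] / y[, 1])
}

# Inverse transform: back from log-ratios to proportions that sum to 1
alr_inv <- function(z) {
  z <- as.matrix(z)
  e <- cbind(1, exp(z))   # the reference part has log-ratio 0
  e / rowSums(e)
}

# Two illustrative compositions (not taken from the data above)
y <- rbind(
  c(Med = 0.10, Low = 0.30, VL = 0.20, UL = 0.40),
  c(0.25, 0.25, 0.25, 0.25)
)

z <- alr(y)            # 2 x 3 matrix of unconstrained log-ratios
y_back <- alr_inv(z)   # recovers the original proportions
```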
