Dirichlet Regression using either the Common or Alternative Parameterization

I don’t think I understand exactly what you mean, but my hunch is that this is not correct.

To define a Dirichlet distribution in K dimensions, we need the precision (phi) and a K-simplex (a set of K numbers between 0 and 1 that sum to 1). A K-simplex is completely determined by K-1 values. Since we want our linear predictors to be unbounded, brms (and other packages) uses the softmax function to map from an unbounded K-dimensional vector to a K-simplex. The softmax function for a vector y is:

\text{softmax}(y) = \frac{\exp(y)} {\sum_{k=1}^K \exp(y_k)}

I.e. all elements are exponentiated and then the vector is divided by the sum of the exponentiated values, which enforces that the result sums to 1. You’ll note from the properties of the exponential that if you add the same number to all elements of the vector y, you get the same result, i.e. if c is a single constant (not a vector), \text{softmax}(y + c) = \text{softmax}(y). So even if we restrict the first element to be 0 (as brms does), we have not “lost” any possible result of the softmax.
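As a quick numeric check of this invariance (a self-contained sketch using a plain base-R softmax rather than the brms internal):

softmax <- function(y) exp(y) / sum(exp(y))

y <- c(0, 3, -1)
round(softmax(y), 3)
#> [1] 0.047 0.936 0.017
# Shifting every element by the same constant changes nothing
all.equal(softmax(y), softmax(y + 10))
#> [1] TRUE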

In the full model, we want to construct a (possibly different) Dirichlet distribution for each combination of our covariates (e.g. Day_Period = 3, Ave_Herd = TRUE, Individual_ID = 338, Session = 5 is one such combination). Since phi is assumed fixed, we need to somehow “predict” only K-1 values. Those values can be interpreted as the mean proportion relative to the first category on a logit scale. The first category is treated as a fixed reference point and is not estimated for any covariate combination. Once again this is similar to the binary case - when I estimate an increase in category 0, it implies a decrease in category 1. In the 3-dimensional case, an increase in the predictor for one category means that this category becomes more prevalent relative to the reference category, but if the predictor for the final category increases even more, then the absolute prevalence of the second category can in fact drop (since the absolute prevalence of the reference category drops even more).
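To see why those K-1 values are log-ratios against the reference category, note that the normalizing sum of the softmax cancels:

\log \frac{\text{softmax}(y)_k}{\text{softmax}(y)_1} = \log \frac{\exp(y_k)}{\exp(y_1)} = y_k - y_1 = y_k \quad \text{when } y_1 = 0

i.e. the linear predictor for category k equals the log of the expected proportion of category k relative to the reference category.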

What I would expect you to see on the conditional_effects plot are those relative effects, which are super hard to interpret on their own. To give an example of how this could look with the formula bf(bind(Traveling, Vigilant, Other) ~ Day_Period, family = "dirichlet"):

muVigilant_Intercept <- 3
muVigilant_DayPeriod2 <- 1
muOther_Intercept <- -1
muOther_DayPeriod2 <- 5

# Predict the mean vector for case DayPeriod = 1
linpred_DayPeriod1 <- c(0, muVigilant_Intercept, muOther_Intercept)
linpred_DayPeriod1
#> [1]  0  3 -1
brms:::softmax(linpred_DayPeriod1)
#>            [,1]      [,2]       [,3]
#> [1,] 0.04661262 0.9362396 0.01714783

# Predict the mean vector for case DayPeriod = 2
linpred_DayPeriod2 <- c(0, muVigilant_Intercept + muVigilant_DayPeriod2, 
                        muOther_Intercept + muOther_DayPeriod2)
linpred_DayPeriod2
#> [1] 0 4 4
brms:::softmax(linpred_DayPeriod2)
#>             [,1]      [,2]      [,3]
#> [1,] 0.009074715 0.4954626 0.4954626

So despite the coefficient for muVigilant_DayPeriod2 being positive, the actual proportion of the Vigilant state decreased (but the relative proportion to Traveling increased).
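To make the relative vs. absolute distinction explicit with the same toy numbers (just continuing the snippet above, no new assumptions):

p1 <- brms:::softmax(linpred_DayPeriod1)
p2 <- brms:::softmax(linpred_DayPeriod2)
# Proportion of Vigilant relative to the reference (Traveling) = exp(linear predictor)
p1[2] / p1[1]
#> [1] 20.08554
p2[2] / p2[1]
#> [1] 54.59815
# The relative proportion grows (exp(3) -> exp(4)), while the absolute proportion
# of Vigilant drops from ~0.94 to ~0.50, because Other grows even faster.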

I am not sure I can explain this much better, so hope this clarifies more than confuses :-)
