Hello everyone,
The shortest formulation of my problem is the following:
I’m trying to design a power analysis by simulation for a Dirichlet regression model using brms
. How could we calculate the shape parameter alpha of the Dirichlet distribution by hand from a theoretical model in order to feed it to brms::rdirichlet(n, alpha)
to simulate data with chosen effects? The goal is to tweak manually the coefficients of the model to simulate data with varying effect sizes.
Now for the lengthier context and what I’ve tried:
The outcome I want to predict is a composition Y of four emotions (anger, disgust, sadness, neutral). I have two categorical predictors with two levels called Domain (Harm, Purity) and Focus (Self, Other). Thus, the model is Y ~ Domain + Focus + Domain:Focus
. If my understanding of this vignette is correct (I suspect it isn’t), Y_i = exp(alpha_i) / sum(exp(alpha_k) for the mean proportion of emotion i, alpha being the shape parameter of the Dirichlet distribution.
I build some toy data by hand and fitted the model on it, then tried to make sense of the coefficients. Here is the model:
bind <- function(...) cbind(...)
model <-
brm(
formula = bind(anger, disgust, sadness, neutral) ~ domain * focus,
data = df,
family = dirichlet()
)
summary(model)
Family: dirichlet
Links: mudisgust = logit; musadness = logit; muneutral = logit; phi = identity
Formula: y ~ domain * focus
Data: df (Number of observations: 48)
Draws: 7 chains, each with iter = 6000; warmup = 2000; thin = 1;
total post-warmup draws = 28000
Regression Coefficients: [see down below]
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 4.60 0.65 3.41 5.97 1.00 22872 19779
Here are the regression coefficients with a bit of formatting on my end:
I know that given a brmsfit
model, I can just use predict()
(or any equivalent) to retrieve the composition I was aiming for above, as advised in several interesting threads like here or here. Here is a plot of these predicted values for this toy model, which fit nicely with the original data (the plot is custom, but it is virtually the same as brms::conditional_effects
):
If my understanding is correct, when domain is Harm (0 dummy coded), proportions for (anger, disgust, sadness, neutral) should be somewhere around (0.6, 0.3, 0.1, 0), and when it is Purity (1 dummy coded), they should be around (0.1, 0.7, 0.2, 0).
However, when I try to retrieve approximate proportions using the link formula, I can’t seem to get these values, especially because the Purity coefficient for Sadness is higher than Disgust. I wrote a function that I thought to be able to compute the composition based on the coefficients:
retrieve_composition <- function(domain, focus){
# the beta coefficients from the summary
beta_0 <- c(0, -0.54, -1.1, -2.96)
beta_1 <- c(0, 3.13, 4.79, 2.95)
beta_2 <- c(0, -0.12, -0.99, -0.06)
beta_12 = c(0, -0.4, -1.71, -0.61)
alpha = c(
beta_0[1] + beta_1[1]*domain + beta_2[1]*focus + beta_12[1]*domain*focus,
beta_0[2] + beta_1[2]*domain + beta_2[2]*focus + beta_12[2]*domain*focus,
beta_0[3] + beta_1[3]*domain + beta_2[3]*focus + beta_12[3]*domain*focus,
beta_0[4] + beta_1[4]*domain + beta_2[4]*focus + beta_12[4]*domain*focus
)
composition = exp(alpha)/sum(exp(alpha))
return(composition)
}
It looks ok for the intercept, as retrieve_composition(0, 0)
gives (.51, .30, .17, .03). But for the Purity condition, i.e. retrieve_composition(1, 0)
, I get (.02, .24, .72, .02), when it should be (0.15, 0.7, 0.1, 0)… Which is logical, because beta_1[2]
is lower than beta_1[3]
. However, these are the brms
model coefficients, and the model fits and predicts the data correctly…
I can’t figure out how to retrieve the composition for the Purity condition. I feel like I’ve made an obvious misinterpretation due to my lack of mathematical literacy, but I don’t understand where… I’ve run out of ideas. What have I done wrong?
Sorry for the lengthy post, and thanks a lot for your help!
- Operating System: Windows 10
- brms Version: 2.21.0