Dirichlet regressioni coefficients

Hello everyone, starting with this example of Dirichlet regression here.
My variable y is a vector of N = 3 elements and the Dirichlet regression model estimates N-1 coeff.
Let’s say I am interested in all 3 coefficients, how can I get them?
Thanks!


library(brms)
library(rstan)
library(dplyr)
bind <- function(...) cbind(...)

N <- 20
df <- data.frame(
  y1 = rbinom(N, 10, 0.5), y2 = rbinom(N, 10, 0.7), 
  y3 = rbinom(N, 10, 0.9), x = rnorm(N)
) %>%
  mutate(
    size = y1 + y2 + y3,
    y1 = y1 / size,
    y2 = y2 / size,
    y3 = y3 / size
  )
df$y <- with(df, cbind(y1, y2, y3))

make_stancode(bind(y1, y2, y3) ~ x, df, dirichlet())
make_standata(bind(y1, y2, y3) ~ x, df, dirichlet())

fit <- brm(bind(y1, y2, y3) ~ x, df, dirichlet())

summary(fit)

 Family: dirichlet 
  Links: muy2 = logit; muy3 = logit; phi = identity 
Formula: bind(y1, y2, y3) ~ x 
   Data: df (Number of observations: 20) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muy2_Intercept     0.29      0.10     0.10     0.47 1.00     2830     2514
muy3_Intercept     0.56      0.09     0.38     0.73 1.00     2833     2623
muy2_x             0.04      0.11    -0.17     0.24 1.00     3265     2890
muy3_x            -0.00      0.10    -0.20     0.19 1.00     3229     2973

Family Specific Parameters: 
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi    39.85      9.13    23.83    59.78 1.00     3358     2652

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
1 Like

Hi Andrew,

to see how this works try the following

#see vignette "brms_families", section probability models
vignette('brms_families')
#see ?rdirichlet
?brms::rdirichlet
# alpha is a concentration parameter, this is the same as nu in the vignette
# the higher the values in alpha the greater the "certainty"
# i.e. c(0.1, 0.1, 0.1) is less certain than c(100, 100, 100) but has the same relative composition: 0.33, 0.33, 0.33

# create a composition
composition <- c(a = 0.2, b = 0.5, c = 0.3) # sums to 1
certainty <- 10 # the higher the more certain, try different values
alpha <- composition * certainty # alpha for brms::rdirichlet 
alpha

# draw from distribution
set.seed(21)
rd <- rdirichlet(n = 100, alpha) # you get better estimates with higher n, try different values
colnames(rd) <- names(alpha)

# create a data frame for brms
df <- tibble(n = 1:nrow(rd))
df$y <- rd

# fit an intercept only model (just to illustrate the point)
fit <- brm(y ~ 1, df, dirichlet())
summary(fit)
fixef(fit)

# Extract the fixed effects 
# a = 0 for identifiability reasons
fix <-c(mua_Intercept = 0, fixef(fit)[, 'Estimate'])

# use back-transformation to extract the estimated composition
# see vignette
transformed <- fix |> exp()
(transformed / sum(transformed)) |> round(2)
# compare
composition

#alternatively just predict
predict(fit, newdata = 1)

In cases where you have covariates, assess the effects by looking at the logit coefficients.
To see effects in original values just predict.

I hope this helped.

Cheers

Jannik

1 Like

Dear Jannik,

that’s great and really helpful.
If and only if you have additional time, could you provide me with an example including both, fix and random effects covariates at the same time, with particular regard when you make a prediction, this is the tricky part with random effects.

# Extract the fixed effects 
# a = 0 for identifiability reasons
fix <-c(mua_Intercept = 0, fixef(fit)[, 'Estimate'])
fix
# use back-transformation to extract the estimated composition
# see vignette
transformed <- fix |> exp()
(transformed / sum(transformed)) |> round(2)
# compare`

Thanks for your time,
Andrew

Referring to your code, I extended the script by inserting a trivial example of “x” covariate and “x_rand” random effects.

I have tried to get predictions and compare them with brms’ predict command, am I very far away?

Last note, curiously it seems that trying to change the link functions they don’t work
“dirichlet (link = “logit”, link_phi = “log”, refcat = NULL)”

#see vignette "brms_families", section probability models
vignette('brms_families')
#see ?rdirichlet
?brms::rdirichlet
# alpha is a concentration parameter, this is the same as nu in the vignette
# the higher the values in alpha the greater the "certainty"
# i.e. c(0.1, 0.1, 0.1) is less certain than c(100, 100, 100) but has the same relative composition: 0.33, 0.33, 0.33

# create a composition
composition <- c(a = 0.2, b = 0.5, c = 0.3) # sums to 1
certainty <- 10 # the higher the more certain, try different values
alpha <- composition * certainty # alpha for brms::rdirichlet 
alpha
#install.packages("tibble")
library(tibble)
library(brms)
# draw from distribution
set.seed(21)
rd <- rdirichlet(n = 100, alpha) # you get better estimates with higher n, try different values
colnames(rd) <- names(alpha)


# create a data frame for brms
df <- tibble(n = 1:nrow(rd))
df$y <- rd

# simple covariate
df$x <- rnorm(100)

# area covariate for random effect
df$x_rand <- rep(1:5,20)
df$x_rand <- as.factor(df$x_rand)

fit <-brm(y ~ 1+x+ (1|x_rand), df, dirichlet())
summary(fit)
fx <- fixef(fit)
rn <- ranef(fit)

# get eta
 
etab <- fx[1]+fx[3]*df$x+rn$x_rand[c(1,2,3,4,5)]
etac <- fx[2]+fx[4]*df$x+rn$x_rand[c(21,22,23,24,25)]

# use transformation
exp(etab)/ exp(1+etab+etac)

# get prediction using brms
predict(fit)