Hello everyone, starting with this example of Dirichlet regression here.
My variable y is a vector of N = 3 elements and the Dirichlet regression model estimates N-1 coeff.
Let’s say I am interested in all 3 coefficients, how can I get them?
Thanks!
library(brms)
library(rstan)
library(dplyr)
bind <- function(...) cbind(...)
N <- 20
df <- data.frame(
y1 = rbinom(N, 10, 0.5), y2 = rbinom(N, 10, 0.7),
y3 = rbinom(N, 10, 0.9), x = rnorm(N)
) %>%
mutate(
size = y1 + y2 + y3,
y1 = y1 / size,
y2 = y2 / size,
y3 = y3 / size
)
df$y <- with(df, cbind(y1, y2, y3))
make_stancode(bind(y1, y2, y3) ~ x, df, dirichlet())
make_standata(bind(y1, y2, y3) ~ x, df, dirichlet())
fit <- brm(bind(y1, y2, y3) ~ x, df, dirichlet())
summary(fit)
Family: dirichlet
Links: muy2 = logit; muy3 = logit; phi = identity
Formula: bind(y1, y2, y3) ~ x
Data: df (Number of observations: 20)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muy2_Intercept 0.29 0.10 0.10 0.47 1.00 2830 2514
muy3_Intercept 0.56 0.09 0.38 0.73 1.00 2833 2623
muy2_x 0.04 0.11 -0.17 0.24 1.00 3265 2890
muy3_x -0.00 0.10 -0.20 0.19 1.00 3229 2973
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 39.85 9.13 23.83 59.78 1.00 3358 2652
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
#see vignette "brms_families", section probability models
vignette('brms_families')
#see ?rdirichlet
?brms::rdirichlet
# alpha is a concentration parameter, this is the same as nu in the vignette
# the higher the values in alpha the greater the "certainty"
# i.e. c(0.1, 0.1, 0.1) is less certain than c(100, 100, 100) but has the same relative composition: 0.33, 0.33, 0.33
# create a composition
composition <- c(a = 0.2, b = 0.5, c = 0.3) # sums to 1
certainty <- 10 # the higher the more certain, try different values
alpha <- composition * certainty # alpha for brms::rdirichlet
alpha
# draw from distribution
set.seed(21)
rd <- rdirichlet(n = 100, alpha) # you get better estimates with higher n, try different values
colnames(rd) <- names(alpha)
# create a data frame for brms
df <- tibble(n = 1:nrow(rd))
df$y <- rd
# fit an intercept only model (just to illustrate the point)
fit <- brm(y ~ 1, df, dirichlet())
summary(fit)
fixef(fit)
# Extract the fixed effects
# a = 0 for identifiability reasons
fix <-c(mua_Intercept = 0, fixef(fit)[, 'Estimate'])
# use back-transformation to extract the estimated composition
# see vignette
transformed <- fix |> exp()
(transformed / sum(transformed)) |> round(2)
# compare
composition
#alternatively just predict
predict(fit, newdata = 1)
In cases where you have covariates, assess the effects by looking at the logit coefficients.
To see effects in original values just predict.
that’s great and really helpful.
If and only if you have additional time, could you provide me with an example including both, fix and random effects covariates at the same time, with particular regard when you make a prediction, this is the tricky part with random effects.
# Extract the fixed effects
# a = 0 for identifiability reasons
fix <-c(mua_Intercept = 0, fixef(fit)[, 'Estimate'])
fix
# use back-transformation to extract the estimated composition
# see vignette
transformed <- fix |> exp()
(transformed / sum(transformed)) |> round(2)
# compare`
Referring to your code, I extended the script by inserting a trivial example of “x” covariate and “x_rand” random effects.
I have tried to get predictions and compare them with brms’ predict command, am I very far away?
Last note, curiously it seems that trying to change the link functions they don’t work
“dirichlet (link = “logit”, link_phi = “log”, refcat = NULL)”
#see vignette "brms_families", section probability models
vignette('brms_families')
#see ?rdirichlet
?brms::rdirichlet
# alpha is a concentration parameter, this is the same as nu in the vignette
# the higher the values in alpha the greater the "certainty"
# i.e. c(0.1, 0.1, 0.1) is less certain than c(100, 100, 100) but has the same relative composition: 0.33, 0.33, 0.33
# create a composition
composition <- c(a = 0.2, b = 0.5, c = 0.3) # sums to 1
certainty <- 10 # the higher the more certain, try different values
alpha <- composition * certainty # alpha for brms::rdirichlet
alpha
#install.packages("tibble")
library(tibble)
library(brms)
# draw from distribution
set.seed(21)
rd <- rdirichlet(n = 100, alpha) # you get better estimates with higher n, try different values
colnames(rd) <- names(alpha)
# create a data frame for brms
df <- tibble(n = 1:nrow(rd))
df$y <- rd
# simple covariate
df$x <- rnorm(100)
# area covariate for random effect
df$x_rand <- rep(1:5,20)
df$x_rand <- as.factor(df$x_rand)
fit <-brm(y ~ 1+x+ (1|x_rand), df, dirichlet())
summary(fit)
fx <- fixef(fit)
rn <- ranef(fit)
# get eta
etab <- fx[1]+fx[3]*df$x+rn$x_rand[c(1,2,3,4,5)]
etac <- fx[2]+fx[4]*df$x+rn$x_rand[c(21,22,23,24,25)]
# use transformation
exp(etab)/ exp(1+etab+etac)
# get prediction using brms
predict(fit)