Category specific co-variate effects in Dirichlet regression

HugoBotha · February 10, 2020, 9:11pm

Please also provide the following information in addition to your question:

Operating System: Win 10
brms Version: 2.10

Hi.

In the case of categorical models, multinomial models, Dirichlet regression, etc all the coefficients in the posterior are ‘relative’ to the reference category. This makes getting a sense of covariate effect strength for each category rather hard to wrap my head around. Marginal_effects / Conditional_effects makes plotting the association easy, but at times I want to directly compare the effects like one would with a simpler regression problem.

Consider the following example:

bind <- function(...) cbind(...)
set.seed(10101)
N <- 20
df <- data.frame(
  y1 = rbinom(N, 10, 0.5), y2 = rbinom(N, 10, 0.7), 
  y3 = rbinom(N, 10, 0.9), x = rnorm(N)
) %>%
  mutate(
size = y1 + y2 + y3,
y1 = y1 / size,
y2 = y2 / size,
y3 = y3 / size
  )
df$y <- with(df, cbind(y1, y2, y3))

fit<-brm(y~x,df,dirichlet(refcat = "y2"),cores=4)

summary(fit)

 Family: dirichlet 
  Links: muy1 = logit; muy3 = logit; phi = identity 
Formula: y ~ x 
   Data: df (Number of observations: 20) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muy1_Intercept    -0.28      0.08    -0.44    -0.13 1.00     3793     3180
muy3_Intercept     0.22      0.07     0.08     0.35 1.00     3866     2975
muy1_x            -0.02      0.08    -0.17     0.13 1.00     3557     3089
muy3_x            -0.04      0.07    -0.18     0.09 1.00     3555     2852

I would like to compare the effect of covariate x on the probability of y1,y2 and y3.

Currently my workaround for doing so is to use the fitted() function with newdata consisting of the min() and max() of the covariate, then manually calculating slopes:

fitted_newdata<-fitted(fit,newdata = newdata,summary = F,nsamples = 1000)
dimnames(fitted_newdata)<-list("Sample"=sprintf("s%d",1:1000),"x"=c("x1","x2"),"Category"=dimnames(fitted_newdata)[[3]])

fitted_newdata %>% as_tibble() %>%
  mutate(.sample=1:1000) %>%
  gather(Var,Val,-.sample) %>% 
  separate(Var,sep = ".y",into = c("x","Category")) %>%
  spread(x,Val) %>%
  mutate(slope=(x2-x1)/(max(df$x)-min(df$x))) %>%
  compare_levels(by = Category,slope,draw_indices = ".sample")%>%
  ggplot(aes(y=as.factor(Category),x=slope))+geom_halfeyeh()+ylab("Comparison")+xlab("Slope for x")

Questions

Is my fitted() way a complete abomination or is this valid?
If indeed it is, is there a better way to compare the effect size/betas for the covariates INCLUDING for the reference category?

Thanks, as always

Hugo

HugoBotha · February 19, 2020, 11:35pm

In case the length of the post is a barrier, a TLDR of sorts would be:

In models with a reference level, is it valid to use fitted() as follows:

Create newdata, setting a low and a high value for a numeric variable of interest and everything else at their mean/reference level
Using the resulting fitted values, manually calculate the slope (change in outcome over change in variable of interest)
Compare the slopes between categories, or summarize them as one would any other variable (tidybayes etc)

I feel this is what conditional_effects() does, and so I think it’s a valid procedure. But I would appreciate some input since I couldn’t find examples of people doing it. :)

paul.buerkner · February 20, 2020, 9:33am

Do you compute the slopes on the latent scale or on the “probability” scale? Presumably your approach is reasonable, but keep in mind on which scale you assume the effect to be linear.

Alternatively, you could also model all response categories at the same time by using refcat = NA in the family functions. This will complicate model fitting and you need to specify informative priors though.

HugoBotha · February 20, 2020, 3:52pm

Thanks Paul.

That’s something I missed. As I understand it, with fitted() on the “response” scale, I get values for all categories including the reference. But this is on the probability scale where the effect is not linear? With the scale set to “linear” I only get eta for the non-reference categories, so that puts me back at square one.

I’ve used refcat=NA for simpler models but have struggled with more complex one.

Given that conditional_effects() reports slopes/effects for all categories, there must be a valid way to get at this information though?

Or is it the case that for Dirichlet the effect is linear on the probability scale (conditional_effects with spaghetti draws straight lines) and this is only really a concern with categorical/multinomial models?

I’ve thoroughly confused myself now.

paul.buerkner · March 29, 2020, 9:56am

Eta indeed exists only for the non-reference categories and there is no way to obtain an eta for the reference category other than refcat = NA.

conditional_effects shows you effects on all categories only because of using the “response” scale.

Topic		Replies	Views
Extracting Categorical Group Effects from mixed effect Dirichlet regression brms dirichlet-multinomial , brms	0	782	July 5, 2022
Conditional_effects() plots from brms model has a response category with consistent 0 probability (flat plot) General rstan , multinomial-response , conditional-effects , brms	1	133	July 23, 2024
Correlated random effects in categorical model? brms	6	2809	January 31, 2019
How do I get "marginal_effects" for categorical variables to condition on an average rather than a category? brms	9	5985	November 15, 2018
Conditional_effects: level over other fixed effects brms	1	889	March 9, 2020

Category specific co-variate effects in Dirichlet regression

Related topics