How to compare regression coefficients in different groups?

Hello everyone. I have recently run into a question while doing Bayesian regression analysis. My aim is to compare whether the regression coefficients differ between two sample populations (subgroup analysis). Having obtained the path coefficients for each of the two populations using rstan, my question is: what statistical method should I use to test whether the two coefficients differ?

Is it possible to extract the posterior distribution of this coefficient from each of the two groups directly using the “extract” function, then construct the difference (coefficient 1 − coefficient 2) and a credible interval for it directly?

I don’t know how to express it in brms, but one way to do this is to fit a joint model for both groups and parameterize the difference between the coefficients. You can do this asymmetrically and set \alpha_1 = \beta and \alpha_2 = \beta + \gamma. Then the posterior on the difference is just the marginal posterior for \gamma.
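In brms, one way to get this parameterization is with a group indicator and an interaction term. Here is a minimal sketch, assuming an outcome `y`, a predictor `x`, and a two-level factor `group` in a data frame `d` (all names here are hypothetical):

```r
library(brms)

# With treatment (dummy) coding of the two-level factor `group`, the
# coefficient on x is beta (the slope in the reference group) and the
# coefficient on the x:group interaction is gamma (the slope difference),
# i.e. alpha_1 = beta and alpha_2 = beta + gamma.
fit <- brm(y ~ x * group, data = d)

# The marginal posterior for gamma is just the posterior of the
# interaction coefficient, visible in summary(fit) or as_draws_df(fit).
summary(fit)
```

The credible interval reported for the interaction coefficient then directly summarizes the posterior on the difference.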

I’m not sure what you mean by statistical difference. You can’t look at the probability that \gamma = 0, because it’s zero under a continuous density. If you want to formulate hypothesis tests, you need to rethink this in a frequentist setting and probably use tools other than Stan to fit.

Otherwise, one Bayesian approach to understanding the difference is the posterior of \gamma. As I just said in another post, you can then compose this with some kind of decision process to make decisions, but I wouldn’t recommend reducing it to a different/not different result.

Here’s a simple example to show what I mean: I ran a regression through brms using the first dataset (representing the male sample), then ran a second regression using the second dataset (representing the female sample). What should I now use to compare whether the magnitudes of the coefficients from these two regressions are significantly different?

It all depends on what you mean by “significantly” here. Technically, significance is a frequentist notion measuring how extreme the data is versus a null hypothesis—the further the data is into the tail, the lower the p-value. For example, your null hypothesis here would probably be that the two conditions (male and female) are the same. I’m not 100% sure how you’d proceed from there analytically—I think you’d fit a joint model and then look at some statistic of that.

In some cases, you can use a Bayesian probability estimate. For example, you might have a hypothesis that the women’s value is higher than the men’s, at which point you can estimate a single model with a \theta_1 and \theta_2 for men and women and estimate the probability \Pr\!\left[\theta^\text{female} > \theta^\text{male} \mid y\right], where y is your combined data. But you don’t want to look at \Pr\!\left[\theta^\text{female} = \theta^\text{male} \mid y\right], because that is always zero. The reason is that in the 2D parameter space, the inequality picks out an area whereas the equality picks out a line.

If the two models aren’t related in any way, you can fit them separately and evaluate the probability using independent draws from each.
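For example, with two independently fitted brmsfit objects `fit_male` and `fit_female` that each contain a coefficient named `b_x` (these names are assumptions for illustration), the probability can be estimated by pairing up draws:

```r
library(posterior)

# Posterior draws of the same coefficient from the two separate fits;
# draws from independently fitted models are themselves independent.
draws_m <- as_draws_df(fit_male)$b_x
draws_f <- as_draws_df(fit_female)$b_x

# Monte Carlo estimate of Pr[theta_female > theta_male | y]
mean(draws_f > draws_m)

# Posterior of the difference and a 95% credible interval for it
delta <- draws_f - draws_m
quantile(delta, probs = c(0.025, 0.975))
```

This works because each posterior draw of the difference is just a draw from one fit minus an independent draw from the other.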

It’s better to fit both groups with a single model and have a parameter in the model which is the difference between groups. This is quite easy with brms.

You can see an example of how to do this in our code here.

Thanks for your reply. I read the example, and it seems to use dose in these sections to predict the individual parameters:
```r
f_drug_omega_only <- bf(
  prop_lott | trials(total) ~ inv_logit(omega1) *
    inv_logit(
      (lottery_prob * (lottery_norm)^exp(logrho) -
         (sb_norm)^exp(logrho)) / exp(noise)
    ) +
    (1 - inv_logit(omega1)) * inv_logit(omega2),
  logrho ~ 1 + (1 | subjid),
  noise ~ dose + (1 | subjid),
  omega1 ~ dose + (1 | subjid),
  omega2 ~ dose + (1 | subjid),
  nl = TRUE
)
```
I’m not sure I’m interpreting this correctly. In my example, when I want to check whether the male and female groups differ significantly in their regression coefficients, can I combine the two datasets, add gender as a new variable, and construct interaction terms between gender and the original predictors? I would then judge whether the coefficients differ between the two groups by looking at the credible intervals for the interaction coefficients. Or should I instead predict the coefficients from gender, as in your example?

```r
blme_model4 <- brm(aq ~ c1 * c2 + (1 + c1 * c2 | Group) + (1 + c1 * c2 | Group2),
                   data = MALE,
                   cores = 8,
                   chains = 4,
                   iter = 5000,
                   control = list(adapt_delta = 0.999, max_treedepth = 20))
blme_model5 <- brm(aq ~ c1 * c2 + (1 + c1 * c2 | Group) + (1 + c1 * c2 | Group2),
                   data = FEMALE,
                   cores = 8,
                   chains = 4,
                   iter = 5000,
                   control = list(adapt_delta = 0.999, max_treedepth = 20))
```

I am trying to figure out whether these coefficients differ between the two datasets. Can I use the 95% credible intervals?

You should have c1 * c2 * gender as a fixed effect in the model.
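Concretely, that means stacking the two datasets and letting gender enter the fixed effects. A sketch based on the models above (the gender coding and exact formula are assumptions):

```r
# Combine the two samples and record gender explicitly.
combined <- rbind(
  transform(MALE, gender = "male"),
  transform(FEMALE, gender = "female")
)

blme_joint <- brm(
  aq ~ c1 * c2 * gender + (1 + c1 * c2 | Group) + (1 + c1 * c2 | Group2),
  data = combined,
  cores = 8, chains = 4, iter = 5000,
  control = list(adapt_delta = 0.999, max_treedepth = 20)
)

# The terms involving gender (main effect and interactions) are the
# between-group differences in the corresponding coefficients.
summary(blme_joint)
```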

Add a gender column to the combined data frame and then inspect whether the gender coefficients/interactions are shifted away from 0. You can use brms::hypothesis() to check whether the gender terms matter.
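A sketch of what that might look like, assuming a joint fit (here called `fit_joint`) that includes the gender interactions; the exact coefficient names depend on your factor coding, so check them first with variables() or fixef():

```r
# Is the c1 slope different between genders? "genderfemale" is an
# assumed coefficient name arising from dummy coding of the factor.
hypothesis(fit_joint, "c1:genderfemale = 0")

# A directional version, which reports a posterior probability
# for the ordering rather than an interval around zero.
hypothesis(fit_joint, "c1:genderfemale > 0")
```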

You can also fit a reduced model without the gender terms on the combined data to see whether adding gender improves the model’s predictive performance. I would use loo for this model comparison.
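A sketch of that comparison, assuming a combined data frame with a gender column as described above (model names and formulas are hypothetical):

```r
# Full model with gender terms, and a reduced model without them,
# both fit to the same combined data.
fit_full    <- brm(aq ~ c1 * c2 * gender + (1 + c1 * c2 | Group),
                   data = combined)
fit_reduced <- brm(aq ~ c1 * c2 + (1 + c1 * c2 | Group),
                   data = combined)

# loo estimates out-of-sample predictive performance; elpd_diff and its
# standard error indicate whether the gender terms add predictive value.
loo_compare(loo(fit_full), loo(fit_reduced))
```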
