Parameter contrasts in Bayesian linear regression model using brms

Using brms, I have fitted two Bayesian linear regression models of a response variable A, each with four binary and one real-valued (standardized) predictor (i.e. A ~ b_1 + b_2 + b_3 + b_4 + r). One of the models additionally includes a group-level effect (1|user_id). Each model is fitted on a different data set, but both use a log-normal distribution family. I haven’t specified priors, i.e. the non-informative default priors were used. Judging by brms::pp_check, both models seem to fit the data well visually.

My objective is now to quantify, for each of the models, how the response A changes when a certain binary variable is TRUE/FALSE. An example is presented here: https://bayesat.github.io/lund2018/slides/andrey_anikin_slides.pdf (from page 29), where contrasts between corpora are estimated from MCMC samples. I basically want to do the same, but for each of my predictors, i.e. “What change can be expected in A when b_1 is TRUE vs. FALSE?”

My idea was to use the MCMC samples, i.e. the posterior draws, from brms::posterior_samples. For each draw, I get estimates of the log-normal distribution parameters for the intercept and the binary variable I am interested in. However, I am interested in the value of A on the response scale, which could be the mean/median/mode of the log-normal distribution, calculated from the distribution parameters. What I tried first is to compute the median of the log-normal distribution, i.e. med = exp(mu), for the binary variable in question at every MCMC draw, and then plot a histogram of all the values. The resulting distribution looks reasonable, but I am not sure whether taking the median of each draw is valid, or whether taking the mean/mode would be a better choice.
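
Roughly, the procedure I have in mind looks like this (a sketch only; the fitted object is called brm_model, and the coefficient names b_Intercept and b_b_1TRUE are my assumption of what brms calls them, to be checked against colnames(posterior_samples(brm_model))):

post <- posterior_samples(brm_model)              # one row per MCMC draw

# mu on the log scale for b_1 = FALSE vs. TRUE (other predictors at reference/0)
mu_false <- post$b_Intercept
mu_true  <- post$b_Intercept + post$b_b_1TRUE

# the median of a log-normal is exp(mu); contrast on the response scale, per draw
contrast_median <- exp(mu_true) - exp(mu_false)
hist(contrast_median, breaks = 50)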

How would you recommend doing this? Does this resemble a common procedure that I am not yet aware of?

I noticed a similar output when using brms::pp_check with type = "stat_grouped" and specifying, as the group, the binary variable I want to show the contrast for. The resulting plot shows what I am trying to reconstruct, albeit for fitted/predicted samples.

  • Operating System: Ubuntu 16.04 LTS
  • brms Version: 2.6.0

The methods posterior_linpred and posterior_predict should help you automate what you want to do. Specifically, I would recommend looking at the newdata argument to specify the conditions that should be contrasted.
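
For example, roughly (a sketch for the model without the (1|user_id) term, using the predictor names from your description; the reference values in newdata are just placeholders):

# two rows of new data that differ only in b_1
nd <- data.frame(b_1 = c(TRUE, FALSE),
                 b_2 = FALSE, b_3 = FALSE, b_4 = FALSE,
                 r = 0)

mu <- posterior_linpred(brm_model, newdata = nd)   # draws x 2 matrix, log scale
contrast_median <- exp(mu[, 1]) - exp(mu[, 2])     # per-draw contrast of the medians, TRUE minus FALSE

For the model containing (1|user_id), you would additionally have to decide how to handle the group-level term (see the re_formula argument).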


Thanks for the suggestions, I am looking into those.

Was I on the right track using the pp_check function with type = "stat_grouped" to derive the contrasts (setting group to the factor to be contrasted)? It seems more straightforward for me to do it this way rather than defining the newdata argument myself. I can directly get a distribution of the contrast by subtracting the values for one outcome of the factor from those for the other outcome.

I would like to reopen the discussion here again.

Having retrieved the contrasts from the model responses (i.e. from the pp_check outputs), is it legitimate to do (Bayesian) NHSTs on those? I am referring to the 95% HDI + ROPE rule from Kruschke.

For example:
For parameter b_1, the contrast (between groups b_1 = TRUE and b_1 = FALSE) is 0.31 [-0.01, 0.56] (mean and 95% HDI). With any ROPE centered around zero, the null hypothesis cannot be rejected (it is either accepted or the decision remains undecided).
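
In code, the decision rule I have in mind looks roughly like this (the ROPE limits are placeholders):

rope    <- c(-0.05, 0.05)     # illustrative ROPE around zero
hdi_b_1 <- c(-0.01, 0.56)     # 95% HDI of the contrast from above

if (hdi_b_1[2] < rope[1] || hdi_b_1[1] > rope[2]) {
  decision <- "reject the null value (HDI entirely outside the ROPE)"
} else if (hdi_b_1[1] >= rope[1] && hdi_b_1[2] <= rope[2]) {
  decision <- "accept the null value (HDI entirely inside the ROPE)"
} else {
  decision <- "undecided (HDI and ROPE overlap)"
}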

That looks reasonable. I wouldn’t use pp_check for the output though, but rather posterior_predict, posterior_linpred, or hypothesis.


Thank you very much for the fast confirmation. I am still struggling with how to extract the same information using posterior_predict, meaning the same information that I get when using pp_check with type = "stat_grouped" and the group argument set to the factor that I want to check for “significance”.

When using the newdata argument of posterior_predict, it seems I cannot select only one factor and then get two columns, one for each of the two outcomes (TRUE/FALSE). If I generate two distributions via posterior_predict, one for TRUE and one for FALSE, and then subtract them, I don’t get the same output as with pp_check. I actually tried to understand how pp_check generates the posterior values in order to compare them with posterior_predict, but I did not manage to figure it out from the source code.

Is there anything wrong with my way of approaching this, or can you give a pseudo-example of how to use posterior_predict for that purpose?

Can you show me the code you are using (including the model code)?

Here is what I have so far:

brm_model <- brm(data = data_model, formula = A ~ b_1 + b_2 + (1|user_id), family = lognormal())

ppc_b_1 <- pp_check(brm_model, type = "stat_grouped", group = "b_1")

# Contrast V. 1, via pp_check
contrast_b_1 <- ppc_b_1$data$value[ppc_b_1$data$group == "TRUE"] -
                ppc_b_1$data$value[ppc_b_1$data$group == "FALSE"]

hdi(contrast_b_1, prob = 0.95)
# [1] -0.627414547  0.006199777


# Contrast V. 2, via posterior_predict
ppd_b_1_true  <- posterior_predict(brm_model, newdata = subset(data_model, b_1 == TRUE))
ppd_b_1_false <- posterior_predict(brm_model, newdata = subset(data_model, b_1 == FALSE))

I actually managed to subset on the two outcomes for the newdata argument, but I am stuck now. How can I remove the group-level effect (user_id), which results in one column for each level of user_id, and instead get a single column per condition so that I can subtract one from the other to get the contrast?

I think pp_check simply summarizes over the columns belonging to the respective group levels. Here is the bayesplot function that actually does the job:

# bayesplot:::ppc_group_data
function(y, yrep, group, stat = NULL) {
    d <- data.frame(group = factor(group), y = y, yrep = t(yrep))
    colnames(d) <- gsub(".", "_", colnames(d), fixed = TRUE)
    molten_d <- reshape2::melt(d, id.vars = "group")
    molten_d <- dplyr::group_by(molten_d, .data$group, .data$variable)
    dplyr_fun <- dplyr::summarise
    if (is.null(stat)) {
        stat <- function(x) x
        dplyr_fun <- dplyr::mutate
    }
    stat <- match.fun(stat)
    dplyr_fun(molten_d, value = stat(.data$value))
}

If you want to “remove” the group-level (random) effects (i.e. condition them to be zero), you may instead use re_formula = NA in posterior_predict and supply a data.frame with only the relevant contrasts. This of course changes what is actually predicted: the pp_check approach (to my understanding) marginalizes over the random effects (and other predictors) in the data set, while the re_formula = NA approach conditions them to be zero.
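
For example (a sketch; the newdata values besides b_1 are placeholders):

# predict only the two conditions of interest, with group-level effects set to zero
nd <- data.frame(b_1 = c(TRUE, FALSE), b_2 = FALSE)

pred <- posterior_predict(brm_model, newdata = nd, re_formula = NA)
contrast_b_1 <- pred[, 1] - pred[, 2]   # per-draw contrast, TRUE minus FALSE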

Thank you. Do you know if there is a simple way to marginalize over the random effects using posterior_predict, i.e. do the same as pp_check is (probably) doing?

I am wondering how I can do this. Was my previous approach the right way, i.e. using posterior_predict twice, once for each outcome of the effect, and then subtracting the results from each other?

pp_check simply averages over all the corresponding observations, as you see in the code I gave you above. You can apply the same function to the output of posterior_predict to get what you want (or close to it).
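
In code, that route could look roughly like this (a sketch, assuming the columns of posterior_predict follow the row order of data_model):

yrep <- posterior_predict(brm_model)    # draws x observations, group-level effects included

# average the columns belonging to each level of b_1, then take the difference
yrep_true  <- rowMeans(yrep[, data_model$b_1 == TRUE,  drop = FALSE])
yrep_false <- rowMeans(yrep[, data_model$b_1 == FALSE, drop = FALSE])

contrast_b_1 <- yrep_true - yrep_false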

Running posterior_predict twice and subtracting is fine, if you use re_formula = NA or supply new levels of the grouping variables.


Thank you for clarifying. Indeed, I get a similar HDI of the contrast when I use posterior_predict with re_formula = NA and then simply average the columns:

ppd_b_1_true  <- posterior_predict(brm_model, newdata = data_model[data_model$b_1 == TRUE, ], re_formula = NA)
ppd_b_1_false  <- posterior_predict(brm_model, newdata = data_model[data_model$b_1 == FALSE, ], re_formula = NA)

ppd_b_1_true <- rowMeans(ppd_b_1_true)
ppd_b_1_false <- rowMeans(ppd_b_1_false)

contrast_b_1 <- ppd_b_1_true - ppd_b_1_false

hdi(contrast_b_1, prob = 0.95)
# [1] -0.63893044  0.01472799