Weights in brm

Dear Stan community,

I am using the weights option in the brm() function to account for different variances across field sites in a negative binomial generalized linear mixed-effects model. To my understanding, using weights this way approximates the idea of the variance function varIdent() from the R package nlme (see Galecki & Burzykowski 2013, pages 129-130, section 7.3.2, and page 135, section 7.4.3). After inspecting the Stan code generated for the brm model, I am still trying to work out how the weights enter the posterior. The brm() call with weights looks as follows:


b.mod <- brm(y|weights(w) ~ ...)

On the one hand, the Stan code generated for the brm model shows that the posteriors are simply multiplied by the weights:


vector[N] weights;  // model weights
for (n in 1:N)
  target += weights[n] * neg_binomial_2_log_lpmf(Y[n] | mu[n], shape);

On the other hand, other R packages seem to implement weights differently. For example, the gls() function from the nlme package appears to use the multiplicative inverse of the weights internally, while taking “raw” weights as input and also showing them in the summary output. The function lm() in turn differs from gls() in how it implements weights. So the way weights are implemented can differ from package to package, which left me confused about how weights are implemented in brms (or rather in Stan, since brm relies on Stan).

Does anybody know how brm() uses weights? Does it use the multiplicative inverse internally, which would imply that the user should supply “raw” weights (as with the weights argument in gls())? Or are the weights not “transformed”, in which case the user should apply the multiplicative inverse before coding the model? Many thanks for your time and responses.

  • Operating System: ArchLinux, Linux Kernel 4.16.8-1
  • brms Version: 2.3.0

It is not the “posteriors” that are weighted but the likelihood contributions of each observation.

brms takes the weights literally, which means that an observation with weight 2 receives 2 times more weight than an observation with weight 1. It also means that using a weight of 2 is equivalent to adding the corresponding observation twice to the data frame.
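
In other words, the Stan snippet above accumulates (up to a constant) the following weighted log-posterior, writing w_n for the weights and θ for all model parameters:

$$\log p(\theta \mid y, w) = \log p(\theta) + \sum_{n=1}^{N} w_n \log p(y_n \mid \theta) + \text{const.}$$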


In addition to Paul’s reply: if you want varIdent()-like behaviour, you can model the precision or variance parameters using the distributional regression approach. The first example in that vignette is:

fit1 <- brm(bf(symptom_post ~ group, sigma ~ group), 
            data = dat1, family = gaussian())

This should be similar to (if my understanding is correct…):

library(nlme)
fit_gls <- gls(symptom_post ~ group, weights = varIdent(form = ~1|group), 
              data = dat1)

Dear Paul, Dear Hans,
many thanks for your replies and hints.

@paul.buerkner

It is not the “posteriors” that are weighted but the likelihood contributions of each observation.

Shame on me. The Stan code I quoted in my initial post clearly shows what you are saying.

@hansvancalster

Your comment came at just the right moment, because I had started to fiddle around with the Stan code to account for group-wise differences in variance, following comments on the Stan user and development mailing lists, i.e. something like


neg_binomial_2_log_lpmf(Y[n] | mu[weight[n]], shape);

The link and code you provided in your post directed me towards the (hopefully) right path. At the end of Paul’s vignette that you linked to, there is an example of additive distributional models using multilevel data. Since my data are well described by a negative binomial distribution, I had to replace sigma with shape to make the model run (see also Paul’s description of brmsformula, lines 19-25).
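
In case it helps others, here is a minimal sketch of the kind of model I ended up with (the names y, x, site, and dat are placeholders for my actual variables):

library(brms)

# Distributional negative binomial model: the shape parameter gets its own
# linear predictor, so each site can have its own level of overdispersion
# (in the spirit of varIdent(), but acting on shape instead of sigma).
fit_nb <- brm(
  bf(y ~ x + (1 | site),   # model for the mean
     shape ~ site),        # site-specific shape
  data = dat,
  family = negbinomial()
)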

Just out of curiosity and as a point of general discussion:

Coming from a field (genomics/biology) where weights are abused, I always thought that the beauty of Bayesian inference is that “weights” are a piece of information that has to be learned from the data, with proper modelling.

How is this ad hoc imposition of the information content of a data point compatible with Bayesian inference?

Could you provide an example where this procedure is needed/better than pure Bayesian inference?

Thanks

I haven’t looked at the particulars of the model in this thread, but as a general comment, yes, weighting can be incompatible with the idea of a generative model, which is a large part of the reason why many Bayesians object to it.


Are the weights then also automatically incorporated into posterior predictions? For example, if I run a model with weights and then compute posterior estimates using posterior_epred(), will the weights already be factored into the predictions because they are fully part of the model used to make them? Or do I need to tell posterior_epred() to use the weights again in some way?

The weights are a property of an actually observed value. As such, weights play no role in posterior_predict or posterior_epred because they predict new responses. However, weights are used in log_lik because it evaluates the likelihood on actually observed responses.
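
One quick way to see the difference in practice (just a sketch, assuming fit is a brms model fitted with weights on its original data):

ep <- posterior_epred(fit)    # expected responses per draw and observation; weights ignored
pp <- posterior_predict(fit)  # simulated new responses; weights ignored
ll <- log_lik(fit)            # pointwise log-likelihood of the observed data; scaled by the weights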

Would this also be the case if I ran posterior_epred but fed in the data that I used to fit the model?

No, because conceptually weights do not make sense for posterior_epred.

Thanks @paul.buerkner, I think I need to get into the weeds with it a bit then to understand how the different functions deal with weights.

I think a useful way to think about weights is that an observation with integer weight N acts exactly as if you had N copies of the same observation in your dataset. Non-integer weights then interpolate between the integer values. Since it doesn’t make much sense for posterior predictions to take into account how many copies of the same observation you have in your data, it also doesn’t make much sense to include weights there. However, if you are trying to compute some expectation reflecting the original dataset (e.g. the mean response), it might make sense to take a weighted average of the per-observation predictions to match the expectation you would get if you had multiple copies in the data.
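
For example, something along these lines (just a sketch, assuming fit is a brms model whose data contain a weights column called w):

ep <- posterior_epred(fit)            # matrix: posterior draws x observations
w  <- fit$data$w                      # the weights stored with the fit
# weighted mean of the expected response, one value per posterior draw
weighted_mean <- as.vector(ep %*% w) / sum(w)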

Hope that clarifies more than confuses.


Thanks Martin - that does make sense. I think the confusion here is more a matter of terminology in what I mean when I say ‘does posterior_epred incorporate the weights’. I’ve got a simple example here to show what I mean, and in terms of what I intended the question to mean, I think the answer is yes:

library(tidyverse)
library(brms)

# group 1: 1000 scores with mean 0 (weight 1); group 2: 1000 scores with mean 1 (weight 2)
data <- tibble(score = c(rnorm(1000, 0, 1), rnorm(1000, 1, 1)),
               weights = c(rep(1, 1000), rep(2, 1000)))

formula <- score | weights(weights) ~ 1

fit_weights <- brm(formula = formula,
               family = gaussian(),
               data = data,
               control = list(adapt_delta = 0.95, max_treedepth  = 10),
               chains = 1,
               cores = 1,
               iter = 700,
               warmup = 200)

# same two groups, but the higher-scoring group is literally duplicated (2000 rows) instead of weighted
data2 <- tibble(score = c(rnorm(1000, 0, 1), rnorm(2000, 1, 1)))

formula2 <- score ~ 1

fit_weights2 <- brm(formula = formula2,
                   family = gaussian(),
                   data = data2,
                   control = list(adapt_delta = 0.95, max_treedepth  = 10),
                   chains = 1,
                   cores = 1,
                   iter = 700,
                   warmup = 200)

# same two groups, with neither weights nor duplication
data3 <- tibble(score = c(rnorm(1000, 0, 1), rnorm(1000, 1, 1)))

formula2 <- score ~ 1

fit_weights3 <- brm(formula = formula2,
                    family = gaussian(),
                    data = data3,
                    control = list(adapt_delta = 0.95, max_treedepth  = 10),
                    chains = 1,
                    cores = 1,
                    iter = 700,
                    warmup = 200)

test1 <- posterior_epred(fit_weights, ndraws = 500)
test2 <- posterior_epred(fit_weights2, ndraws = 500)
test3 <- posterior_epred(fit_weights3, ndraws = 500)

# grand mean of the expected scores from each model;
# models 1 and 2 should give (roughly) the same value, model 3 a lower one
mean(test1)
mean(test2)
mean(test3)

For model 1 I run a regression that is weighted so that certain respondents are weighted double the other respondents, and these people have higher scores. In model 2, I run the model with 2x the number of people in the higher-scoring group, rather than weighting them. In model 3 I run it without weights or doubling the number of people. Then I run posterior_epred() on the respondents from each of the models. Model 1 and model 2 give the same posterior predictions, showing how the weights are incorporated into the posterior prediction of model 1. This is all that I meant. I think what you are suggesting, Martin, is essentially poststratification using the weights, which would work if I ran the model as usual but included a term that gives different scores to subgroups of individuals, whereas in the case of using weights I am requesting a single grand mean weighted towards certain respondents. Not sure if what I’ve said has just added more confusion, but it answers my original question!
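
For completeness, here is a rough sketch of that alternative route, building on the data object from the example above (the group labels and the 1:2 weighting are just illustrative):

# model the two groups explicitly instead of weighting
data$group <- factor(rep(c("a", "b"), each = 1000))
fit_group <- brm(score ~ group, data = data, family = gaussian())

# expected score for each group
ep_group <- posterior_epred(fit_group, newdata = tibble(group = c("a", "b")))

# poststratified grand mean, counting group b twice as heavily (matching the weights above)
poststrat_mean <- as.vector(ep_group %*% c(1, 2)) / 3
mean(poststrat_mean)  # should be close to mean(test1) and mean(test2)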
