Thanks Martin - that does make sense. I think the confusion here is more a matter of terminology: what I mean when I ask ‘does posterior_epred incorporate the weights?’. I’ve put together a simple example to show what I mean, and in terms of what I intended the question to mean, I think the answer is yes:
library(tidyverse)
library(brms)

set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Model 1 data: 1000 low scorers with weight 1, 1000 high scorers with weight 2
data <- tibble(score = c(rnorm(1000, 0, 1), rnorm(1000, 1, 1)),
               weights = c(rep(1, 1000), rep(2, 1000)))
formula <- score | weights(weights) ~ 1
fit_weights <- brm(formula = formula,
                   family = gaussian(),
                   data = data,
                   control = list(adapt_delta = 0.95, max_treedepth = 10),
                   chains = 1,
                   cores = 1,
                   iter = 700,
                   warmup = 200)
# Model 2 data: no weights; the high-scoring group is simply twice as large
data2 <- tibble(score = c(rnorm(1000, 0, 1), rnorm(2000, 1, 1)))
formula2 <- score ~ 1
fit_weights2 <- brm(formula = formula2,
                    family = gaussian(),
                    data = data2,
                    control = list(adapt_delta = 0.95, max_treedepth = 10),
                    chains = 1,
                    cores = 1,
                    iter = 700,
                    warmup = 200)
# Model 3 data: no weights and equal group sizes (reuses formula2 from above)
data3 <- tibble(score = c(rnorm(1000, 0, 1), rnorm(1000, 1, 1)))
fit_weights3 <- brm(formula = formula2,
                    family = gaussian(),
                    data = data3,
                    control = list(adapt_delta = 0.95, max_treedepth = 10),
                    chains = 1,
                    cores = 1,
                    iter = 700,
                    warmup = 200)
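# Expected values of the posterior predictive distribution (draws x observations)
test1 <- posterior_epred(fit_weights, ndraws = 500)
test2 <- posterior_epred(fit_weights2, ndraws = 500)
test3 <- posterior_epred(fit_weights3, ndraws = 500)
# Grand mean over all draws and observations
mean(test1)
mean(test2)
mean(test3)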
In model 1, I run a weighted regression in which certain respondents get double the weight of the others, and these respondents have higher scores. In model 2, instead of weighting them, I simply include twice as many people in that group. In model 3, I neither weight nor double the number of people. Then I run posterior_epred on the respondents from each model. Models 1 and 2 give essentially the same posterior predictions, which shows that the weights are incorporated into the posterior predictions of model 1. This is all that I meant.

I think what you are suggesting, Martin, is essentially poststratification using the weights. That would work if I ran the model as usual but included a term that gives different scores to subgroups of individuals, whereas with weights I am requesting a single grand mean in which certain respondents count for more. Not sure if what I’ve said has just added more confusion, but it answers my original question!
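For what it's worth, here is a rough sketch of why I'd expect models 1 and 2 to agree. It assumes that brms applies `weights()` by multiplying each observation's log-likelihood contribution by its weight (which is how I read the documentation), that the priors are weak enough not to matter, and that each group's sample mean is close to its true mean (0 and 1). The symbols $w_i$, $y_i$, $\mu$ and $\sigma$ are my own notation, not anything from the brms output:

```latex
\begin{align*}
\log L_w(\mu,\sigma) &= \sum_{i=1}^{n} w_i \,\log \mathcal{N}(y_i \mid \mu, \sigma), \\
\hat{\mu}_w &= \frac{\sum_i w_i y_i}{\sum_i w_i}
  \approx \frac{1000\cdot 1\cdot 0 + 1000\cdot 2\cdot 1}{1000\cdot 1 + 1000\cdot 2}
  = \frac{2}{3} && \text{(model 1)}, \\
\hat{\mu} &\approx \frac{1000\cdot 0 + 2000\cdot 1}{3000} = \frac{2}{3} && \text{(model 2)}, \\
\hat{\mu} &\approx \frac{1000\cdot 0 + 1000\cdot 1}{2000} = \frac{1}{2} && \text{(model 3)}.
\end{align*}
```

So a weight of 2 acts like counting that respondent twice in the likelihood, and under these assumptions the intercept-only posterior mean should sit near 2/3 for models 1 and 2 and near 1/2 for model 3.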