Given a hierarchical model, I would like to get posterior predictions for new subjects based on how much they deviate from the central tendency.
For example, I fitted a hierarchical model to the sleepstudy
dataset:
library(tidyverse)
library(brms)
data = read.csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/lme4/sleepstudy.csv') %>%
as_tibble(data) %>%
mutate(Subject = factor(Subject))
m <- brm(Reaction ~ Days + (Days | Subject),
data = data)
The model summary is:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Reaction ~ Days + (Days | Subject)
Data: data (Number of observations: 180)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 26.86 6.77 15.31 41.80 1.00 1842 2492
sd(Days) 6.58 1.53 4.17 9.98 1.00 1394 1715
cor(Intercept,Days) 0.08 0.30 -0.47 0.68 1.01 988 1534
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 251.38 7.43 236.79 266.35 1.00 1827 2620
Days 10.36 1.69 6.86 13.67 1.00 1502 1389
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 25.93 1.52 23.17 29.11 1.00 3481 2594
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Now, I would like to get posterior predictions for three, new prototypical subjects: the average subject and the two subjects that deviate the most (e.g., 2 standard deviations) from the central tendency in the negative and positive direction. For example, in the figure below, I color-coded the posterior prediction for each subject based on the average effect of Days
(the slope of the lines). The dashed line is the central tendency of the population. The average subject could be someone like 332
, and the extreme subjects could be someone resembling 308
and 335
.
Is it possible to sample new subjects based on the random effect magnitude? If I understand things correctly, brms
already offers posterior predictions for the average subject (with re_formula = NA
) and I think a combination of re_formula=NULL
, allow_new_levels=TRUE
, and sample_new_levels="gaussian"
(for an explanation of these option see also Understanding sample_new_levels = "uncertainty", "gaussian", and "old_levels") could get me close to what I am trying to achieve. I am thinking something like
posterior_epred(m,
newdata = data.frame(Subject = "extreme_subject"),
allow_new_levels = TRUE,
sample_new_levels = "gaussian",
scale = 6.58*2 # sample new subject at 2 standard deviations (sd(Days) = 6.58)
)
where scale
is a hypothetical new option to control where to sample the new subject
First, does it make sense? And second, would you be able to help :)? I am sure there are already some established procedures I am not aware of yet. Thanks
- Operating System: Win 10
- brms Version: 2.13.3