Sample new subjects

Given a hierarchical model, I would like to get posterior predictions for new subjects based on how much they deviate from the central tendency.

For example, I fitted a hierarchical model to the sleepstudy dataset:

library(tidyverse)
library(brms)

data = read.csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/lme4/sleepstudy.csv') %>% 
  as_tibble(data) %>%
  mutate(Subject = factor(Subject))

m <- brm(Reaction ~ Days + (Days | Subject), 
         data = data)

The model summary is:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Reaction ~ Days + (Days | Subject) 
   Data: data (Number of observations: 180) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~Subject (Number of levels: 18) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          26.86      6.77    15.31    41.80 1.00     1842     2492
sd(Days)                6.58      1.53     4.17     9.98 1.00     1394     1715
cor(Intercept,Days)     0.08      0.30    -0.47     0.68 1.01      988     1534

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   251.38      7.43   236.79   266.35 1.00     1827     2620
Days         10.36      1.69     6.86    13.67 1.00     1502     1389

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    25.93      1.52    23.17    29.11 1.00     3481     2594

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Now, I would like to get posterior predictions for three, new prototypical subjects: the average subject and the two subjects that deviate the most (e.g., 2 standard deviations) from the central tendency in the negative and positive direction. For example, in the figure below, I color-coded the posterior prediction for each subject based on the average effect of Days (the slope of the lines). The dashed line is the central tendency of the population. The average subject could be someone like 332, and the extreme subjects could be someone resembling 308 and 335.

Is it possible to sample new subjects based on the random effect magnitude? If I understand things correctly, brms already offers posterior predictions for the average subject (with re_formula = NA) and I think a combination of re_formula=NULL, allow_new_levels=TRUE, and sample_new_levels="gaussian" (for an explanation of these option see also Understanding sample_new_levels = "uncertainty", "gaussian", and "old_levels") could get me close to what I am trying to achieve. I am thinking something like

posterior_epred(m, 
      newdata = data.frame(Subject = "extreme_subject"), 
      allow_new_levels = TRUE, 
      sample_new_levels = "gaussian", 
      scale = 6.58*2  # sample new subject at 2 standard deviations (sd(Days) = 6.58)
) 

where scale is a hypothetical new option to control where to sample the new subject
First, does it make sense? And second, would you be able to help :)? I am sure there are already some established procedures I am not aware of yet. Thanks


  • Operating System: Win 10
  • brms Version: 2.13.3
1 Like

I think you are on the right track. Let’s see what @paul.buerkner has to say :)

Currently, I don’t think selecting the location of new subjects is possible. I understand your motivation behind it, but I am not sure if (and how) it makes sense to implement such a feature. Of course, you can also select and illustrate the predictions of certain subjects matching a certain kind of average or extreme you want to show (as you already did).

1 Like