Sample new subjects

GianniRewer · July 30, 2020, 12:30pm

Given a hierarchical model, I would like to get posterior predictions for new subjects based on how much they deviate from the central tendency.

For example, I fitted a hierarchical model to the sleepstudy dataset:

library(tidyverse)
library(brms)

data = read.csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/lme4/sleepstudy.csv') %>% 
  as_tibble(data) %>%
  mutate(Subject = factor(Subject))

m <- brm(Reaction ~ Days + (Days | Subject), 
         data = data)

The model summary is:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Reaction ~ Days + (Days | Subject) 
   Data: data (Number of observations: 180) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~Subject (Number of levels: 18) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          26.86      6.77    15.31    41.80 1.00     1842     2492
sd(Days)                6.58      1.53     4.17     9.98 1.00     1394     1715
cor(Intercept,Days)     0.08      0.30    -0.47     0.68 1.01      988     1534

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   251.38      7.43   236.79   266.35 1.00     1827     2620
Days         10.36      1.69     6.86    13.67 1.00     1502     1389

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    25.93      1.52    23.17    29.11 1.00     3481     2594

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Now, I would like to get posterior predictions for three, new prototypical subjects: the average subject and the two subjects that deviate the most (e.g., 2 standard deviations) from the central tendency in the negative and positive direction. For example, in the figure below, I color-coded the posterior prediction for each subject based on the average effect of Days (the slope of the lines). The dashed line is the central tendency of the population. The average subject could be someone like 332, and the extreme subjects could be someone resembling 308 and 335.

Is it possible to sample new subjects based on the random effect magnitude? If I understand things correctly, brms already offers posterior predictions for the average subject (with re_formula = NA) and I think a combination of re_formula=NULL, allow_new_levels=TRUE, and sample_new_levels="gaussian" (for an explanation of these option see also Understanding sample_new_levels = "uncertainty", "gaussian", and "old_levels") could get me close to what I am trying to achieve. I am thinking something like

posterior_epred(m, 
      newdata = data.frame(Subject = "extreme_subject"), 
      allow_new_levels = TRUE, 
      sample_new_levels = "gaussian", 
      scale = 6.58*2  # sample new subject at 2 standard deviations (sd(Days) = 6.58)
)

where scale is a hypothetical new option to control where to sample the new subject
First, does it make sense? And second, would you be able to help :)? I am sure there are already some established procedures I am not aware of yet. Thanks

Operating System: Win 10
brms Version: 2.13.3

torkar · August 4, 2020, 9:24am

I think you are on the right track. Let’s see what @paul.buerkner has to say :)

paul.buerkner · August 5, 2020, 11:19am

Currently, I don’t think selecting the location of new subjects is possible. I understand your motivation behind it, but I am not sure if (and how) it makes sense to implement such a feature. Of course, you can also select and illustrate the predictions of certain subjects matching a certain kind of average or extreme you want to show (as you already did).

Topic		Replies	Views
Posterior_predict() in brms brms	1	896	June 14, 2020
Question regarding posterior_predict used on new observations with new covariate levels brms	1	366	August 11, 2020
Hierarchical model 101: Link, categorical without baseline, posterior predictive checks brms bayesplot	10	2249	October 23, 2018
Post-hoc test brms brms	5	2831	May 21, 2022
Hypothesis test brms Modeling	10	1998	December 21, 2022

Sample new subjects

Related topics