Stand-alone predictions in the context of a hierarchical random effect model

Model: I am working with a model that can be described, at a bird's-eye level, as a random-slope logistic model. I have 0/1 outcomes, some predictors, and random intercepts and random slopes for my individuals. I have fitted models that I am satisfied with in terms of goodness of fit and convergence.

Problem: suppose I am given a new individual with their predictors and their trajectory of 0/1 outcomes. Can I run the sampled chains on these data to obtain that individual's random effects? Section 27.7 provides pointers for the "flat" case of i.i.d. data: take the parameters that have already been simulated and feed them to a generated quantities block, without having the sampler touch the model again. But in my case, the random effects are declared as parameters, and the random effects for the new data, obviously, have not been sampled. I cannot simply add the new data to the model and re-run the sampler: the fit takes about 50 core-hours, and I need to produce model predictions for another 2000 new individuals.
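For concreteness, the flat-case workflow I mean looks roughly like this in rstan (a sketch only; gq_model stands for a hypothetical compiled model that contains just a generated quantities block, and the other names refer to the reprex below):

draws <- as.matrix(stan_mcmc_rslope_logit)                      # parameters simulated earlier
preds <- rstan::gqs(gq_model, data = stan_data, draws = draws)  # evaluates generated quantities only, no sampling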

Here is a reprex as an Rmd fragment, so that the Stan model sits in the middle. The goal is to obtain the vector[2] u_i for the new data.


title: "Random effects reprex"
output: html_document

knitr::opts_chunk$set(echo = TRUE, 
                      warning = FALSE, 
                      message = FALSE, 
                      dev = "svg") ## export images as SVG

library(ggplot2)
library(magrittr)
library(here)
library(dplyr)
library(tidyr)
library(tibble)
library(rstan)
library(bayesplot)
library(knitr)

Simulate data

N <- 10
set.seed(1)
my_df <- data.frame(id     = 1:N,
                    interc = rnorm(n = N, mean = 0.5, sd = 0.3),
                    slope  = rnorm(n = N, mean = -0.5, sd = 0.3),
                    n      = sample(5:8, N, replace = TRUE))
my_df <- my_df[rep(1:N, my_df$n), ]
my_df %>% group_by(id) %>% mutate(t = row_number()) %>% ungroup() %>%
  mutate(y = as.integer(interc + slope * t / 3 + rnorm(n(), mean = 0, sd = 0.3) > 0)) -> my_df

Stan code

data {
  int<lower=1> N;                                     // total number of observations
  int<lower=0,upper=1> outcome[N];                    // outcome
  int<lower=1> N_people;                              // number of unique individuals
  int<lower=1,upper=N_people> seq_person_id[N];       // SEQUENTIAL person id -- important for indexing
  vector[N]    predictor;                             // predictor that goes with random slope
}
parameters {
  // variance components
  vector<lower = 0>[2]  tau_u;   
  // random effects for people
  vector[2] u_i[N_people];
  // correlation matrix
  corr_matrix[2] rho_matrix;

}
transformed parameters {
  cov_matrix[2] Sigma;
  vector[N] theta;
  
  // covariance matrix
  Sigma = quad_form_diag(rho_matrix,tau_u);
  
  // logit of outcome
  for (t in 1:N)
    theta[t] = u_i[seq_person_id[t],1] 
               + u_i[seq_person_id[t],2]*predictor[t];
}
model {
  // outcome
  outcome ~ bernoulli(inv_logit(theta));

  // variances
  tau_u ~ cauchy(0,1);
  
  // correlation
  rho_matrix ~ lkj_corr(1);
  
  // random effects
  u_i ~ multi_normal( [0, 0]', Sigma );
  
}

Bayesian computation

Prepare the data for Stan

stan_data <- list(
  N = nrow(my_df),                                      # int<lower=1> N;
  outcome = unlist(my_df[,"y"]),                        # int<lower=0,upper=1> outcome[N];
  N_people = (my_df %>% distinct(id) %>% nrow()),       # int<lower=1> N_people;
  seq_person_id = unlist(my_df[,"id"]),                 # int<lower=1,upper=N_people> seq_person_id[N]; 
  predictor = unlist(my_df[,"t"])                       # vector[N]    predictor;
)

Sample from the model

stan_mcmc_rslope_logit <- rstan::sampling(
    random_slope_logit, data=stan_data,
        chains = 2, seed = 1234,
        warmup = 1000, iter = 2000,
        control=list(adapt_delta=0.9, stepsize_jitter = 0.1),
        refresh = 100,
        init = list( list(tau_u=c(0.2,0.2)), list(tau_u=c(0.3,0.3)) )
    )

New observations

new_df <- data.frame( y_new = c(0,1,1,0,1), t_new=1:5)

What now??

Estimating the group-level parameters for new observations is a more complicated endeavour; it was discussed in more depth in this thread: Posterior Predictive for hierarchical model without further hyper-parameter inference - #8 by maxbiostat
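A rough sketch of that idea in R: condition on the posterior draws of the hyperparameters, propose candidate random effects from the population distribution, and importance-resample them against the likelihood of the new trajectory. The names come from the reprex above; M and the single-draw resampling are illustrative choices, not a tested recipe.

library(MASS)   # for mvrnorm()

post    <- rstan::extract(stan_mcmc_rslope_logit, pars = "Sigma")
n_draws <- dim(post$Sigma)[1]
M       <- 50                       # candidate random effects per posterior draw

u_new <- matrix(NA_real_, n_draws, 2)
for (s in 1:n_draws) {
  # candidates from the population distribution implied by this posterior draw
  cand <- MASS::mvrnorm(M, mu = c(0, 0), Sigma = post$Sigma[s, , ])
  # M x T matrix of logits for the new person's trajectory
  eta  <- cand[, 1] + outer(cand[, 2], new_df$t_new)
  # log-likelihood of the observed 0/1 trajectory for each candidate
  ll   <- matrix(dbinom(rep(new_df$y_new, each = M), size = 1,
                        prob = plogis(eta), log = TRUE), nrow = M)
  w    <- exp(rowSums(ll) - max(rowSums(ll)))   # importance weights
  u_new[s, ] <- cand[sample(M, 1, prob = w), ]  # one resampled candidate
}
# each row of u_new is then an approximate posterior draw of the new
# individual's random intercept and slope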

Note that you can pretty significantly optimise your model by using the non-centered parameterisation for u_i and using the compound bernoulli_logit_glm distribution for your likelihood, like so:

data {
  int<lower=1> N;                                     // total number of observations
  int<lower=0,upper=1> outcome[N];                    // outcome
  int<lower=1> N_people;                              // number of unique individuals
  int<lower=1,upper=N_people> seq_person_id[N];       // SEQUENTIAL person id -- important for indexing
  vector[N] predictor;                                // predictor that goes with random slope
}
transformed data {
  // design matrix with one column per person: column j carries the predictor
  // on person j's rows, so x_slope * beta applies person j's slope only to
  // that person's observations
  matrix[N, N_people] x_slope = rep_matrix(0, N, N_people);
  for (n in 1:N)
    x_slope[n, seq_person_id[n]] = predictor[n];
}
parameters {
  // variance components
  vector<lower = 0>[2]  tau_u;
  // 'Raw' N(0,1) variates for the non-centered param
  matrix[2, N_people] u_i_raw;
  // Cholesky factor of the correlation matrix
  cholesky_factor_corr[2] rho_matrix_cholesky;
}
transformed parameters {
  // random effects for people
  // Implies u_i ~ multi_normal( [0, 0]', Sigma )
  // Specified with column per group (person) since it's
  // faster to extract a column than a row
  matrix[2, N_people] u_i = diag_pre_multiply(tau_u, rho_matrix_cholesky) * u_i_raw;
}
model {
  // correlation
  rho_matrix_cholesky ~ lkj_corr_cholesky(1);
  // variances
  tau_u ~ cauchy(0,1);

  // 'Raw' N(0,1) variates for the non-centered param
  to_vector(u_i_raw) ~ std_normal();

  // outcome: person-specific intercepts enter through alpha; beta must have
  // one coefficient per column of x, hence the per-person design matrix
  outcome ~ bernoulli_logit_glm(x_slope,
                                u_i[1, seq_person_id]',
                                u_i[2]');
}


Thanks for the syntax tips. In my production model (which is about 150 lines of code as it stands; there is much more going on, and I only wanted to reprex the methodological issue I am facing), I already use the non-centered parameterization of the random effects (although I am not convinced it helps much: my variance components are very well isolated from zero with hundreds of degrees of freedom, and I doubt I am facing the funnel situation). Are you saying bernoulli_logit_glm() would work much faster?

What @maxbiostat has proposed is pretty close to what I am attempting now: draw the random effects from the estimated population distribution, independently estimate the regression of the sequence of events on time for the new individual, and combine the two using the rules for combining two normals. I know it is not kosher Bayes, but it is good enough for the business application :).
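Concretely, the "rules for two normals" step I have in mind is the precision-weighted combination below; u_hat and V_hat are placeholders for the estimate and its covariance from the stand-alone per-person regression, so this is a sketch rather than working code.

# posterior mean of the random-effect covariance from the fitted model
Sigma_hat <- apply(rstan::extract(stan_mcmc_rslope_logit, pars = "Sigma")$Sigma,
                   c(2, 3), mean)
# combine the N(0, Sigma_hat) population prior with u_hat ~ N(u, V_hat)
P_comb     <- solve(Sigma_hat) + solve(V_hat)        # combined precision
u_combined <- solve(P_comb, solve(V_hat) %*% u_hat)  # shrunken estimate of u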

although I am not convinced it helps much: my variance components are very well isolated from zero with hundreds of degrees of freedom, and I doubt I am facing the funnel situation

When you have correlated random effects, the primary benefit comes from the fact that the multivariate (Cholesky) likelihood is much more computationally expensive than a univariate standard normal plus a single matrix multiplication.

Are you saying bernoulli_logit_glm() would work much faster?

Both faster and more efficient. The _glm likelihoods calculate analytic gradients for the respective inputs separately, which leads to more efficient sampling. They also avoid calculating gradients for inputs that are data rather than parameters, so each evaluation requires less computation. The _glm likelihoods can also be GPU-accelerated, which provides a significant improvement with large datasets.
