Extracting predicted nonlinear random effects after covariates

I wonder if there’s an established best practice / easiest method for extracting posterior draws or summaries of nonlinear random effects after application of covariates. I started off feeling like this should be a trivial problem, but as I delve deeper, and as I dig through brms:::prepare_predictions() and such, it gets quite complicated! Making a solution that is both robust to different model specifications and efficient is a hard problem, and I feel like I must be reinventing the wheel here, especially since using something like model.matrix() becomes quite hard when one also has all the dimensionality of the posterior to think about. Hats off to @paul.buerkner for the impressive work in the guts of brms here!

I’m running a model like this

fit_formula <- bf( Y ~ mynonlinfun(logA, logB, logC, logD, logE, ...),  # mynonlinfun is a placeholder for the nonlinear function
     # Nonlinear variables
     logA + logB + logC + logD + logE ~ 1 + Group + Age + (1|m|ID),
     # Sigma
     sigma ~ 0 + sigma_multiplier + (1|ID),
     # Nonlinear fit
     nl = TRUE)

So, here we have both discrete and continuous predictors for logA through logE, and I want the model predictions of these parameter values. We might also want to take their exponential at some point, so I’d be interested in both summaries and draws.

Is there a function in tidybayes ( @mjskay ), posterior, or brms that simplifies this? Otherwise, I guess I’ll just keep plugging away at my half-baked solution, and I’ll share it here later in case it helps anyone else once it’s done.

Thanks in advance!

  • Operating System: Pop!_OS 20.04 LTS
  • brms Version: 2.13.9

Just following up with the hacky and ugly solution that I’m using for now. I’ll probably need to fix it up at a later point to accommodate more complicated scenarios (at the moment it only handles one random effect, for example). But hopefully this helps someone else out there, or at least makes it easier to point to a more established and robust solution to this task.

I’m modelling a pharmacokinetic curve over time, so I have about 20 values within each ID. I first create modelmat, which contains one row for each ID, from modeldat, which holds all the data. This works because all my fixed effects are at the ID level, i.e. they don’t differ within any given ID.

modelmat <- modeldat %>% 
  filter(!duplicated(ID)) %>% 
  mutate(IDno = 1:n())
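As a quick sanity check that this reduction behaved (nothing fancy; it just assumes modeldat has an ID column, as in the code above):

# modelmat should contain exactly one row per ID
stopifnot(nrow(modelmat) == dplyr::n_distinct(modeldat$ID))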

Then I want to extract summaries of my parameters for each ID, after accounting for the fixed effects. I do so with a series of functions. First, the helper functions:

# Get the fixed effects formula
formula_fix <- function(object, parno) {
  formula <- paste0("~ ", as.character(object$formula$pforms[[parno]][3]))
  as.formula(stringr::str_remove(formula, " ?\\+ ?\\(.*\\)"))
}

# Get the random effect identifier
formula_ran <- function(object, parno) {
  formula <- paste0("~ ", as.character(object$formula$pforms[[parno]][3]))
  stringr::str_match(formula, "\\(.*\\| ?(.*)\\)")[2]
}

# Get the name of a fixed effect parameter from its number
parname_get <- function(object, parno) {
  as.character(object$formula$pforms[[parno]][2])
}

# Get the number of a fixed effect parameter from its name
parno_get <- function(object, parname) {
  which(names(object$formula$pforms) == parname)
}
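To make these a bit easier to follow, here is roughly what the helpers return for the example formula at the top (an untested sketch; fit stands in for the fitted brmsfit object, and logA is assumed to be the first entry in pforms):

formula_fix(fit, 1)        # ~ 1 + Group + Age
formula_ran(fit, 1)        # "ID"
parname_get(fit, 1)        # "logA"
parno_get(fit, "logA")     # 1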

# Get the samples for an individual parameter from an individual curve
samples_par_i <- function(drawsdf_fixpar, drawsdf_ranpar, modelmatrix, IDno, parname) {
  
  covar <- modelmatrix[IDno,]
  covar_mat <- matrix(rep(covar, each=nrow(drawsdf_fixpar)), ncol=length(covar))
  fsamples <- apply(covar_mat * drawsdf_fixpar, 1, sum)
  
  rsamples <- select(drawsdf_ranpar,  contains(paste0(parname, "[", IDno, ",")))
  
  as.numeric(unname(unlist(fsamples+rsamples)))
}

# Summarise the samples for a single individual and a single parameter
summarise_par_indiv <- function(drawsdf_fixpar, drawsdf_ranpar, modelmatrix, IDno, parname) {

  samples <- samples_par_i(drawsdf_fixpar, drawsdf_ranpar, modelmatrix, IDno, parname)
    
  tibble(
    Estimate = mean(samples),
    Est.Error = sd(samples),
    "l-95% CI" = quantile(samples, probs = 0.025),
    "u-95% CI" = quantile(samples, probs = 0.975)
  )
}

# Summarise the samples for all individuals for a particular parameter
summarise_par_group <- function(object, modelmat, parname) {
  
  parno <- parno_get(object, parname)
  
  modelmatrix <- model.matrix(formula_fix(object, parno), modelmat)
  
  ran_ef <- formula_ran(object, parno)
  
  draws <- posterior::as_draws(object$fit)
  drawsdf <- posterior::as_draws_df(draws)
  
  drawsdf_fixpar <- drawsdf %>% 
    select(contains(paste0("b_", parname)))
  
  drawsdf_ranpar <- drawsdf %>% 
    select(starts_with(paste0("r_", ran_ef, "__", parname)))
  
  as_tibble(modelmatrix) %>% 
    mutate(IDno = 1:n(),
           Parameter = parname) %>% 
    select(IDno, Parameter, everything()) %>% 
    group_by(IDno) %>% 
    mutate(Summary = map(IDno, ~summarise_par_indiv(drawsdf_fixpar, drawsdf_ranpar, 
                                                modelmatrix, .x, parname))) %>% 
    ungroup() %>% 
    unnest(Summary)
  
}

And then the main function, which uses the helper functions; here object is the brms fit object, and modelmat is as described above.

ranef_preds <- function(object, modelmat) {
  
  # Get the nonlinear parameter names from the prepared predictions
  predprep <- brms::prepare_predictions(object)
  parnames <- names(predprep$nlpars)
  
  bind_rows( map(parnames, ~summarise_par_group(object, modelmat, .x)) )
  
}
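It gets called like this (fit being a stand-in name for the brmsfit object):

summarydat <- ranef_preds(fit, modelmat)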

And this gives me a summary that looks like this

> summarydat[245:255,]
# A tibble: 11 x 8
    IDno Parameter `(Intercept)` GroupGroup2 Estimate Est.Error `l-95% CI`
   <int> <chr>             <dbl>       <dbl>    <dbl>     <dbl>      <dbl>
 1    45 logbpnd               1           0    0.904    0.0250      0.850
 2    46 logbpnd               1           0    0.892    0.0257      0.833
 3    47 logbpnd               1           0    0.906    0.0214      0.863
 4    48 logbpnd               1           0    0.908    0.0252      0.860
 5    49 logbpnd               1           0    0.888    0.0281      0.823
 6    50 logbpnd               1           0    0.904    0.0246      0.855
 7    51 logbpnd               1           1    1.00     0.0255      0.958
 8    52 logbpnd               1           1    1.01     0.0281      0.961
 9    53 logbpnd               1           1    1.00     0.0229      0.961
10    54 logbpnd               1           1    0.987    0.0255      0.930
11    55 logbpnd               1           1    1.00     0.0243      0.956
# … with 1 more variable: `u-95% CI` <dbl>

where logbpnd is one of my parameters (like logA, logB, logC, etc. in the example above) that I’ve simulated to show group differences. So this gives me the outcomes to compare to what I would have got from the output of the nls() model, after accounting for the fixed effects in my hierarchical model.

So, about this implementation: it’s slow, it’s probably quite fragile, and it only handles one random effect at a time. It’s currently pretty hacky, but if it helps anyone, then I’d be delighted. And if it makes it easier to point to an existing function which does the same thing better, that’d be a great outcome too :).

Sorry for the late reply, I was on vacation. Have you tried posterior_epred(..., nlpar = "xxx")?


No need to apologise - hope you enjoyed the break!

This is super! Thank you so much for the pointer. I never found this function while looking about, but this completely solves the issue in a much neater manner!

posterior_epred( brmsfit, nlpar="logbpnd", newdata=modelmat )
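And for the summaries and exponentials I mentioned above, something like this should work on the returned draws (a draws-by-rows-of-newdata matrix); this is an untested sketch using the same objects as above:

logbpnd_draws <- posterior_epred(brmsfit, nlpar = "logbpnd", newdata = modelmat)

# Per-ID summaries (Estimate, Est.Error, 95% CI) on the log and exponentiated scales
posterior_summary(logbpnd_draws)
posterior_summary(exp(logbpnd_draws))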

Really happy I don’t have to rely on my monstrosity :)

Thanks again!