Specifying a nonlinear mixed effects model using brms

I am working on a project where I am trying to fit a nonlinear mixed effects model from a Bayesian perspective. The data set contains a response values (total lumens) measured for 40 samples (sample identifiers are stored in ids) at multiple time points (stored in days). Here is a snapshot of the data I have collected:

[Screenshot: first rows of the data set, showing the ids, days, and values columns]

as well as a plot of the data I am working with

[Plot: values versus days for all 40 samples]

If we were to connect the data points based on ids, you would see 40 lines in the plot (i.e., the data are longitudinal). From a physics point of view, I know that the relationship between my data and time should look something like an exponential decay. With that in mind, as a first pass I would like to fit a nonlinear model based on an exponentially decaying function, something like (and I'm open to suggestions for a better model)

y_{ij} = a_i \exp(-b_i \times \text{days}_j) + \varepsilon_{ij}

for i = 1, \ldots, 40, where j indexes the day of the measurement. However, where I am struggling is incorporating the longitudinal part. Since I am unsure how to do this, it would be useful to see how to write down the correct model. Ultimately I would like to be able to run the model in R using the brms package, which I think proceeds as follows:

library(brms)

model <- brm(
  bf(values ~ a * exp(-b * days),
     a ~ 1 + (1 | ids), b ~ 1 + (1 | ids),
     nl = TRUE),
  data = dat, family = gaussian(),
  prior = c(
    prior(normal(0, 10), class = "b", nlpar = "a"),
    prior(normal(0, 10), class = "b", nlpar = "b"),
    prior(normal(0, 10), class = "sd", nlpar = "a"),
    prior(normal(0, 10), class = "sd", nlpar = "b")
  )
)
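
If I understand the (1 | ids) syntax correctly, this code should correspond to the hierarchical model

y_{ij} = a_i \exp(-b_i \times \text{days}_j) + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2)

a_i = \alpha_a + u_{a,i}, \quad u_{a,i} \sim N(0, \sigma_a^2)

b_i = \alpha_b + u_{b,i}, \quad u_{b,i} \sim N(0, \sigma_b^2)

with population-level intercepts \alpha_a, \alpha_b and per-sample deviations u_{a,i}, u_{b,i}.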

But I am not sure if that is the correct specification. Does the above model imply that each id gets its own estimate of a and b? I would also like to run this model with non-informative priors.

This looks OK to me, but I would be interested in others' feedback. Two suggestions:

(1) Use the sample_prior = "only" argument and your domain expertise to develop appropriate priors. Posterior predictive plots sampled using only the priors should cover the outcome space that is reasonable. I.e., if you were looking at human heights, predicted heights should be non-negative and probably within 0-10 meters.

(2) Try estimating correlations between the a and b parameters for the ids group, using (1 | arbitrary_id | ids) for each group intercept. When you predict using new levels, you'll probably want that correlation preserved, if it exists.

Edit: (3) If you know a and b have to be constrained (i.e., positive, or between 0 and 1) to get the required shape, you can wrap them in exp() or inv_logit() in the first line, so your sampling on the 'linear' parameter formulas can be unconstrained.

Thanks @franzsf for the details. Would you mind posting exactly what you are describing in a code block so that it’s a bit more concrete for me?

Sure thing. Eyeballing your graph, I would guess you want both a and b to be positively constrained. Wrapping both in exp() means you have to think about your priors on the unconstrained scale. I.e., if you want a to be between 0 and 100, something like a normal(2, 1) prior might cover most of the ground you're interested in. The problem is that, as model complexity increases, it's much harder to choose priors for all the interacting parameters. That's where looking at posterior predictions with sample_prior = "only" is valuable.
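
To make that concrete, here is a quick way to check what a prior on the unconstrained scale implies after the exp() wrapping (using the normal(2, 1) example above, which is purely illustrative):

# Central 95% interval (and median) of a normal(2, 1) prior,
# mapped through exp() to the constrained scale of "a"
exp(qnorm(c(0.025, 0.5, 0.975), mean = 2, sd = 1))
#> roughly 1.04, 7.39, 52.46 -- most prior mass between about 1 and 50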

I'd see how this pp_check looks. You might also consider distributions with positive-only support (e.g., lognormal); see the sketch after the code below.

model <- brm(
  bf(values ~ exp(a) * exp(-exp(b) * days),
     a ~ 1 + (1 | id | ids), b ~ 1 + (1 | id | ids),
     nl = TRUE),
  data = dat,
  family = gaussian(),
  sample_prior = 'only',
  prior = c(
    prior(normal(3, 1), class = "b", nlpar = "a"),
    prior(normal(0, 1), class = "b", nlpar = "b"),
    prior(normal(0, 1), class = "sd", nlpar = "a"),
    prior(normal(0, 1), class = "sd", nlpar = "b")
  )
)

pp_check(model, type = "intervals_grouped", group = "ids")
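
On the lognormal idea: one option, assuming values is strictly positive, is to move the decay to the log scale. With family = lognormal(), the formula predicts the mean of log(values), so an additive formula there implies an exponential decay in the median of values. A minimal sketch (same priors as above, purely illustrative):

# Sketch only: with family = lognormal(), mu is the mean of log(values),
# so median(values) = exp(a - exp(b) * days) = exp(a) * exp(-exp(b) * days)
model_ln <- brm(
  bf(values ~ a - exp(b) * days,
     a ~ 1 + (1 | id | ids), b ~ 1 + (1 | id | ids),
     nl = TRUE),
  data = dat,
  family = lognormal(),
  sample_prior = 'only',
  prior = c(
    prior(normal(3, 1), class = "b", nlpar = "a"),
    prior(normal(0, 1), class = "b", nlpar = "b"),
    prior(normal(0, 1), class = "sd", nlpar = "a"),
    prior(normal(0, 1), class = "sd", nlpar = "b")
  )
)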

@franzsf here is a plot of the pp_checks

On a somewhat related note: let's say I am able to fit this model and everything looks good. How can I take draws from the posterior predictive distribution for the regression curve? I.e., how do I generate "future curves"?

So, assuming the model had no errors in sampling, you should probably adjust the priors based on how that pp_check looks.
Once you're all done, you can get the fitted estimates with or without all the varying effects; it depends on whether you're trying to predict the population average, the potential range across any id, or one specific id.

Something like this should get you close. I recommend getting into the brms documentation on fitted() and predict()…

library(data.table)

# Prediction grid: one year of days for a single (new) id.
# Note the column must be named "ids" to match the model's grouping factor.
data_blank <- as.data.table(expand.grid("days" = seq(0, 365, 1),
                                        "ids"  = "xyz"))

# Population-average curve: varying effects excluded (re_formula = NA)
fitted_estimates_no_re <- cbind(data_blank, fitted(model,
                                                   newdata = data_blank,
                                                   re_formula = NA,
                                                   summary = TRUE,
                                                   probs = c(0.05, 0.95)))

# Potential range across any id: varying effects included, with
# new levels drawn from the estimated group-level distribution
fitted_estimates_all_re <- cbind(data_blank, fitted(model,
                                                    newdata = data_blank,
                                                    allow_new_levels = TRUE,
                                                    summary = TRUE,
                                                    probs = c(0.05, 0.95)))

# Curves for the specific ids already in the original data
fitted_estimates_specific_id <- cbind(dat, fitted(model,
                                                  summary = TRUE,
                                                  probs = c(0.05, 0.95)))
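
And for the "future curves" question specifically: setting summary = FALSE returns individual posterior draws of the curve instead of summaries. A minimal sketch, reusing the data_blank grid above:

# Matrix of posterior draws: one row per draw, one column per row of data_blank
curve_draws <- fitted(model,
                      newdata = data_blank,
                      allow_new_levels = TRUE,
                      summary = FALSE)

# Spaghetti plot of the first 50 draws as "future curves"
matplot(data_blank$days, t(curve_draws[1:50, ]),
        type = "l", lty = 1, col = adjustcolor("steelblue", 0.3),
        xlab = "days", ylab = "values")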