Brms & pk models?


I am trying to fit with brms pharmacokinetic (PK) models. These models describe drug concentration profiles over time as a function of the dosing history. So we have measurements from the central blood stream of drug concentrations and we do know when someone took a drug and what amount he swallowed (if administration is orally). The commonly used data-sets for this look like this:

> head(pkdata[c("dv", "time", "cmt", "evid", "amt", "record")], 20)
      dv   time cmt evid amt record
1     NA   0.00   1    1  15      1
2  0.254   0.50   2    0   0      2
3  1.048   3.75   2    0   0      3
4  1.215   7.00   2    0   0      4
5  1.068  10.00   2    0   0      5
6  1.162  12.50   2    0   0      6
7  0.976  15.00   2    0   0      7
8  0.584  24.00   2    0   0      8
9     NA  24.00   1    1  10      9
10 0.625  48.00   2    0   0     10
11    NA  48.00   1    1  10     11
12 0.639  72.00   2    0   0     12
13    NA  72.00   1    1  10     13
14 0.708  96.00   2    0   0     14
15    NA  96.00   1    1  10     15
16 0.632 120.00   2    0   0     16
17    NA 120.00   1    1  10     17
18 0.671 144.00   2    0   0     18
19    NA 144.00   1    1  10     19
20 0.823 144.50   2    0   0     20

The measured concentration is in the dv column. Some of these are set to NA as these rows correspond to dosing rows (as flagged with evid=1). So whenever evid=1, then this means that the dose in amt is being put into the compartment cmt specified (PK systems are modelled using compartments).

I can get brms to fit the above using the non-linear interface just fine, see the attached R code which works with the attached (simulated) data. So I get a fit which is great, but all the post-processing fails on me due to the way brms is internally vectorised. With some debugging I found out that posterior_predict, for example, processes the data set with the following record order:

> pp  <- posterior_predict(fit, newdata=pkdata_1d, nsamples=10)
N = 250
record = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,17,17,17,17,17,17,17,17,17,17,18,18,18,18,18,18,18,18,18,18,19,19,19,19,19,19,19,19,19,19,20,20,20,20,20,20,20,20,20,20,21,21,21,21,21,21,21,21,21,21,22,22,22,22,22,22,22,22,22,22,23,23,23,23,23,23,23,23,23,23,24,24,24,24,24,24,24,24,24,24,25,25,25,25,25,25,25,25,25,25]

So the first data row is being replicated 10 times, the second again 10 times and so on. However, changing order of things will break things big time and the predictions from the model are wrong. Any suggestions how to deal with this? What I would need is to pass into the simulation functions the data set always in the same order. Only then the dosing history is valid.

I wonder if the approach taken (declaring some rows with missing data and providing a non-linear function) can be improved. So would a custom family potentially solve this problem or is the data2 argument an option to explore (pass in dosing info as data2 argument… but then how would I access it in the non-linear function)?

Tagging @paul.buerkner who suggested the missing data approach a while ago to me… maybe more ideas?


oral_1cmt_simple.R (3.2 KB)
oral_1cmt_nm2.csv (20.8 KB)

I am not sure I fully understand all details yet and the behavior of brms to replicate each row ten times looks strange to me. Since you debugged the code already, do you have an idea why that happens? I would then think of a good fix once I understand the details better.

I can only speculate about the internals of brms here: When doing for 10 draws predictions, then brms obviously does copy each data-row of the original data-set 10x. This is done by extending each row in-place, so that you get


This changes the order of the data set. I did expect that I get

1,2, 1,2, 1,2, …

i.e. the data set order does not change. However, I don’t think that brms is doing anything wrong here. The data-format is wide and not long. So what I did is to adapt my data-format to be wide as well. So it it looks now like this:

> head(wide_pkdata %>%, 10)
    time cmt evid amt tau addl mdv    dv id   sdv time_dose_1 time_dose_2
1   0.50   2    0   0   0    0   0 0.254  1 0.249           0          24
2   3.75   2    0   0   0    0   0 1.048  1 1.040           0          24
3   7.00   2    0   0   0    0   0 1.215  1 1.166           0          24
4  10.00   2    0   0   0    0   0 1.068  1 1.102           0          24
5  12.50   2    0   0   0    0   0 1.162  1 1.011           0          24
6  15.00   2    0   0   0    0   0 0.976  1 0.915           0          24
7  24.00   2    0   0   0    0   0 0.584  1 0.615           0          24
8  48.00   2    0   0   0    0   0 0.625  1 0.619           0          24
9  72.00   2    0   0   0    0   0 0.639  1 0.620           0          24
10 96.00   2    0   0   0    0   0 0.708  1 0.620           0          24
   amt_dose_1 amt_dose_2 tau_dose_1 tau_dose_2 addl_dose_1 addl_dose_2
1          15         10          0         24           0           5
2          15         10          0         24           0           5
3          15         10          0         24           0           5
4          15         10          0         24           0           5
5          15         10          0         24           0           5
6          15         10          0         24           0           5
7          15         10          0         24           0           5
8          15         10          0         24           0           5
9          15         10          0         24           0           5
10         15         10          0         24           0           5
   prev_addl_dose_2 prev_addl_dose_1
1                 0                0
2                 0                0
3                 0                0
4                 0                0
5                 0                0
6                 0                0
7                 0                0
8                 1                0
9                 2                0
10                3                0

So now the two dosing events are coded in wide format essentially. Each data-row now specified all the previous doses which have been given. Since I need 5 columns per dose this data format is really wide… but it works! Moreover, I can actually make dosing entries which encode regular dosing schemes (this addl business)… and since we usually consider regular dosing patterns that covers most of what I need.

The final model call then looks like this:

Formula: log(dv) ~ brms_pk_1cmt_simple(lke + exp(mlka), lke, lV, time, time_dose_1, amt_dose_1, prev_addl_dose_1, tau_dose_1, time_dose_2, amt_dose_2, prev_addl_dose_2, tau_dose_2) 

It’s wide, yes… but it works!

The cool side-effect of doing things by row I can easily use parallelisation.

BTW, if I use loop=FALSE and use a non-linear function, then the non-linear function being called inside the partial sum function is being called without any reference to what sub-slice is being processed. It would be nice to make loop=FALSE and reduce_sum somehow work together…

Neat hack! I usually approach brms hacking from a bit different perspective, so just wanted to understand: what is your goal in using brms for this model? Is it to

a) save you work and use the posterior predictions and other post-processing functions in brms? or
b) let you easier make the model more complex by adding predictors to the model parameters? or
c) is it to allow other users to have easier way to use and modify your model?

If it is a) then likely your approach is best. If it is b) or c), it might be worthwhile to consider completely overtaking how brms moves from the linear predictors to observations. This gives you more flexibility and control, but would require you to write a bit more code. (I can elaborate if it is of interest to you).

If you could elaborate then that would be very helpful.

I am right now starting to explore brms as a code-gen utility with all the downstream niceties it offers. So it is all of the above a+b+c. These PK models can be made arbitrarily complex in themselves. So the structure of the model is being varied and the parameter models themselves.

Structurally I always need the same: The model is over patients and I need to go over a data-set which is sorted by patient and by time. That order is important.

I looked at brms and wondered if a custom family is more to the point or the mysterious data2 argument. I don’t know enough about it yet, so some pointers would be very welcome.

(and thanks for congratulating on the hack, but it’s almost trivial after I stepped back from my original long format obsession)

@wds15 I’m interested in any progress you can make with your specifications. I’ve been working on some dose-response modelling with brms and I think the syntax is very well suited to pharmacodynamics, but my impression was that only the simplest PK models would be compatible.

What is the model/ODE used here @wds15? And which do you use @AWoodward?

I think I have understood the problem and should have a fix soon.


@paul.buerkner that’s great to hear. I am happy to test any new things you come up with so that we can iterate quickly.

@Funko_Unko & @AWoodward I am using right now a very simple 1-cmt oral dosing model. It’s right in that as it stands one really should have an analytic solution available due to the requirement by brms to go line-by-line in the data-set. What would help a lot is being able to go subject-by-subject through data-sets, because then I can integrate the time-flow of an ODE across different data-rows. I would hope that @martinmodrak has some more suggestions as to how I can integrate better with brms to get a bit more control over things (like the by-patient iteration rather than by-row iteration).

Ultimately I intend to use brms as the working engine for these models such that I can leave code gen to brms entirely. All the post-processing will then become simple for my analyses where I want to compare different models (model structure, model simplifications, projpred stuff, etc.).

1 Like

Pushed the fix to brms master.

Ahh… so now when I set loop=FALSE, I do get the data-set in it’s original order if I see that right. That’s great! Thanks.

so problem fixed from your side?

Yep, it works on my end as expected. Here is the output from my previous debug run:

> pp  <- posterior_predict(fit, newdata=pkdata_1d, nsamples=10)
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
N = 25
record = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]

Which confirms that I get for 10 draws of posterior_predict 10 calls of the custom function. The data-set stays in order as I need it to.

…uhm…can we add a unit test for this? This would reassure that this property stay like it should. If you want I write one - anything I need to know?

great! If you want, I would be very happy if you could add a unit test for this. The test should go in tests/local/tests.models_new.R because it involves actually compiling a fitting model so should not be run as part of the regular tests.

Ok. I wrote one which hopefully is fine for you to include.

So there are few things to really break the shackles brms imposes :-) Not sure it is better for your use case (I don’t think I understand your aims fully), but hope it is an inspiration.

The first trick is to “turn off” any default brms processing by specifying an empty family, i.e. a custom distribution that always return 0.

The second trick is to use a stanvar for the likelihood block to supply our own processing.

A simple example combining the two:

test_df <- data.frame(res = rlnorm(100), patient = sample(1:10, 100, replace = TRUE), x = rnorm(100))

empty_single_param <- function() {
    "empty_single_param", dpars = c("mu"),
    links = c("identity"), 
    lb = 0, # We can put constraints that we expect for the data, but that's optional
    type = "real")

sv_empty_single_param <- stanvar(scode = "
      real empty_single_param_lpdf(real y, real mu) {
        return 0;
    ", block = "functions")

sv_likelihood <- stanvar(scode = "// Arbitrary computation here", block = "likelihood", position = "end")

make_stancode(res ~ x + (1 | patient), family = empty_single_param, stanvars = sv_empty_single_param + sv_likelihood,
              data = test_df)

gives us:

// generated with brms 2.15.1
functions {
      real empty_single_param_lpdf(real y, real mu) {
        return 0;
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    for (n in 1:N) {
      target += empty_single_param_lpdf(Y[n] | mu[n]);
    // Arbitrary computation here
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 0.8, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(z_1[1]);
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);

The important thing to note that at the point our “arbitrary computation” in the likelihood block is executed, the mu vector is already computed and in scope, so we can supply any code we want that goes from mu to incrementing the target, including any looping behaviour etc.

All of the rest is just working on top of this template to make brms prepare you the linear predictors you need in a format you can use for your custom code.

In your case you probably won’t be happy with just predicting a single parameter. There are many ways to have brms prepare you more predictors. One is to use a multi-parameter custom (and empty) family, with the limitation that one of the parameters must be called “mu” e.g.:

empty_multi_param <- function() {
    "empty_multi_param", dpars = c("mu", "lke", "lV"), 
    links = c("identity", "identity", "identity"), 
    lb = 0, # We can put constraints that we expect for the data, but that's optional
    type = "real")

sv_empty_multi_param <- stanvar(scode = "
      real empty_multi_param_lpdf(real y, real mu, real lke, real lV) {
        return 0;
    ", block = "functions")

make_stancode(bf(res ~ x + (1 | patient), lke ~ (1 | patient), lV ~ x), family = empty_multi_param, stanvars = sv_empty_multi_param + sv_likelihood,
              data = test_df)

Now we have three vectors (mu, lke, and lV) ready and computed at the point our custom likelihood is exectued. You can also use the vint or vreal addition terms to put additional data here.

Alternatively, you could use multivariate formula with fake data:

test_df_mv <- test_df 
test_df_mv$lke <- 0 # The number is irrelevant and will be ignored
test_df_mv$lV <- 0 # The number is irrelevant and will be ignored

make_stancode(bf(res ~ x + (1 | patient)) + bf(lke ~ (1 | patient)) + bf(lV ~ x) + set_rescor(FALSE), family = empty_single_param, stanvars = sv_empty_single_param + sv_likelihood,
              data = test_df_mv)

where we once again have three vectors prepared, but they are called mu_res, mu_lke and mu_lV.

Finally, one can use stanvar to also add additional data you would need. And note that at this point, we can even completely break the core brms abstraction that a single row in the data frame corresponds to a single observation. We can for example make each row in a dataframe correspond to a patient, add a dummy (ignored) response and the pass a n_patients x n_time matrix as additional data, e.g.:

patient_df <- data.frame(patient = 1:10, x = rnorm(10), .dummy = 0)

n_time <- 5
obs_matrix <- matrix(rnorm(n_time * nrow(patient_df)), nrow = nrow(patient_df), ncol = n_time)

sv_n_time <- stanvar(x = n_time, name = "n_time", scode = "int n_time;")
sv_obs_matrix <- stanvar(x = obs_matrix, name = "obs_matrix", scode = "matrix[N, n_time] obs_matrix;")

sv_likelihood_2 <- stanvar(scode = "
  for(n in 1:N) {
     target += normal_lpdf(obs_matrix[n,] | solve_my_ode(mu[n], lke[n], lV[n]), 1);
                           ", block = "likelihood", position = "end")

brm(bf(.dummy ~ x + (1 | patient), lke ~ (1 | patient), lV ~ x), family = empty_multi_param, stanvars = sv_empty_multi_param + sv_n_time + sv_obs_matrix + sv_likelihood_2,
              data = patient_df)

In all of the above approaches, posterior_linpred will work just fine to get your predictions of the parameters, but you’ll need to write your own code for posterior_epred or posterior_predict.

Does that make sense?


Sweet! That looks like a decent hack which combines the cool features of brms (prior spec, hierarchical spec) with all the necessary freedom I would need. Some Q’s and a comment:

  • If I were to use additional data via stanvar - how can I access that when I craft a custom posterior_epred& posterior_predict?
  • What needs to be done to get posterior_predict working?
  • Have you ever used the data2 argument? How does data2 compare against custom stanvar data?

More a comment:

  • The empty family trick is sweet… for performance reasons you should probably say loop=FALSE. This will avoid the loop which creates one-by-one a var variable in Stan. If you instead avoid the loop, then you can just return a rep_vector(0.0, rows(mu)) statement which is a single var and no loop.

In the past I used to insert at the beginning of the model block a statement like

if(!prior_only) {
   // custom code
} else // note the else branch opening on purpose

doing so will take over the prior_only block under my control. The downside is that I don’t get the nice processing of the parameters as you do, so that’s better.

(I am happy to see that also others do similar tricks on brms…it’s a really cool base working thing)

1 Like

You should be able to call standata(fit) which will include your custom data. But for anything more involved, I think a cleaner solution is to wrap the brmsfit in a custom object that carries all the necessary data (which may be in different form than what you pass to Stan) and define methods on this object.

That’s IMHO hugely task specific. If your processing is simple, you may be able to just implement posterior_predict_my_custom_family and use the pre-prepared structures from brms (as in the custom family vignette or at Using brms to model reaction times contaminated with errors). But I’ve mostly just made a new function that internally calls posterior_linpred to get the linear predictors and then works from there.

No, I don’t have experience with that. My impression is that brms actually manages data2 and interferes with how it is used, but I’ve never really dug into this.

That’s a good point. Shouldn’t I be able to just return a scalar 0 even from the vectorized version?

Yeah, it’s super powerful. But it also shows how big of a footgun brms can be for novice users and how much stuff is hidden in the internals… :-)

Yes, of course. Just return a 0.0 and not a vector (I realised that later). And yes, this should give you some more efficiency for large data-sets.

Indeed and I am often scared by the super-automatism and have to go through basic tests before I trust these numbers. Doing so I discovered very unexpected schemes when simulating new patients… which @paul.buerkner fixed right away (like always).

Say I have multiple distributional parameters. Will my posterior_linpred always be for what is deemed to be “mu”? Or can I even tell the function to give me the posterior for other distributional parameters?

I also saw that the posterior* functions can work wrt to some environment. Maybe this is also something to explore.

In any case, I am right now quite happy with my solution to have a wide data format. With the most recent fix for non-looped non-linear functions I can now use brms to attack (simple) PK problems and make even full use of within-chain parallelisation… and I always love using reduce_sum as it speeds up things a lot in this case.

1 Like

Yes, there’s the dpar argument for exactly that :-)

Yes, whatever works, works :-D