Hi!
I am trying to fit pharmacokinetic (PK) models with brms. These models describe drug concentration profiles over time as a function of the dosing history. So we have measurements of drug concentrations from the central blood stream, and we know when someone took a drug and what amount was swallowed (if administered orally). The data sets commonly used for this look like this:
> head(pkdata[c("dv", "time", "cmt", "evid", "amt", "record")], 20)
      dv   time cmt evid amt record
1     NA   0.00   1    1  15      1
2  0.254   0.50   2    0   0      2
3  1.048   3.75   2    0   0      3
4  1.215   7.00   2    0   0      4
5  1.068  10.00   2    0   0      5
6  1.162  12.50   2    0   0      6
7  0.976  15.00   2    0   0      7
8  0.584  24.00   2    0   0      8
9     NA  24.00   1    1  10      9
10 0.625  48.00   2    0   0     10
11    NA  48.00   1    1  10     11
12 0.639  72.00   2    0   0     12
13    NA  72.00   1    1  10     13
14 0.708  96.00   2    0   0     14
15    NA  96.00   1    1  10     15
16 0.632 120.00   2    0   0     16
17    NA 120.00   1    1  10     17
18 0.671 144.00   2    0   0     18
19    NA 144.00   1    1  10     19
20 0.823 144.50   2    0   0     20
The measured concentration is in the dv column. Some of these values are set to NA because those rows are dosing rows (flagged with evid=1). So whenever evid=1, the dose given in amt is put into the compartment specified by cmt (PK systems are modelled using compartments).
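To make the mechanics concrete, here is a minimal sketch (this is not the attached oral_1cmt_simple.R) of what the non-linear function has to do for a one-compartment model with first-order oral absorption: sum the contribution of every earlier dose by superposition. The function name and the parameters lka, lCL, lV are hypothetical.

## Minimal sketch only: one-compartment model with first-order absorption,
## where the concentration at a given time is the superposition of all
## earlier doses. Helper name and parameters are hypothetical.
pk_1cmt_oral <- function(time, dose_time, dose_amt, lka, lCL, lV) {
  ka <- exp(lka)   # absorption rate constant
  CL <- exp(lCL)   # clearance
  V  <- exp(lV)    # central volume
  ke <- CL / V     # elimination rate constant
  conc <- 0
  for (i in seq_along(dose_time)) {
    dt <- time - dose_time[i]
    if (dt > 0) {
      conc <- conc + dose_amt[i] * ka / (V * (ka - ke)) *
        (exp(-ke * dt) - exp(-ka * dt))
    }
  }
  conc
}

The key point is that the concentration at any record depends on all dosing rows before it, which is why the row order of the data set matters so much below.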
I can get brms to fit the above using the non-linear interface just fine; see the attached R code, which works with the attached (simulated) data. So I get a fit, which is great, but all the post-processing fails on me due to the way brms is internally vectorised. With some debugging I found out that posterior_predict, for example, processes the data set with the following record order:
> pp <- posterior_predict(fit, newdata=pkdata_1d, nsamples=10)
N = 250
record = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,17,17,17,17,17,17,17,17,17,17,18,18,18,18,18,18,18,18,18,18,19,19,19,19,19,19,19,19,19,19,20,20,20,20,20,20,20,20,20,20,21,21,21,21,21,21,21,21,21,21,22,22,22,22,22,22,22,22,22,22,23,23,23,23,23,23,23,23,23,23,24,24,24,24,24,24,24,24,24,24,25,25,25,25,25,25,25,25,25,25]
So the first data row is replicated 10 times, then the second 10 times, and so on. However, changing the order of the rows breaks things badly and the predictions from the model are wrong. Any suggestions on how to deal with this? What I would need is for the data set to always be passed to the simulation functions in its original order; only then is the dosing history valid.
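Just to spell out the mismatch with a toy check (purely diagnostic, using the numbers from the debug output above: 25 records and nsamples = 10):

## The order seen during post-processing replicates each record nsamples
## times in a row, whereas the dosing logic needs complete, ordered copies
## of the data set per draw. Diagnostic sketch only.
nsamples <- 10
n_rec    <- 25
order_seen   <- rep(seq_len(n_rec), each  = nsamples)  # 1,1,...,1,2,2,...
order_needed <- rep(seq_len(n_rec), times = nsamples)  # 1,2,...,25,1,2,...
identical(order_seen, order_needed)  # FALSE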
I wonder if the approach taken (declaring some rows as missing data and providing a non-linear function) can be improved. Would a custom family potentially solve this problem, or is the data2 argument an option to explore (pass the dosing info via data2… but then how would I access it in the non-linear function)?
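To make the data2 idea concrete, this is roughly how the dosing rows could at least be packaged; whether the non-linear function can actually see these objects is exactly my open question, so take pk_formula and the names dose_time/dose_amt/dose_cmt as hypothetical:

library(brms)

## Sketch of the data2 idea only: split off the dosing rows (evid == 1) and
## hand them to brm() as plain vectors via data2. How to access them inside
## the non-linear formula is the unresolved part.
dosing <- subset(pkdata, evid == 1)

fit <- brm(
  pk_formula,                          # the non-linear bf() from the attached script
  data  = subset(pkdata, evid == 0),   # only the observation rows
  data2 = list(dose_time = dosing$time,
               dose_amt  = dosing$amt,
               dose_cmt  = dosing$cmt)
)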
Tagging @paul.buerkner, who suggested the missing data approach to me a while ago… maybe you have more ideas?
Best,
Sebastian
oral_1cmt_simple.R (3.2 KB)
oral_1cmt_nm2.csv (20.8 KB)