# Bounding posterior output of pharmacokinetic model

Hello,

I am fitting a non-linear “one compartment extravenous pharmacokinetic model” to drug data. It’s a time series – basically following a certain drug Dose, D, the body incorporates the drug over Time, T, given a certain absorption rate, K_a, but simultaneously eliminates the drug following an elimination rate, K_e, and a clearance, C_l:

\frac{D K_e K_a (e^{-K_e T} - e^{-K_a T})}{C_l (K_a - K_e)}

The model seems to be doing a good job using brms (please see fully reproducible example below), but there are two conditions that I would like to implement to improve the likelihood estimates:

1 – Reject values of mu that are negative, simply because the response variable is drug concentration in the body, hence it cannot be negative. I tried fitting the model using a Gamma distribution, but it returns unrealistic values and I do not know why. Is there a way to fit it using the current Gaussian distribution but rejecting parameter estimates when mu < 0?

2 – As far as I understand this equation, the actual model fit can be achieved by multiple combinations of K_a, K_e, and C_l, making this a complicated parameter space to get right answers. I have successfully implemented an initial constraint on the prior of K_a (see below), but I would also like to add a second condition here that depends on a secondary parameter that is calculated based on the estimated parameters at each iteration. Basically, one can derive the “time to the peak” (T_p) and the “peak” (P):

T_p = \frac{ln(K_a) - ln(K_e)}{K_a - K_e} # in time units
P = \frac{D K_e K_a (e^{-K_e T_p} - e^{-K_a T_p})}{C_l (K_a - K_e)} # in concentration units

Given these two equations, one can calculate how efficient (E) an organism is at absorbing the drug up to the peak:

E = \frac{P}{K_a T_p}

Efficiencies by definition are bounded between 0 and 1, so I would like to add the above calculations to the brms code to generate some type of rejection statement, e.g.:

if (E < 0 || E > 1)
reject("E is bounded between 0 and 1; found E=", E);


I would really appreciate your assistance implementing conditions 1 and 2. Here’s the fully reproducible example, but first here’s my sessionInfo():

sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
 en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
 stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 brms_2.8.0 Rcpp_1.0.1

loaded via a namespace (and not attached):
 mvtnorm_1.0-10       lattice_0.20-38      prettyunits_1.0.2    ps_1.3.0             zoo_1.8-5            gtools_3.8.1         assertthat_0.2.1
 digest_0.6.18        mime_0.6             R6_2.4.0             plyr_1.8.4           backports_1.1.3      ggridges_0.5.1       stats4_3.5.3
 coda_0.19-2          ggplot2_3.1.0        colourpicker_1.0     pillar_1.3.1         rlang_0.3.3          lazyeval_0.2.2       miniUI_0.1.1.1
 callr_3.2.0          Matrix_1.2-15        DT_0.5               shinythemes_1.1.2    shinyjs_1.0          stringr_1.4.0        htmlwidgets_1.3
 loo_2.1.0            igraph_1.2.4         munsell_0.5.0        shiny_1.2.0          compiler_3.5.3       httpuv_1.5.0         rstan_2.18.2
 pkgconfig_2.0.2      base64enc_0.1-3      pkgbuild_1.0.3       rstantools_1.5.1     htmltools_0.3.6      tidyselect_0.2.5     tibble_2.1.1
 gridExtra_2.3        threejs_0.3.1        matrixStats_0.54.0   crayon_1.3.4         dplyr_0.8.0.1        later_0.8.0          grid_3.5.3
 nlme_3.1-137         xtable_1.8-3         gtable_0.3.0         magrittr_1.5         StanHeaders_2.18.1   scales_1.0.0         cli_1.1.0
 stringi_1.4.3        reshape2_1.4.3       promises_1.0.1       dygraphs_1.1.1.6     xts_0.11-2           tools_3.5.3          glue_1.3.1
 shinystan_2.5.0      markdown_0.9         purrr_0.3.2          crosstalk_1.0.0      processx_3.3.0       rsconnect_0.8.13     abind_1.4-5
 parallel_3.5.3       inline_0.3.15        colorspace_1.4-1     bridgesampling_0.6-0 bayesplot_1.6.0      Brobdingnag_1.2-6

library(brms)
rstan::rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

calculateMinimumlKa  <-  function (data, minimumScaling = 1) {
dt  <-  data$Time_days[which.max(data$Drug_concentration_mg.ml)] - data$Time_days dy <- max(data$Drug_concentration_mg.ml) - data$Drug_concentration_mg.ml log(ifelse(dy == 0, data$Drug_concentration_mg.ml, dy) / ifelse(dt == 0, data$Time_days, dt) * minimumScaling) } data <- read.csv(text = 'Drug_concentration_mg.ml,Dose_mg,Time_days 0.0006773291,13.57695,0.01 0.4894254798,13.57695,0.88 1.0282572998,13.57695,1.24 1.6008287008,13.57695,1.77 1.5079260525,13.57695,2.10 1.4089264576,13.57695,2.85 1.1498402861,13.57695,3.93 0.8579193906,13.57695,4.95 0.5450494898,13.57695,7.05 0.2563178348,13.57695,11.97 0.1634733689,13.57695,15.98 0.0706932426,13.57695,23.74 0.0774802343,13.57695,28.96 0.0364683985,13.57695,53.01', header = TRUE, stringsAsFactors = FALSE) minLKa <- calculateMinimumlKa(data) maxLKa <- log(exp(minLKa) * 1e3) kineticsPriors <- c(prior(normal(0, 2), nlpar = 'lKe'), eval(parse(text = paste0('prior(normal(', minLKa * 1.5, ', 5), nlpar = \'lKa\', lb = ', minLKa, ', ub = ', maxLKa,')'))), prior(normal(10, 5), nlpar = 'lCl')) kineticsInits <- list(list(lKe = -1, lKa = -4, lCl = 2), list(lKe = 0, lKa = -1, lCl = 15), list(lKe = -5, lKa = 6, lCl = 10)) # define one compartment extravenous pharmacokinetic model # notice that parameters are fitted on the log scale to force them to be positive kineticsEquation <- brms::bf(Drug_concentration_mg.ml ~ Dose_mg * exp(lKe + lKa - lCl) * (exp(-exp(lKe) * Time_days) - exp(-exp(lKa) * Time_days)) / (exp(lKa) - exp(lKe)), lKa ~ 1, lKe ~ 1, lCl ~ 1, nl = TRUE) set.seed(1) model <- brms::brm(kineticsEquation, data = data, family = gaussian(), prior = kineticsPriors, sample_prior = TRUE, chains = 3, cores = 3, iter = 1.5e4, warmup = 7.5e3, inits = kineticsInits, control = list(adapt_delta = 0.9)) # add time to peak, and peak, then plot the data pars <- as.data.frame(t(brms::fixef(model))['Estimate', , drop = FALSE]) names(pars) <- gsub('_Intercept', '', names(pars)) T_p <- (pars$lKa - pars$lKe) / (exp(pars$lKa) - exp(pars$lKe)) # time to peak in days P <- data$Dose_mg * exp(pars$lKe + pars$lKa - pars$lCl) * (exp(-exp(pars$lKe) * T_p) - exp(-exp(pars$lKa) * T_p)) / (exp(pars$lKa) - exp(pars$lKe)) # peak in mg/ml xseq <- seq(min(data$Time_days), max(data$Time_days), length.out = 200) modelPrediction <- predict(model, newdata = data.frame(Time_days = xseq, Dose_mg = unique(data$Dose_mg)), re_formula = NA, summary = TRUE, probs = c(0.025, 0.975))
transparentCol   <-  rgb(255, 99, 71, max = 255, alpha = 125)
plot(data$Time_days, data$Drug_concentration_mg.ml, ylim = range(modelPrediction), xlab = 'Time (days)', ylab = 'Drug concentration (mg / ml)', las = 1, pch = 16, col = 'tomato')
polygon(c(xseq, rev(xseq), xseq), c(modelPrediction[, 'Q2.5'], rev(modelPrediction[, 'Q97.5']), modelPrediction[1, 'Q2.5']), border = NA, col = transparentCol) # notice negative values on posterior predictions
lines(xseq, modelPrediction[, 'Estimate'], lty = 2, lwd = 1.5, col = 'tomato')
usr  <-  par('usr')
polygon(c(usr[1:2], rev(usr[1:2]), usr), c(rep(usr, 2), rep(0, 2), usr), col = rgb(77, 77, 77, max = 255, alpha = 50))
lines(usr[1:2], c(0, 0), lty = 2, col = 'grey30')
text(mean(usr[1:2]), -0.1, 'Unrealistic posterior values', adj = c(0.5, 0.5), cex = 2)


postSamps  <-  brms::posterior_samples(model)
lKa        <-  postSamps[, 'b_lKa_Intercept']
lKe        <-  postSamps[, 'b_lKe_Intercept']
lCl        <-  postSamps[, 'b_lCl_Intercept']
postT_p    <-  (lKa - lKe) / (exp(lKa) - exp(lKe))
postP      <-  data$Dose_mg * exp(lKe + lKa - lCl) * (exp(-exp(lKe) * postT_p) - exp(-exp(lKa) * postT_p)) / (exp(lKa) - exp(lKe)) postE <- postP / (exp(lKa) * postT_p) hist(postE, xlab = 'Efficiency of drug absorption', xaxs = 'i', yaxs = 'i', las = 1, main = '', xpd = NA) usr <- par('usr') polygon(c(c(1, usr), rev(c(1, usr)), 1), c(rep(0, 2), rep(usr, 2), 0), col = rgb(77, 77, 77, max = 255, alpha = 50)) lines(c(1, usr), c(0, 0), lty = 2, col = 'grey30') text(mean(c(1, usr)), mean(usr[3:4]), 'Unrealistic efficiency values', adj = c(0.5, 0.5), cex = 2, srt = 90)  1 Like Hi, this is not really my expertise but since nobody else answered, I will give it a try. Overall I think you will need to stretch brms to its limits (if not beyond) for your task, while you are not getting much of brms's benefits, as you are not really using any linear model features. Consider if you do not want to implement the model directly in Stan - this would get you much more control over what is happening at the cost of increased learning curve. Almost all of the steps I describe below translate directly also to a possible pure Stan implementation. With that aside, here are some steps that should help you achieve what you need in brms: Note that you can set lower and upper bounds on parameters as a part of the prior specification (see help for set_prior) If I understand what you are doing correctly, the problem isn’t that mu itself is negative (in the plot it stays non-negative), but that mu + noise is negative (correct me, if I am wrong). The most common way to handle this is to model the logarithm of the value and then use lognormal or gamma response. Note that by default, if you set family = "gamma" the link defaults to link = log (there is caveat for family = Gamma(), which is NOT equivalent, see brmsfamily help), so if you just switched from guassian to gamma, this could explain why you had weird results. In any case, modelling the logarithm is probably a good idea. It could also improve numerical stability as you’ll get rid of most of the exp calls (check out log_diff_exp in Stan manual). While the reparametrization you propose seems sensible (on a surface level, I did not invest in a deeper understanding), you generally don’t want to add reject statements into Stan to constrain feasible configurations (it can break the sampler). Instead, you should declare your parameters in such a way, that every allowed combination of the parameters is feasible . For example, you could treat E, K_a and either P or T_p as the parameters you are fitting, set appropriate upper and lower bounds for them and compute K_e and C_l from them. Note that it may actually be easier to elicit good domain knowledge for P or T_p than for K_e and C_l, letting you set better informed priors (I can imagine a clinician has a rough idea about time to peak). A uniform prior over [0,1] on E could be just fine, but some expert knowledge may be available as well here. Alternatively P_{rel} = \frac{P}{D} might also be a good parameter if D varies a lot, and would let you have sensible priors once again - e.g. if peak cannot possibly be larger than dose, you could put an upper bound of 1 and put a P_{rel} \sim Beta(1,2) prior to state that you believe, peak is likely much lower than dose (not an expert, just guessing). Hope that helps. 1 Like Hi martinmodrak, Thanks very much for your answer. I have been able to implement the model on the log scale using a gamma distribution (although it took some fiddling with the priors and starting values) – this took care of the mu + noise < 0 problem. But it now gives a slightly worse estimate for mu, and increases considerably the posterior combinations of parameters that yield E > 1, although the mean value of E is sound (~11%; please see code and figures at the end). However, I have been trying to understand how to implement the efficiency suggestion you made. You were spot on about the expectations about P and E – it is straight forward to have decent expectations what those values should be, which would improve the prior specification. However, what I did not understand was how to “treat” these variables as “parameters”. They are a punctual by-product derived from the entire time series, so I am a bit confused as to what the input response variable on the code (currently Drug_concentration_mg.ml) would be in this case. Shouldn’t I need the entire time series to estimate not only K_a, but also K_e and C_l? I don’t follow how I could, for example, back calculate K_e and C_l from P and E – apologies if I’m being totally thick here. Thanks again for your help! Here’s the code reworked using the gamma distribution: library(brms) rstan::rstan_options(auto_write = TRUE) options(mc.cores = parallel::detectCores()) calculateMinimumKa <- function (data, minimumScaling = 1) { dt <- data$Time_days[which.max(data$Drug_concentration_mg.ml)] - data$Time_days
dy  <-  max(data$Drug_concentration_mg.ml) - data$Drug_concentration_mg.ml
ifelse(dy == 0, data$Drug_concentration_mg.ml, dy) / ifelse(dt == 0, data$Time_days, dt) * minimumScaling
}

'Drug_concentration_mg.ml,Dose_mg,Time_days
0.0006773291,13.57695,0.01
0.4894254798,13.57695,0.88
1.0282572998,13.57695,1.24
1.6008287008,13.57695,1.77
1.5079260525,13.57695,2.10
1.4089264576,13.57695,2.85
1.1498402861,13.57695,3.93
0.8579193906,13.57695,4.95
0.5450494898,13.57695,7.05
0.2563178348,13.57695,11.97
0.1634733689,13.57695,15.98
0.0706932426,13.57695,23.74
0.0774802343,13.57695,28.96
0.0364683985,13.57695,53.01',
header = TRUE, stringsAsFactors = FALSE)

data$lnDose_mg <- log(data$Dose_mg)

minKa  <-  calculateMinimumKa(data)
maxKa  <-  minKa * 3

kineticsPriors    <-  c(prior(normal(1, 2), nlpar = 'Ke', lb = 0),
eval(parse(text = paste0('prior(normal(', minKa * 1.5, ', 5), nlpar = \'Ka\', lb = ', minKa, ', ub = ', maxKa,')'))),
prior(normal(1, 2), nlpar = 'Cl', lb = 0))

kineticsInits    <-  list(list(Ke = 1.0, Ka = 0.02, Cl = 0.1),
list(Ke = 1.5, Ka = 0.4, Cl = 0.5),
list(Ke = 0.5, Ka = 5,  Cl = 1.5))

# define one compartment extravenous pharmacokinetic model
# notice that parameters are fitted on the log scale to force them to be positive
kineticsEquation  <-  brms::bf(Drug_concentration_mg.ml ~ lnDose_mg + log(Ke) + log(Ka) - log(Cl) + log(exp(-Ke * Time_days) - exp(-Ka * Time_days)) - log(Ka - Ke), Ka ~ 1, Ke ~ 1, Cl ~ 1, nl = TRUE)

set.seed(1)
brmData   <-  brms::make_standata(kineticsEquation, data = data, family = "gamma", prior = kineticsPriors, sample_prior = TRUE)
model  <-  brms::brm(kineticsEquation, data = data, family = "gamma", prior = kineticsPriors, sample_prior = TRUE, chains = 3, cores = 3, iter = 1.5e4, warmup = 7.5e3, inits = kineticsInits, control = list(adapt_delta = 0.9))

# add time to peak, and peak, then plot the data
pars         <-  as.data.frame(t(brms::fixef(model))['Estimate', , drop = FALSE])
names(pars)  <-  gsub('_Intercept', '', names(pars))

T_p  <-  (log(pars$Ka) - log(pars$Ke)) / (pars$Ka - pars$Ke) # time to peak in days
P    <-  data$Dose_mg * pars$Ke * pars$Ka / pars$Cl * (exp(-pars$Ke * T_p) - exp(-pars$Ka * T_p)) / (pars$Ka - pars$Ke) # peak in mg/ml
E    <-  P / (pars$Ka * T_p) xseq <- seq(min(data$Time_days), max(data$Time_days), length.out = 200) modelPrediction <- predict(model, newdata = data.frame(Time_days = xseq, lnDose_mg = unique(data$lnDose_mg)), re_formula = NA, summary = TRUE, probs = c(0.025, 0.975))[, c('Estimate', 'Q2.5', 'Q97.5')]
transparentCol   <-  rgb(255, 99, 71, max = 255, alpha = 125)
plot(data$Time_days, data$Drug_concentration_mg.ml, ylim = range(modelPrediction), xlab = 'Time (days)', ylab = 'Drug concentration (mg / ml)', las = 1, pch = 16, col = 'tomato')
polygon(c(xseq, rev(xseq), xseq), c(modelPrediction[, 'Q2.5'], rev(modelPrediction[, 'Q97.5']), modelPrediction[1, 'Q2.5']), border = NA, col = transparentCol) # notice negative values on posterior predictions
lines(xseq, modelPrediction[, 'Estimate'], lty = 2, lwd = 1.5, col = 'tomato')


postSamps  <-  brms::posterior_samples(model)
Ka         <-  postSamps[, 'b_Ka_Intercept']
Ke         <-  postSamps[, 'b_Ke_Intercept']
Cl         <-  postSamps[, 'b_Cl_Intercept']
postT_p    <-  (log(Ka) - log(Ke)) / (Ka - Ke)
postP      <-  (data$Dose_mg * Ke * Ka * (exp(-Ke * postT_p) - exp(-Ka * postT_p))) / (Cl * (Ka - Ke)) postE <- postP / (Ka * postT_p) hist(postE, xlab = 'Efficiency of drug absorption', xaxs = 'i', yaxs = 'i', las = 1, main = '', xpd = NA) usr <- par('usr') polygon(c(c(1, usr), rev(c(1, usr)), 1), c(rep(0, 2), rep(usr, 2), 0), col = rgb(77, 77, 77, max = 255, alpha = 50)) lines(c(1, usr), c(0, 0), lty = 2, col = 'grey30') text(mean(c(1, usr)), mean(usr[3:4]), 'Unrealistic efficiency values', adj = c(0.5, 0.5), cex = 2, srt = 90)  I see. I think this is possibly a place where brms obscures some of the stuff that is going on :-) Let’s say we want to parametrize in terms of E, K_a and P. Looking at the formulas you gave for T_p and P, you should be able to solve for K_e and C_l given E, P, K_a and D. I don’t want to do the math, but symbolically: K_e = f_1(E, K_a, P, D) C_l = f_2(E, K_a, P, D) I am not sure f_1 or f_2 have closed form, but I think it is likely they have it. Substituting those into your original formula for \mu you should then proceed to get \mu = f_3(E, K_a, P, D, t) the code would then look something like (not tested, just a sketch): kineticsPriors <- c( prior(beta(2,1), nlpar = 'E', lb = 0, ub = 1), prior(normal(1, 2), nlpar = 'P', lb = 0), eval(parse(text = paste0('prior(normal(', minKa * 1.5, ', 5), nlpar = \'Ka\', lb = ', minKa, ', ub = ', maxKa,')')))) #Expand f_3 to the actual formula you derive kineticsEquation <- brms::bf(Drug_concentration_mg.ml ~ log_f_3(E, Ka, P, Dose, Time_days), Ka ~ 1, E ~ 1, D ~ 1, nl = TRUE)  Does that make sense? Also note that log(exp(-Ke * Time_days) - exp(-Ka * Time_days)) can be replaced with log_diff_exp(-Ke * Time_days, -Ka * Time_days) for increased numerical stability (see 3.14 Composed Functions | Stan Functions Reference for more details). And one more thing: it is generally discouraged to use parameter boundaries the way you use them for K_a. If it represents absorption rate, it makes sense to have a hard bound of K_a > 0, as negative absorption makes no sense, but there is no physical or mathematical reason the absorption couldn’t be arbitrarily large or small. You have some idea how small or large it can get, but this is a soft bound. Your beliefs about soft bounds should instead be encoded in the prior distribution, ensuring the prior doesn’t put much mass on values outside the soft bounds. Here a log-normal prior might be sensible, as it easily incorporates prior knowledge about the order of magnitude (e.g. log(K_a) \sim N(1, 1.5) means that K_a most likely falls within e^{1 - 2 \times 1.5} \simeq 0.14 and e^{1+2 \times 1.5} \simeq 54.6) 1 Like And since I obviously don’t want to do what I am suppposed to do and instead procrastrinate on the forums, here’s one more idea that might be easier than finding f_1 and f_2: if E>1 is unphysical, there must be some necessary and sufficient constraints on K_e,K_a and C_l that ensure E \leq 1. You can try to find them on your own, but they should be generally known and mentioned along with the formulation of the original differential equation (I guess K_a > K_e might be one of those, but that’s just a wild guess). Incorporating those constraints into your model would then ensure E \leq 1. How to best incorporate the constraints depends on their exact nature and might require additional reparametrisation, but it likely will be a simple matter. Thanks again martinmodrak! Solving for K_e as a function of K_a, E and P is turning out to be very challenging unfortunately, going beyond my math skills. In this link I substituted K_e for x, K_a for y, and E for z (it’s not a random equation). I also spent some time trying to figure out the natural constraints on the relationship between original parameters that need to be met such that E < 1, but again this has been extremely challenging. I’ll keep working on it, and I’ll post an update in case I make some progress. Sorry…too much text for me for the moment… but from experience, make sure to include an absolute and a relative error term when fitting real data. In virtually all cases this what you need to avoid underestimating the maximal concentration (there are many more reasons to screw that up, but this one which is easy to fix). 2 Likes Is there any more background available online for the model? Googling one compartment extravenous gives exactly three results, one is this thread, the other two show a slightly different main solution (\mu = \frac{D F K_a (e^{-K_e T} - e^{-K_a T})}{C_l (K_a - K_e)} where F just scales D to the bioavailable fraction). Neither shows the formula for T_p or E. I did not find any of this online, it was just a rationale I came up with. I am no mathematician though, but I think (hope) that the reasoning below is correct?! T_p can be calculated by setting \frac{d\mu}{dT} = 0. Because K_a (mass / volume / time), K_e (mass / volume / time), and C_l (volume / volume / time) are all constants, and K_a = absorption rate, one can ask the question of how efficient an organism is at absorbing a drug up to the peak: P = \mu(T_p), thus E = \frac{P}{K_a T_p}, with both P and the product K_a T_p having the same concentration units. Please let me know if this makes sense! The model I am using are also called “one compartment first order absorption model”. I think this manual is a good reference for these pharmacokinetics models. The model I am using, for instance, is equivalent to equation 1.7 (page 10), noting that t_D = 0, and that V = C_l / K_e (bottom pf page 6). Given that K_a > K_e, the dynamics after the peak is dictated by K_e only. Hence it makes sense to estimate E between T_D (i.e. T_0) and T_p. But again please let me know if I’m getting things mixed-up here. A small thing, the reference you’ve provided, using your notation has: \mu(T) = \frac{D K_a (e^{-K_e T} - e^{-K_a T})}{C_l (K_a - K_e)} (in contrast to what you’ve given, K_e is missing from the numerator), but your formula for T_p matches the correct version (your formula for P does not). Wolfram alpha checks for the derivative and zero of the derivative. Anyway, I think the main problem is that E in the form you’ve given can easily be larger than 1. Note that the absolute magnitude of \mu and P depends on D/C_l. But T_p does not depend on either D or C_l so I can make E arbitrarily large by increasing D/C_l while keeping the other parameters intact. I am starting to see why solving from P and E to other parameters is hard :-). Even getting K_e from K_a and T_p requires the Lambert W (product log) function which is currently not in Stan. But using wolfram alpha to simplify the formula for P (in two steps: first and second - you better check me :-) ) gives us: P = \frac{D}{C_l} \frac{K_e}{K_a}^{K_e/(K_a - K_e)} This also requires Lambert W to solve for P, but it tells us that using \frac{K_e}{K_a} as a base parameter might get us most of the way. I see, thanks very much for this. The formulas you provided were almost correct I think, because K_e was missing in your \mu(T) formula. This is because in the pdf I sent before, V = C_l/K_e, which puts K_e back in the numerator. Would it be possible then to have “two models in one” in brms? Where I would set C \sim \frac{D K_e K_a (e^{-K_e T} - e^{-K_a T})}{C_l (K_a - K_e)} Together with P \sim \frac{D K_e}{C_l}\frac{K_e}{K_a}^{K_e / (K_a - K_e)} With C and P being informed from data. One of the issues at the moment is that the gamma GLM is not able to estimate P well. So maybe adding the second likelihood above would help constrain the parameter space? Hi, sorry I missed your reply here. I don’t understand what you mean by absolute and relative error terms. Could you please elaborate with an example? That might be the key to solve this problem. Thanks! A quick example (using truncated normal model for simplicity): library(tidyverse) rel_error = 0.2 abs_error = 0.1 tibble(x = seq(0, 10, length.out = 100)) %>% crossing(error_type = factor(c("relative","absolute","both"), levels = c("relative","absolute","both"))) %>% mutate(y = sin(x / 3) + exp(x / 10) - 1, error = case_when(error_type == "relative" ~ y * 1.96 * rel_error , error_type == "absolute" ~ 1.96 * abs_error, error_type == "both" ~ y * 1.96 * rel_error + 1.96 * abs_error ), lower = pmax(0, y - error), upper = y + error ) %>% ggplot(aes(x = x, y = y, ymin = lower, ymax = upper)) + geom_ribbon(fill = "lightblue") + geom_line() + facet_wrap(~error_type)  The ribbon is the error term computed as either absolute, relative, or a sum of both. When the error term is relative to the magnitude of the concentration, you underestimate error at low concentrations. When the term is absolute, you often get unrealistically low error for high concentrations, that’s why you need both. Note: since you work on the log scale, there is some math involved in translating those onto the sigma for the log normal and I am too lazy to work it out - so best of luck :-) on the log scale you can use sigma_logy = sqrt( sigma_prop^2 + (sigma_const/exp(logy)9 ^2 ) as I recall. This is something you can derive with the delta method, but the details are somewhere lost in my notes… OK, I think I’m almost there! I have been able to implement the code in stan with a gaussian distribution instead of gamma, with some key modifications: 1 – I have substituted D / C_l for a constant parameter Q, just for simplicity. 2 – In the transformed parameters block, I have included the efficiency E as a bounded parameter [0,1] based on Q, K_a, and K_e (assuming they’re all real): E = \frac{P}{K_a T_p} = \frac{Q (Ka - K_e) (K_a/K_e)^{K_a/(K_e - K_a)}}{log(K_a) - log(K_e)} 3 – I fitted K_e in its logit form, K_e = \frac{K_a}{1 + e^{-\textrm{logit}(K_e)}}, to ensure that K_a > K_e; 4 – The formula is now fitted using T_p informed from data, substituting K_a - K_e in the original formulation with K_a - K_e = \frac{log(K_a) - log(K_e)}{T_p}: \mu(T,T_p) = \frac{\frac{Q K_a^2}{1 + e^{-\textrm{logit}(K_e)}} (e^{-\frac{K_a T}{1 + e^{-\textrm{logit}(K_e)}} } - e^{-K_a T})}{\frac{log(K_a) - log(K_e)}{T_p}} The only thing missing now is the error handling; the errors towards the tails on the data are most definitely underestimated, as pointed out by @martinmodrak – please see figure at the end. I am no longer fitting the function on the log scale, so I’m assuming it’s simpler to implement the absolute + relative error? I would really appreciate your help here implementing the relative error in the stan code below. It confuses me a bit because sigma here is real, and I do not understand how I can make operations in stan between a vector (which would be the case of Y relative to \mu) and a real. Thanks so much for all your guidance! stan code: data { int<lower=1> N; // number of observations vector[N] Y; // concentration int<lower=1> K_Ka; // number of population-level effects matrix[N, K_Ka] X_Ka; // population-level design matrix int<lower=1> K_logitKe; // number of population-level effects matrix[N, K_logitKe] X_logitKe; // population-level design matrix int<lower=1> K_Q; // number of population-level effects matrix[N, K_Q] X_Q; // population-level design matrix vector[N] Time_days; // Time vector[N] Time_peak_days; // Time to peak } parameters { vector<lower=0>[K_Ka] Ka; // uptake rate vector[K_logitKe] logitKe; // elimination rate (logit); Ke = (Ka / (1 + exp(-logitKe))); ensures that Ka > Ke vector<lower=0>[K_Q] Q; // constant (originally D / Cl) real<lower=0> sigma; // residual SD } transformed parameters { vector<lower=0,upper=1> Eff; // efficiency Eff = Q * (Ka - (Ka / (1 + exp(-logitKe)))) * (Ka / (Ka / (1 + exp(-logitKe))))^(Ka / ((Ka / (1 + exp(-logitKe))) - Ka)) / log(Ka / (Ka / (1 + exp(-logitKe)))); // constrain E mathematically based on Q, logitKe, and Ka } model { vector[N] nlp_Ka = X_Ka * Ka; vector[N] nlp_logitKe = X_logitKe * logitKe; vector[N] nlp_Q = X_Q * Q; vector[N] mu; for (n in 1:N) { mu[n] = nlp_Q[n] * nlp_Ka[n] * (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n]))) * (exp(-(nlp_Ka[n] / (1 + exp(-nlp_logitKe[n]))) * Time_days[n]) - exp(-nlp_Ka[n] * Time_days[n])) / (log(nlp_Ka[n] / (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n])))) / Time_peak_days[n]); } // priors target += normal_lpdf(Ka | 0, 10) - 1 * normal_lccdf(0 | 0, 10); target += normal_lpdf(logitKe | 0, 10) - 1 * normal_lccdf(0 | 0, 10); target += normal_lpdf(Q | 0, 10) - 1 * normal_lccdf(0 | 0, 10); target += normal_lpdf(Eff | 0.5, 0.5) - 1 * log_diff_exp(normal_lcdf(1 | 0.5, 0.5), normal_lcdf(0 | 0.5, 0.5)); target += student_t_lpdf(sigma | 3, 0, 10) - 1 * student_t_lccdf(0 | 3, 0, 10); // likelihood target += normal_lpdf(Y | mu, sigma); }  R code: library(rstan) rstan::rstan_options(auto_write = TRUE) options(mc.cores = parallel::detectCores()) data <- read.csv(text = 'Drug_concentration_mg.ml,Dose_mg,Time_days 0.0006773291,13.57695,0.01 0.4894254798,13.57695,0.88 1.0282572998,13.57695,1.24 1.6008287008,13.57695,1.77 1.5079260525,13.57695,2.10 1.4089264576,13.57695,2.85 1.1498402861,13.57695,3.93 0.8579193906,13.57695,4.95 0.5450494898,13.57695,7.05 0.2563178348,13.57695,11.97 0.1634733689,13.57695,15.98 0.0706932426,13.57695,23.74 0.0774802343,13.57695,28.96 0.0364683985,13.57695,53.01', header = TRUE, stringsAsFactors = FALSE) createParMat <- function (data) { mat <- matrix(rep(1, nrow(data))) colnames(mat) <- 'Intercept' attr(mat, 'assign') <- 0 mat } brmData <- list('N' = nrow(data), 'Y' = data$Drug_concentration_mg.ml,
'Time_days' = data$Time_days, 'Time_peak_days' = rep(data$Time_days[which.max(data$Drug_concentration_mg.ml)], nrow(data)), 'K_Ka' = 1, 'X_Ka' = createParMat(data), 'K_logitKe' = 1, 'X_logitKe' = createParMat(data), 'K_Q' = 1, 'X_Q' = createParMat(data)) set.seed(10) model <- rstan::stan('tmp_stancode_gaussian_logitKe_Tp_E.stan', data = brmData, chains = 3, cores = 3, iter = 1.5e4, warmup = 7.5e3, control = list(adapt_delta = 0.99, max_treedepth = 15)) # add time to peak, and peak, then plot the data pars <- as.data.frame(t(summary(model, pars = c('Ka', 'logitKe', 'Q'))$summary[, 'mean', drop = FALSE]))
names(pars)  <-  gsub('', '', names(pars), fixed = TRUE)
pars$Ke <- pars$Ka / (1 + exp(-pars$logitKe)) T_p <- log(pars$Ka / pars$Ke) / (pars$Ka - pars$Ke) # time to peak in days P <- pars$Q * pars$Ka * pars$Ke * (exp(-pars$Ke * T_p) - exp(-pars$Ka * T_p)) / (pars$Ka - pars$Ke) # peak in mg/ml
E    <-  P / (pars$Ka * T_p) xseq <- seq(min(data$Time_days), max(data$Time_days), length.out = 200) posts <- extract(model) predictions <- matrix(0, nrow(posts[['Ka']]), length(xseq)) for (i in 1:nrow(predictions)) { Ke <- posts[['Ka']][i] / (1 + exp(-posts[['logitKe']][i])) predictions[i, ] <- posts[['Q']][i] * Ke * posts[['Ka']][i] * (exp(-Ke * xseq) - exp(-posts[['Ka']][i] * xseq)) / (posts[['Ka']][i] - Ke) } quantiles <- apply(predictions, 2, quantile, probs = c(0.025, 0.975)) modelPrediction <- data.frame('mean' = apply(predictions, 2, mean), 'Q2.5' = quantiles['2.5%', ], 'Q97.5' = quantiles['97.5%', ]) transparentCol <- rgb(255, 99, 71, max = 255, alpha = 125) plot(data$Time_days, data$Drug_concentration_mg.ml, ylim = range(c(data$Drug_concentration_mg.ml, unlist(modelPrediction))), xlab = 'Time (days)', ylab = 'Drug concentration (mg / ml)', las = 1, pch = 16, col = 'tomato')
polygon(c(xseq, rev(xseq), xseq), c(modelPrediction$Q2.5, rev(modelPrediction$Q97.5), modelPrediction$Q2.5), border = NA, col = transparentCol) # notice negative values on posterior predictions lines(xseq, modelPrediction$mean, lty = 2, lwd = 1.5, col = 'tomato')


Cool that you decided to move to Stan proper! Hope you are not overwhelmed.

Some minor things to improve the code, if they don’t make sense to you, please ask:

Is that supposed to bound Ka > 0? If so than, this is not necessary, Stan does some automagic stuff to properly handle bounds on parameters, so if you just have vector<lower=0>[K_Ka] Ka; and then Ka ~ normal(0, 10); it will treat Ka as having half-normal distribution, no other adjustments necessary.

You also generally don’t put prior on transformed parameters (Eff in your case). Prior on the parameters implicitly defines prior on transformed parameters. If you want to put prior on Eff, the best way is to reparametrize (move Eff to parameters and compute some other variable in transformed parameters). Alternatively (and this is slightly advanced), you can do Jacobian adjustment - the manual should have you covered if you want to do that.

Finally, a good prior for variables constrained between 0 and 1 is beta which is likely a better choice than truncated normal.

The keyword to search for is “vectorization” as in 3.1 Vectorization of Real-Valued Functions | Stan Functions Reference Stan supports operations between vectors and reals in a way similar to normal mathematical notation, so you can write new_vector = a_number * a_vector or new_vector = a_vector + different_vector. If you want element-wise multiplication between two vectors you can do new_vector = a_vector .* different_vector. (normally * would be matrix product, so you would write new_number = a_row_vector * a_column_vector)

For example you can replace

 for (n in 1:N) {
mu[n] = nlp_Q[n] * nlp_Ka[n] * (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n]))) * (exp(-(nlp_Ka[n] / (1 + exp(-nlp_logitKe[n]))) * Time_days[n]) - exp(-nlp_Ka[n] * Time_days[n])) / (log(nlp_Ka[n] / (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n])))) / Time_peak_days[n]);
}


with (hope I didn’t make any typos)

 mu = nlp_Q .* nlp_Ka .* (nlp_Ka ./ (1 + exp(-nlp_logitKe))) .* (exp(-(nlp_Ka ./ (1 + exp(-nlp_logitKe))) .* Time_days) - exp(-nlp_Ka .* Time_days)) ./ (log(nlp_Ka ./ (nlp_Ka ./ (1 + exp(-nlp_logitKe)))) ./ Time_peak_days);


Finally I have implemented relative + absolute error with a truncated normal error in one of my models: genexpi-stan/regulated.stan at master · cas-bioinf/genexpi-stan · GitHub (check lines 201 - 205). Note that T is a truncation operator, see manual for further details. You could also write your own truncation with the normal_lcdf call as you did for the priors. Note that unlike for parameters, Stan will not do anything automagical for data with bounds, you need to do that yourself.

Hope that helps.

1 Like

Thanks @martinmodrak

First, thanks for all the coding tips. I have been able to simplify the code and quicken the run time substantially.

Second, for some reason implementing the relative + absolute error actually yields worse estimates for P and T_p. This is assuming that I coded this correctly, please see code and output below – I would just appreciate one final look to make sure I didn’t get this wrong:

data {
int<lower=1> N;  // number of observations
vector[N] Y;  // concentration
vector[N] Time_days; // Time
vector[N] Time_peak_days; // Time to peak
}
parameters {
real<lower=0> Ka;  // uptake rate
real logitKe;  // elimination rate (logit); Ke = (Ka / (1 + exp(-logitKe))); ensures that Ka > Ke
real<lower=0> Q;  // constant (originally D / Cl)
real<lower=0> sigma_absolute;
real<lower=0> sigma_relative;  // residual SD
}
transformed parameters {
real<lower=0,upper=1> Eff;  // efficiency
Eff = Q * (Ka - (Ka / (1 + exp(-logitKe)))) * (Ka / (Ka / (1 + exp(-logitKe))))^(Ka / ((Ka / (1 + exp(-logitKe))) - Ka)) / log(Ka / (Ka / (1 + exp(-logitKe)))); // constrain E mathematically based on Q, logitKe, and Ka
}
model {
vector[N] mu;
mu = (Q * Ka * (Ka / (1 + exp(-logitKe)))) * (exp(-(Ka / (1 + exp(-logitKe))) * Time_days) - exp(-Ka * Time_days)) ./ (log(Ka / (Ka / (1 + exp(-logitKe)))) ./ Time_peak_days);

// likelihood
for(n in 1:N) {
real sigma = sigma_absolute + sigma_relative * mu[n];
Y[n] ~ normal(mu[n], sigma) T[0,];
}

// priors
Ka ~ normal(0, 10);
logitKe ~ normal(0, 10);
Q ~ normal(0, 10);
sigma_absolute ~ student_t(3, 0, 10);
sigma_relative ~ student_t(3, 0, 10);
}


So, I think I’ll stick with the simplified code without the relative error (see code and output below):

data {
int<lower=1> N;  // number of observations
vector[N] Y;  // concentration
vector[N] Time_days; // Time
vector[N] Time_peak_days; // Time to peak
}
parameters {
real<lower=0> Ka;  // uptake rate
real logitKe;  // elimination rate (logit); Ke = (Ka / (1 + exp(-logitKe))); ensures that Ka > Ke
real<lower=0> Q;  // constant (originally D / Cl)
real<lower=0> sigma;  // residual SD
}
transformed parameters {
real<lower=0,upper=1> Eff;  // efficiency
Eff = Q * (Ka - (Ka / (1 + exp(-logitKe)))) * (Ka / (Ka / (1 + exp(-logitKe))))^(Ka / ((Ka / (1 + exp(-logitKe))) - Ka)) / log(Ka / (Ka / (1 + exp(-logitKe)))); // constrain E mathematically based on Q, logitKe, and Ka
}
model {
vector[N] mu;
mu = (Q * Ka * (Ka / (1 + exp(-logitKe)))) * (exp(-(Ka / (1 + exp(-logitKe))) * Time_days) - exp(-Ka * Time_days)) ./ (log(Ka / (Ka / (1 + exp(-logitKe)))) ./ Time_peak_days);
// priors
Ka ~ normal(0, 10);
logitKe ~ normal(0, 10);
Q ~ normal(0, 10);
sigma ~ student_t(3, 0, 10);
// likelihood
Y ~ normal(mu, sigma);
}


Once again, thanks for your patience and kind help @martinmodrak and @wds15 !