# Bounding posterior output of pharmacokinetic model

Thanks again martinmodrak!

Unfortunately, solving for K_e as a function of K_a, E and P is turning out to be very challenging; it goes beyond my math skills. In this link I substituted K_e for x, K_a for y, and E for z (it’s not a random equation).

I also spent some time trying to figure out the natural constraints on the relationship between original parameters that need to be met such that E < 1, but again this has been extremely challenging. I’ll keep working on it, and I’ll post an update in case I make some progress.

Sorry… too much text for me at the moment… but from experience, make sure to include both an absolute and a relative error term when fitting real data. In virtually all cases this is what you need to avoid underestimating the maximal concentration (there are many more ways to screw that up, but this one is easy to fix).


Is there any more background available online for the model? Googling “one compartment extravenous” gives exactly three results: one is this thread, and the other two show a slightly different main solution (\mu = \frac{D F K_a (e^{-K_e T} - e^{-K_a T})}{C_l (K_a - K_e)}, where F just scales D to the bioavailable fraction). Neither shows the formula for T_p or E.

I did not find any of this online; it was just a rationale I came up with. I am no mathematician, but I think (and hope) that the reasoning below is correct.

T_p can be calculated by setting \frac{d\mu}{dT} = 0.
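
Spelling that out, in case it is useful (please check my algebra; the multiplicative constant in front of \mu does not affect where the derivative vanishes):

\frac{d}{dT}\left(e^{-K_e T} - e^{-K_a T}\right) = K_a e^{-K_a T} - K_e e^{-K_e T} = 0

so K_a e^{-K_a T_p} = K_e e^{-K_e T_p}, and taking logs and rearranging,

T_p = \frac{\log K_a - \log K_e}{K_a - K_e}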

Because K_a (mass / volume / time), K_e (mass / volume / time), and C_l (volume / volume / time) are all constants, and K_a is the absorption rate, one can ask how efficient an organism is at absorbing a drug up to the peak:

P = \mu(T_p), thus

E = \frac{P}{K_a T_p}, with both P and the product K_a T_p having the same concentration units.

Please let me know if this makes sense!

The model I am using is also called a “one compartment first order absorption model”. I think this manual is a good reference for these pharmacokinetic models. The model I am using, for instance, is equivalent to equation 1.7 (page 10), noting that t_D = 0 and that V = C_l / K_e (bottom of page 6).

Given that K_a > K_e, the dynamics after the peak are dictated by K_e only. Hence it makes sense to estimate E between T_D (i.e. T_0) and T_p. But again, please let me know if I’m getting things mixed up here.

A small thing: the reference you’ve provided, written in your notation, has:

\mu(T) = \frac{D K_a (e^{-K_e T} - e^{-K_a T})}{C_l (K_a - K_e)}

(in contrast to what you’ve given, K_e is missing from the numerator), but your formula for T_p matches the correct version (your formula for P does not).

Wolfram Alpha checks out for the derivative and for the zero of the derivative.

Anyway, I think the main problem is that E in the form you’ve given can easily be larger than 1. Note that the absolute magnitude of \mu and P depends on D/C_l. But T_p does not depend on either D or C_l so I can make E arbitrarily large by increasing D/C_l while keeping the other parameters intact.

I am starting to see why solving from P and E to other parameters is hard :-). Even getting K_e from K_a and T_p requires the Lambert W (product log) function which is currently not in Stan.
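
In case it helps anyone later, here is roughly where the Lambert W comes from (please double check my algebra). Start from T_p (K_a - K_e) = \log K_a - \log K_e and treat K_e as the unknown:

\log K_e - T_p K_e = \log K_a - T_p K_a

K_e e^{-T_p K_e} = K_a e^{-T_p K_a}

(-T_p K_e) e^{-T_p K_e} = (-T_p K_a) e^{-T_p K_a}

Applying W to both sides of the last equation gives

K_e = -\frac{1}{T_p} W\left(-T_p K_a e^{-T_p K_a}\right)

where the principal branch W_0 yields K_e (the other real branch just returns K_a).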

But using Wolfram Alpha to simplify the formula for P (in two steps: first and second; you’d better check me :-) ) gives us:

P = \frac{D}{C_l} \left(\frac{K_e}{K_a}\right)^{K_e/(K_a - K_e)}

Solving for the original parameters from P also requires the Lambert W, but it suggests that using \frac{K_e}{K_a} as a base parameter might get us most of the way.
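
A quick numeric sanity check of that simplification (a throwaway script in Python rather than R, just what I had open; the parameter values are arbitrary, any K_a > K_e > 0 should work): evaluate \mu(T_p) under the reference formula (the version without K_e in the numerator) and compare with the closed form.

```python
import math

# Arbitrary made-up values with Ka > Ke > 0 (the identity is exact for any such values)
D, Cl, Ka, Ke = 10.0, 2.0, 1.5, 0.3

# Time to peak: T_p = log(Ka / Ke) / (Ka - Ke)
Tp = math.log(Ka / Ke) / (Ka - Ke)

# mu(T_p) under the reference formula (no Ke in the numerator)
mu_at_peak = D * Ka * (math.exp(-Ke * Tp) - math.exp(-Ka * Tp)) / (Cl * (Ka - Ke))

# Simplified closed form: P = (D / Cl) * (Ke / Ka)^(Ke / (Ka - Ke))
P = (D / Cl) * (Ke / Ka) ** (Ke / (Ka - Ke))

print(mu_at_peak, P)  # the two values agree
```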

I see, thanks very much for this.

I think the formulas you provided were almost correct, except that K_e was missing from your \mu(T) formula. This is because, in the pdf I sent before, V = C_l/K_e, which puts K_e back in the numerator.

Would it be possible then to have “two models in one” in brms?
Where I would set

C \sim \frac{D K_e K_a (e^{-K_e T} - e^{-K_a T})}{C_l (K_a - K_e)}

Together with

P \sim \frac{D K_e}{C_l}\left(\frac{K_e}{K_a}\right)^{K_e / (K_a - K_e)}

With C and P being informed from data. One of the issues at the moment is that the gamma GLM is not able to estimate P well. So maybe adding the second likelihood above would help constrain the parameter space?

Hi, sorry I missed your reply here. I don’t understand what you mean by absolute and relative error terms; could you please elaborate with an example? That might be the key to solving this problem. Thanks!

A quick example (using a truncated normal model for simplicity):

library(tidyverse)

rel_error = 0.2
abs_error = 0.1

tibble(x = seq(0, 10, length.out = 100)) %>%
  crossing(error_type = factor(c("relative", "absolute", "both"),
                               levels = c("relative", "absolute", "both"))) %>%
  mutate(y = sin(x / 3) + exp(x / 10) - 1,
         error = case_when(error_type == "relative" ~ y * 1.96 * rel_error,
                           error_type == "absolute" ~ 1.96 * abs_error,
                           error_type == "both" ~ y * 1.96 * rel_error + 1.96 * abs_error),
         lower = pmax(0, y - error),
         upper = y + error) %>%
  ggplot(aes(x = x, y = y, ymin = lower, ymax = upper)) +
  geom_ribbon(fill = "lightblue") +
  geom_line() +
  facet_wrap(~error_type)



The ribbon is the error term computed as either absolute, relative, or a sum of both.

When the error term is relative to the magnitude of the concentration, you underestimate the error at low concentrations. When the term is absolute, you often get unrealistically low error at high concentrations; that’s why you need both.

Note: since you work on the log scale, there is some math involved in translating those onto the sigma for the log normal and I am too lazy to work it out - so best of luck :-)

On the log scale you can use

sigma_logy = sqrt( sigma_prop^2 + (sigma_const / exp(logy))^2 )

as I recall. This is something you can derive with the delta method, but the details are somewhere lost in my notes…
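
For the record, I believe the delta-method argument goes roughly as follows (re-derived quickly, so please verify before relying on it). For a random variable Y with mean m, a first-order Taylor expansion gives \textrm{Var}(\log Y) \approx \textrm{Var}(Y) / m^2. Under the combined error model \textrm{Var}(Y) = \sigma_{const}^2 + (\sigma_{prop}\, m)^2, this yields

\textrm{sd}(\log Y) \approx \sqrt{\sigma_{prop}^2 + \left(\frac{\sigma_{const}}{m}\right)^2}

and substituting m = e^{\textrm{logy}} gives the sigma_logy expression above.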

OK, I think I’m almost there! I have been able to implement the code in Stan with a Gaussian distribution instead of a gamma, with some key modifications:

1 – I have replaced D / C_l with a single constant parameter Q, just for simplicity.

2 – In the transformed parameters block, I have included the efficiency E as a parameter bounded to [0,1], computed from Q, K_a, and K_e (assuming they’re all real):

E = \frac{P}{K_a T_p} = \frac{Q (K_a - K_e) (K_a/K_e)^{K_a/(K_e - K_a)}}{log(K_a) - log(K_e)}

3 – I fitted K_e in its logit form, K_e = \frac{K_a}{1 + e^{-\textrm{logit}(K_e)}}, to ensure that K_a > K_e;

4 – The formula is now fitted using T_p informed from data, substituting K_a - K_e in the original formulation with K_a - K_e = \frac{log(K_a) - log(K_e)}{T_p}:

\mu(T,T_p) = \frac{\frac{Q K_a^2}{1 + e^{-\textrm{logit}(K_e)}} (e^{-\frac{K_a T}{1 + e^{-\textrm{logit}(K_e)}} } - e^{-K_a T})}{\frac{log(K_a) - log(K_e)}{T_p}}
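
To convince myself that modifications 2–4 are internally consistent, I ran a quick numeric check (in Python for convenience; the parameter values are arbitrary): the T_p-substituted formula should reproduce the original \mu, and E = P / (K_a T_p) should match the closed form in step 2.

```python
import math

# Arbitrary values; Q plays the role of D / Cl
Q, Ka, logitKe, T = 2.0, 1.5, -1.0, 3.0

Ke = Ka / (1 + math.exp(-logitKe))   # logit parameterization guarantees 0 < Ke < Ka
Tp = math.log(Ka / Ke) / (Ka - Ke)   # time to peak

# Original formulation of mu (with Ke in the numerator, Q = D / Cl)
mu_orig = Q * Ka * Ke * (math.exp(-Ke * T) - math.exp(-Ka * T)) / (Ka - Ke)

# Reparameterized version: (Ka - Ke) replaced by (log(Ka) - log(Ke)) / Tp
mu_sub = Q * Ka * Ke * (math.exp(-Ke * T) - math.exp(-Ka * T)) \
         / ((math.log(Ka) - math.log(Ke)) / Tp)

# Efficiency computed directly as P / (Ka * Tp) ...
P = Q * Ka * Ke * (math.exp(-Ke * Tp) - math.exp(-Ka * Tp)) / (Ka - Ke)
E_direct = P / (Ka * Tp)

# ... and via the closed form from step 2
E_closed = Q * (Ka - Ke) * (Ka / Ke) ** (Ka / (Ke - Ka)) / (math.log(Ka) - math.log(Ke))

print(mu_orig, mu_sub)     # identical up to floating-point error
print(E_direct, E_closed)  # identical up to floating-point error
```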

The only thing missing now is the error handling; the errors towards the tails of the data are most definitely underestimated, as pointed out by @martinmodrak (please see the figure at the end). I am no longer fitting the function on the log scale, so I’m assuming it’s simpler to implement the absolute + relative error? I would really appreciate your help implementing the relative error in the Stan code below. It confuses me a bit because sigma here is a real, and I do not understand how to perform operations in Stan between a vector (which would be the case of Y relative to \mu) and a real. Thanks so much for all your guidance!

stan code:

data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // concentration
  int<lower=1> K_Ka;  // number of population-level effects
  matrix[N, K_Ka] X_Ka;  // population-level design matrix
  int<lower=1> K_logitKe;  // number of population-level effects
  matrix[N, K_logitKe] X_logitKe;  // population-level design matrix
  int<lower=1> K_Q;  // number of population-level effects
  matrix[N, K_Q] X_Q;  // population-level design matrix
  vector[N] Time_days;  // Time
  vector[N] Time_peak_days;  // Time to peak
}
parameters {
  vector<lower=0>[K_Ka] Ka;  // uptake rate
  vector[K_logitKe] logitKe;  // elimination rate (logit); Ke = (Ka / (1 + exp(-logitKe))); ensures that Ka > Ke
  vector<lower=0>[K_Q] Q;  // constant (originally D / Cl)
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
  vector<lower=0,upper=1>[1] Eff;  // efficiency
  // constrain E mathematically based on Q, logitKe, and Ka
  Eff[1] = Q[1] * (Ka[1] - (Ka[1] / (1 + exp(-logitKe[1]))))
           * (Ka[1] / (Ka[1] / (1 + exp(-logitKe[1]))))^(Ka[1] / ((Ka[1] / (1 + exp(-logitKe[1]))) - Ka[1]))
           / log(Ka[1] / (Ka[1] / (1 + exp(-logitKe[1]))));
}
model {
  vector[N] nlp_Ka = X_Ka * Ka;
  vector[N] nlp_logitKe = X_logitKe * logitKe;
  vector[N] nlp_Q = X_Q * Q;
  vector[N] mu;
  for (n in 1:N) {
    mu[n] = nlp_Q[n] * nlp_Ka[n] * (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n])))
            * (exp(-(nlp_Ka[n] / (1 + exp(-nlp_logitKe[n]))) * Time_days[n]) - exp(-nlp_Ka[n] * Time_days[n]))
            / (log(nlp_Ka[n] / (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n])))) / Time_peak_days[n]);
  }
  // priors
  target += normal_lpdf(Ka | 0, 10)
            - 1 * normal_lccdf(0 | 0, 10);
  target += normal_lpdf(logitKe | 0, 10)
            - 1 * normal_lccdf(0 | 0, 10);
  target += normal_lpdf(Q | 0, 10)
            - 1 * normal_lccdf(0 | 0, 10);
  target += normal_lpdf(Eff | 0.5, 0.5)
            - 1 * log_diff_exp(normal_lcdf(1 | 0.5, 0.5), normal_lcdf(0 | 0.5, 0.5));
  target += student_t_lpdf(sigma | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  // likelihood
  target += normal_lpdf(Y | mu, sigma);
}


R code:

library(rstan)
rstan::rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

data  <-  read.csv(text = 'Drug_concentration_mg.ml,Dose_mg,Time_days
0.0006773291,13.57695,0.01
0.4894254798,13.57695,0.88
1.0282572998,13.57695,1.24
1.6008287008,13.57695,1.77
1.5079260525,13.57695,2.10
1.4089264576,13.57695,2.85
1.1498402861,13.57695,3.93
0.8579193906,13.57695,4.95
0.5450494898,13.57695,7.05
0.2563178348,13.57695,11.97
0.1634733689,13.57695,15.98
0.0706932426,13.57695,23.74
0.0774802343,13.57695,28.96
0.0364683985,13.57695,53.01',
header = TRUE, stringsAsFactors = FALSE)

createParMat  <-  function (data) {
mat  <-  matrix(rep(1, nrow(data)))
colnames(mat)  <-  'Intercept'
attr(mat, 'assign')  <-  0
mat
}

brmData  <-  list('N' = nrow(data),
'Y' = data$Drug_concentration_mg.ml, 'Time_days' = data$Time_days,
'Time_peak_days' = rep(data$Time_days[which.max(data$Drug_concentration_mg.ml)], nrow(data)),
'K_Ka' = 1,
'X_Ka' = createParMat(data),
'K_logitKe' = 1,
'X_logitKe' = createParMat(data),
'K_Q' = 1,
'X_Q' = createParMat(data))

set.seed(10)
model  <-  rstan::stan('tmp_stancode_gaussian_logitKe_Tp_E.stan', data = brmData, chains = 3, cores = 3, iter = 1.5e4, warmup = 7.5e3, control = list(adapt_delta = 0.99, max_treedepth = 15))

# add time to peak, and peak, then plot the data
pars         <-  as.data.frame(t(summary(model, pars = c('Ka[1]', 'logitKe[1]', 'Q[1]'))$summary[, 'mean', drop = FALSE]))
names(pars)  <-  gsub('[1]', '', names(pars), fixed = TRUE)
pars$Ke      <-  pars$Ka / (1 + exp(-pars$logitKe))

T_p  <-  log(pars$Ka / pars$Ke) / (pars$Ka - pars$Ke) # time to peak in days
P    <-  pars$Q * pars$Ka * pars$Ke * (exp(-pars$Ke * T_p) - exp(-pars$Ka * T_p)) / (pars$Ka - pars$Ke) # peak in mg/ml
E    <-  P / (pars$Ka * T_p)

xseq         <-  seq(min(data$Time_days), max(data$Time_days), length.out = 200)
posts        <-  extract(model)
predictions  <-  matrix(0, nrow(posts[['Ka']]), length(xseq))
for (i in 1:nrow(predictions)) {
Ke                <-  posts[['Ka']][i] / (1 + exp(-posts[['logitKe']][i]))
predictions[i, ]  <-  posts[['Q']][i] * Ke * posts[['Ka']][i] * (exp(-Ke * xseq) - exp(-posts[['Ka']][i] * xseq)) / (posts[['Ka']][i] - Ke)
}
quantiles        <-  apply(predictions, 2, quantile, probs = c(0.025, 0.975))
modelPrediction  <-  data.frame('mean' = apply(predictions, 2, mean), 'Q2.5' = quantiles['2.5%', ], 'Q97.5' = quantiles['97.5%', ])
transparentCol   <-  rgb(255, 99, 71, max = 255, alpha = 125)

plot(data$Time_days, data$Drug_concentration_mg.ml, ylim = range(c(data$Drug_concentration_mg.ml, unlist(modelPrediction))), xlab = 'Time (days)', ylab = 'Drug concentration (mg / ml)', las = 1, pch = 16, col = 'tomato')
polygon(c(xseq, rev(xseq), xseq[1]), c(modelPrediction$Q2.5, rev(modelPrediction$Q97.5), modelPrediction$Q2.5[1]), border = NA, col = transparentCol) # notice negative values on posterior predictions
lines(xseq, modelPrediction$mean, lty = 2, lwd = 1.5, col = 'tomato')


Cool that you decided to move to Stan proper! Hope you are not overwhelmed.

Some minor things to improve the code, if they don’t make sense to you, please ask:

Is that supposed to bound Ka > 0? If so, then this is not necessary: Stan does some automagic stuff to properly handle bounds on parameters, so if you just have vector<lower=0>[K_Ka] Ka; and then Ka ~ normal(0, 10); it will treat Ka as having a half-normal distribution, no other adjustments necessary.

You also generally don’t put priors on transformed parameters (Eff in your case). The prior on the parameters implicitly defines a prior on the transformed parameters. If you want to put a prior on Eff, the best way is to reparametrize (move Eff to parameters and compute some other variable in transformed parameters). Alternatively (and this is slightly advanced), you can do a Jacobian adjustment; the manual should have you covered if you want to do that.

Finally, a good prior for variables constrained between 0 and 1 is the beta distribution, which is likely a better choice than a truncated normal.

The keyword to search for is “vectorization”, as in 3.1 Vectorization of Real-Valued Functions | Stan Functions Reference. Stan supports operations between vectors and reals in a way similar to normal mathematical notation, so you can write new_vector = a_number * a_vector or new_vector = a_vector + different_vector. If you want element-wise multiplication between two vectors you can do new_vector = a_vector .* different_vector. (Normally * is the matrix product, so you would write new_number = a_row_vector * a_column_vector.)

For example you can replace

 for (n in 1:N) {
mu[n] = nlp_Q[n] * nlp_Ka[n] * (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n]))) * (exp(-(nlp_Ka[n] / (1 + exp(-nlp_logitKe[n]))) * Time_days[n]) - exp(-nlp_Ka[n] * Time_days[n])) / (log(nlp_Ka[n] / (nlp_Ka[n] / (1 + exp(-nlp_logitKe[n])))) / Time_peak_days[n]);
}


with (hope I didn’t make any typos)

 mu = nlp_Q .* nlp_Ka .* (nlp_Ka ./ (1 + exp(-nlp_logitKe))) .* (exp(-(nlp_Ka ./ (1 + exp(-nlp_logitKe))) .* Time_days) - exp(-nlp_Ka .* Time_days)) ./ (log(nlp_Ka ./ (nlp_Ka ./ (1 + exp(-nlp_logitKe)))) ./ Time_peak_days);


Finally, I have implemented relative + absolute error with a truncated normal in one of my models: genexpi-stan/regulated.stan at master · cas-bioinf/genexpi-stan · GitHub (check lines 201 - 205). Note that T is a truncation operator; see the manual for further details. You could also write your own truncation with a normal_lcdf call, as you did for the priors. Note that unlike for parameters, Stan will not do anything automagical for data with bounds; you need to handle that yourself.

Hope that helps.


Thanks @martinmodrak

First, thanks for all the coding tips. I have been able to simplify the code and quicken the run time substantially.

Second, for some reason implementing the relative + absolute error actually yields worse estimates for P and T_p. This is assuming I coded it correctly; please see the code and output below. I would just appreciate one final look to make sure I didn’t get it wrong:

data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // concentration
  vector[N] Time_days;  // Time
  vector[N] Time_peak_days;  // Time to peak
}
parameters {
  real<lower=0> Ka;  // uptake rate
  real logitKe;  // elimination rate (logit); Ke = (Ka / (1 + exp(-logitKe))); ensures that Ka > Ke
  real<lower=0> Q;  // constant (originally D / Cl)
  real<lower=0> sigma_absolute;
  real<lower=0> sigma_relative;  // residual SD
}
transformed parameters {
  real<lower=0,upper=1> Eff;  // efficiency
  // constrain E mathematically based on Q, logitKe, and Ka
  Eff = Q * (Ka - (Ka / (1 + exp(-logitKe))))
        * (Ka / (Ka / (1 + exp(-logitKe))))^(Ka / ((Ka / (1 + exp(-logitKe))) - Ka))
        / log(Ka / (Ka / (1 + exp(-logitKe))));
}
model {
  vector[N] mu;
  mu = (Q * Ka * (Ka / (1 + exp(-logitKe))))
       * (exp(-(Ka / (1 + exp(-logitKe))) * Time_days) - exp(-Ka * Time_days))
       ./ (log(Ka / (Ka / (1 + exp(-logitKe)))) ./ Time_peak_days);

  // likelihood
  for (n in 1:N) {
    real sigma = sigma_absolute + sigma_relative * mu[n];
    Y[n] ~ normal(mu[n], sigma) T[0,];
  }

  // priors
  Ka ~ normal(0, 10);
  logitKe ~ normal(0, 10);
  Q ~ normal(0, 10);
  sigma_absolute ~ student_t(3, 0, 10);
  sigma_relative ~ student_t(3, 0, 10);
}


So, I think I’ll stick with the simplified code without the relative error (see code and output below):

data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // concentration
  vector[N] Time_days;  // Time
  vector[N] Time_peak_days;  // Time to peak
}
parameters {
  real<lower=0> Ka;  // uptake rate
  real logitKe;  // elimination rate (logit); Ke = (Ka / (1 + exp(-logitKe))); ensures that Ka > Ke
  real<lower=0> Q;  // constant (originally D / Cl)
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
  real<lower=0,upper=1> Eff;  // efficiency
  // constrain E mathematically based on Q, logitKe, and Ka
  Eff = Q * (Ka - (Ka / (1 + exp(-logitKe))))
        * (Ka / (Ka / (1 + exp(-logitKe))))^(Ka / ((Ka / (1 + exp(-logitKe))) - Ka))
        / log(Ka / (Ka / (1 + exp(-logitKe))));
}
model {
  vector[N] mu;
  mu = (Q * Ka * (Ka / (1 + exp(-logitKe))))
       * (exp(-(Ka / (1 + exp(-logitKe))) * Time_days) - exp(-Ka * Time_days))
       ./ (log(Ka / (Ka / (1 + exp(-logitKe)))) ./ Time_peak_days);
  // priors
  Ka ~ normal(0, 10);
  logitKe ~ normal(0, 10);
  Q ~ normal(0, 10);
  sigma ~ student_t(3, 0, 10);
  // likelihood
  Y ~ normal(mu, sigma);
}


Once again, thanks for your patience and kind help @martinmodrak and @wds15 !

I would add up variances and not standard deviations…


@wds15 You are obviously correct, it totally should have been real sigma = sqrt(square(sigma_absolute) + square(sigma_relative * mu[n])). This was based on my advice, so thanks for noting it and sorry for the confusion; assuming my older code is definitely OK and giving it as an example is probably not a good policy going forward :-). (And I hope @dbarneche checks what I propose; I definitely make a lot of mistakes.)

Other than that looks roughly good, some more minor things I noticed:

Those are very wide priors, especially for sigma_relative. I think you can safely assume that a priori the relative error is not more than 1 (i.e. the error is not 100%), giving you something like sigma_relative ~ normal(0, 0.5); or even tighter. Similarly, knowing the error of your measuring device could give you a better prior for sigma_absolute.

This is also a bit weird: the constraint will only make the sampler throw errors at you when it is violated, and it will break the sampler if non-negligible posterior mass lies outside the constraint. Are you sure Eff cannot physically be larger than 1? (I’ve discussed that in a previous post here, but I’m not sure whether you changed the model or saw a mistake in my reasoning.)


Thanks again @martinmodrak and @wds15! Even after adding the corrections above (i.e. adding variances rather than standard deviations, tighter priors on sigma), the peak is still slightly underestimated when compared to the model with an absolute error term only.

The crux of the problem is the efficiency equation. I came up with this efficiency reasoning myself borrowing some concepts from ecological theory (my field of research), so while I’d say it makes lots of sense to me, I admit I could be missing something.

If we start with concentration \mu = 0 at time T_0, and the dynamics are dictated by this single dose, first order absorption model, the only parameter in the equation that conceptually drives the drug absorption (as far as I understand!) is K_a. K_e and C_l are parameters related to elimination processes.

So, if the conceptual interpretation above is correct, K_a T sets the maximum possible concentration at any time point (even in the hypothetical absence of elimination processes). Hence, whatever the concentration at the peak P is (or any concentration at any time point), it has to be lower than K_a T, even if mathematically higher values of Q (originally D/C_l) allow for E > 1.

Please let me know if you disagree. I’m not a pharmacologist, so I am going to reach out to more experienced folks to confirm this reasoning is correct!

I thought so for a while, but that is NOT correct:

library(tidyverse)
D = 10
Ka = 1
Cl = 1
Ke = 0.1
tibble(t = seq(0, 10, length.out = 100)) %>%
  mutate(mu = D * Ka * (exp(-Ke * t) - exp(-Ka * t)) / (Cl * (Ka - Ke))) %>%
  ggplot(aes(t, mu)) + geom_line() + geom_abline(slope = Ka, color = "blue", linetype = "dashed")


The dashed line is K_a T, the curve is \mu(T)


Thanks for this. As mentioned before, I appreciate that mathematically you can get cases where E > 1. In your plot above, for instance, you only see that because D was very high relative to C_l. But if you were to set D = 1 for example, you’d get E < 1 over the entire range of t.

I wonder two things:

1 - My proposed equation is correct, and all I am missing are the natural constraints on this equation that would keep the values of E within [0,1]. I haven’t been able to find any in-depth discussion on the web about what all these parameters mean, and how (or whether) they constrain one another.

2 - If there’s another efficiency equation for E that explicitly limits its value to be [0,1] regardless of the parameter space. In my mind, even if the E formula I proposed is wrong, the question is still valid, i.e. what was the efficiency of the organism in absorbing the drug up to the peak? Efficiencies by definition are never > 1. I’m not a mathematician though, so I wouldn’t know if there’s a method that allows me to derive such an equation that is independent of the parameter space. If you have ideas though, I’d love to hear them!

Thanks again!

Turns out I was using the wrong formula (without K_e in the numerator). With the correct one, you can make E arbitrarily large even for fixed D/C_l, say:

D = 1
Ka = 1
Cl = 0.5
Ke = 10

T_p = (log(Ka) - log(Ke)) / (Ka - Ke)
P_rel = Ke * Ka * (exp(-Ke*T_p) - exp(-Ka*T_p)) / (Ka - Ke)
P = P_rel * D / Cl

tibble(t = seq(0, 1, length.out = 100)) %>%
  mutate(mu = D * Ke * Ka * (exp(-Ke * t) - exp(-Ka * t)) / (Cl * (Ka - Ke))) %>%
  ggplot(aes(t, mu)) + geom_line() + geom_abline(slope = Ka, color = "blue", linetype = "dashed")


And note that when modelling you generally shouldn’t put hard constraints based on domain knowledge (e.g. that D < C_l); this should rather be represented as a soft constraint via the prior. Hard constraints generally belong to unphysical/mathematically impossible situations, which E > 1 isn’t.