I am fitting a dose-response model, or exposure-response model, where the response is bacterial density (called CATcfu
in the model, density being the continuous approximation of counts based on kinetic/mass action logic) as a function of added antibiotic AMP0.
The nonlinear regression equation I have derived and won’t explain here, but basically it involves the three parameters K
(maximum bacterial density), d
(death rate due to antibiotic), I_A
(median inhibitory dose of the antibiotic), which I am fitting, and also a hyperparameter sigma which represents the (lognormal) experimental error.
I have additional parameters that I know a priori and those go into the data block (those are the initial bacterial density y0
, the duration of antibiotic exposure t
and the bacteria’s growth rate gr
.
My experimental data have lower limits of detection, i.e. below a certain bacterial density I cannot reliably tell how many there are. These limits of detection change based on the bacteria I study, and as a consequence, different bacteria (with different growth rates) have different limits of detection.
Despite these complexities and nonlinearities the model fits the data OK but fails with some parameters as I explain below. I included a reprex below in R.
I think my main problem is putting priors on the imputed data. Basically, when I try to place the priors
y_cens ~ lognormal(mu_cens, sigma);
The model suddenly terribly diverges and nothing works. However, if I comment out that line, the fit performs much better.
One parameter that never converges even when the line is commented out is the lognormal standard deviation sigma
. I think this is happening since the likelihood is estimated for the uncensored and censored data using the same sigma, but there are no sensible priors and bounds on the censored data.
I’d appreciate hints on what I am doing wrong and also, as always, if there is anything else you seee that’s a glaring mistake in my method.
Below is the stan code and reprex in R (using tidyverse)
//bacteria_dose_response.stan
functions{ // within-chain parallelization to speed up sampling
real partial_sum(array[] real y_slice,
int start, int end,
vector muz, real sigma){
return lognormal_lpdf(y_slice | muz[start:end],sigma);
}
}
data{
int<lower=1> N; // total number of data points
int<lower=0> N_cens; //number of censored (below detection limit) data points
array[N_cens] real<lower=0> censor_points; //detection limit
array[N] real<lower=0> CAT; //observed CAT CFU/mL
vector<lower=0>[N] gr; // growth rates
vector<lower=0>[N] AMP0; // antibiotic concentration
vector<lower=0>[N_cens] gr_cens; // growth rates
vector<lower=0>[N_cens] AMP0_cens; // antibiotic concentration
real y0; //initial density of bacteria
real time; //antibiotic exposure time
int grainsize; //for reduce sum
}
parameters{
real<lower=y0> K; //maximum density
real<lower=0> I_A; //real antibiotic IC50 i.e. median inhibitory concentration
real<lower=0> d; //real death rate
real<lower=0> sigma; //lognormal standard deviation
array[N_cens] real<lower=0> CAT_cens; //imputed missing data
}
transformed parameters{
//the ab_term is part of the nonlinear formula. To see it in an untransformed way, you can check out the reprex
vector[N] ab_term = ((AMP0*d - I_A*gr)*time)./(AMP0+I_A) + log(K-y0); //antibiotic nonlinear term, part of the nonlinear formula
vector[N_cens] ab_term_cens = ((AMP0_cens*d - I_A*gr_cens)*time)./(AMP0_cens+I_A) + log(K-y0); //antibiotic nonlinear term for censored datapoints
//defining expected values for all data inc. censored data
vector[N] mu;
vector[N_cens] mu_cens;
for(i in 1:N){
mu[i] = log(y0) + log(K) - log_sum_exp(ab_term[i],log(y0)); //nonlinear formula in a log transform
}
for(i in 1:N_cens){
mu_cens[i] = log(y0) + log(K) - log_sum_exp(ab_term_cens[i],log(y0));
}
}
model{
K ~ lognormal(log(1e9),1);
d ~ normal(0,0.15);
I_A ~ normal(0.5,0.5);
sigma ~ exponential(1);
sigma_cens ~ exponential(1);
//If I uncomment the line below, the model no longer converges!
//CAT_cens ~ lognormal(mu_cens,sigma);
target+= reduce_sum(partial_sum,CAT,grainsize,mu,sigma);
target+= reduce_sum(partial_sum,CAT_cens,grainsize,mu_cens,sigma);
}
generated quantities{
array[N+N_cens] real l_pred = append_array(lognormal_rng(mu,sigma),lognormal_rng(mu_cens,sigma)); //for pp_check
}
Reprex
library(cmdstanr);library(tidyverse)
bacteria_dose_response = cmdstan_model('bacteria_dose_response.stan')
y0=1e7;K_actual=1.8e9;d=0.05;I_A=0.33;time=22; # true parameters of simulation
set.seed(300)
sim_dat = expand_grid(gr=c(0.3,0.5,0.6), ## gr is growth rate
AMP0=c(0,(50/1.5^14)*1.5^(0:14)), ## AMP0 is antibiotic concentration
rep=1:5# rep is experimental replicate
) %>%
mutate(
CATcfu = y0*K_actual/(y0 + (K_actual-y0)*exp(((AMP0*d - I_A*gr)*time)/(AMP0+I_A))), ## calculating true bacterial density based on nonlinear formula
DL = recode(gr, `0.3` = 4e6, `0.5` = 1e7,`0.6`=1.7e7), # detection limits
CATcfu_cens=ifelse(CATcfu<=DL,DL,CATcfu), # censoring data below detection limit
CATcfu_werr = rlnorm(n(),meanlog=log(CATcfu_cens),sdlog=1), ## adding experimental noise
cens = ifelse(CATcfu_werr < DL,1,0)) %>% as.data.frame ## adding censoring column
## visualization of data, panels are the different bacteria (with different growth rates and different detection limits
sim_dat %>% ggplot(aes(AMP0,CATcfu_werr)) + geom_point(aes(alpha=as.factor(cens)))+
scale_alpha_manual(values = c(1,0.2))+
scale_x_continuous(trans=amp_breaks,breaks=round(ampconcs[1:7*2-1],2)) +
scale_y_log10()+
facet_grid(~gr) + theme_classic()
## dataset in list format for Stan
sim_list = list(
N= nrow(sim_dat[sim_dat$cens==0,]),
N_cens= nrow(sim_dat[sim_dat$cens==1,]),
censor_points=sim_dat[sim_dat$cens==1,'DL'],
#N_G = length(unique(cat_dat$Genotype)),
#ID = cat_dat$Genotype,
CAT = sim_dat[sim_dat$cens==0,'CATcfu'],
gr = sim_dat[sim_dat$cens==0,'gr'],
AMP0= sim_dat[sim_dat$cens==0,'AMP0'],
gr_cens = sim_dat[sim_dat$cens==1,'gr'],
AMP0_cens= sim_dat[sim_dat$cens==1,'AMP0'],
y0 = y0, # this is known
time = 22, # this is known
grainsize=1
)
## initial values, not necessary
init_sim = function() list(K = 1e9, I_A = 0.3, d = 0.21 sigma = 1)
# fit
fit_sim = bacteria_dose_response$sample(
data = sim_list,
init = init_sim,
iter_warmup = 2000,iter_sampling = 2000
)
# note sigma and lp__ don't get properly sampled at all, but other parameters are better
`