I am interested to know if there is a more suitable parameterisation of the following implementation. I am modelling a dose response using a (sigmoid) emax with the response being binomial (proportion of successes) as follows:
data {
int<lower=0> K; // number of arms
int n[K]; // n on each arm
int y[K]; // responders on each arm
vector[K] d;
vector[4] mu;
vector[4] sig;
}
transformed data{
}
parameters {
real b1; // lb
real b2; // ub
real<lower=0> b3; // ed50
real b4; // slope steepness hill par
}
transformed parameters{
vector[K] eta;
for(i in 1:K){
eta[i] = b1 + b2 * pow(d[i], b4) * inv(pow(b3, b4) + pow(d[i], b4));
}
}
model {
target += normal_lpdf(b1 | mu[1], sig[1])
+ normal_lpdf(b2 | mu[2], sig[2])
+ normal_lpdf(b3 | mu[3], sig[3]) - normal_lccdf(0 | mu[3], sig[4])
+ normal_lpdf(b4 | mu[4], sig[4])
+ binomial_logit_lpmf(y | n, eta);
}
generated quantities{
vector[K] theta;
theta = inv_logit(eta);
}
Notwithstanding the fact that the b1 and b2 will be moved onto log odds scale and b4 loses some of its interpretation, it runs ok and seems to recover the parameters acceptably.
sigemax <- function(x, b1, b2, b3, b4){
theta <- b1 + (b2 * x^b4)/(x^b4 + b3^b4)
return(theta)
}
dose_seq <- c(0.000, 0.125, 0.250, 0.500, 1.000)
K <- length(dose_seq)
p_true <- sigemax(dose_seq, b1 = 0.2, b2 = 0.3, b3 = 0.275, b4 = 10)
plot(dose_seq, p_true, ylim = c(0, 0.6), type = "l")
y <- round(p_true * 1000/K)
n <- rep(1000/K, K)
prior <- list(mu = c(0.2, 0.3, 0.5, 5),
sig = c(2, 2, 1, 3))
dat <- list(K = K,
y = y,
n = n,
d = dose_seq,
mu = prior$mu, sig = prior$sig)
fit1 <- rstan::sampling(mod1,
data = dat,
chains = 1,thin = 1,iter = 4000,refresh = 0,warmup = 1000,
control = list(adapt_delta = 0.9))
Fit with the theta values corresponding to the p_true values that underlie the data.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b1 -1.37 0.00 0.13 -1.64 -1.45 -1.37 -1.28 -1.13 1216 1
b2 1.42 0.00 0.18 1.09 1.29 1.41 1.53 1.79 1307 1
b3 0.30 0.00 0.04 0.24 0.27 0.29 0.32 0.40 1154 1
b4 5.84 0.06 2.08 2.50 4.34 5.59 7.03 10.77 1281 1
eta[1] -1.37 0.00 0.13 -1.64 -1.45 -1.37 -1.28 -1.13 1216 1
eta[2] -1.34 0.00 0.12 -1.58 -1.42 -1.34 -1.25 -1.11 1456 1
eta[3] -0.94 0.00 0.15 -1.24 -1.04 -0.94 -0.83 -0.64 2106 1
eta[4] -0.07 0.00 0.11 -0.30 -0.14 -0.07 0.00 0.14 2508 1
eta[5] 0.04 0.00 0.11 -0.18 -0.04 0.03 0.11 0.27 2243 1
theta[1] 0.20 0.00 0.02 0.16 0.19 0.20 0.22 0.24 1207 1
theta[2] 0.21 0.00 0.02 0.17 0.19 0.21 0.22 0.25 1412 1
theta[3] 0.28 0.00 0.03 0.23 0.26 0.28 0.30 0.35 2100 1
theta[4] 0.48 0.00 0.03 0.43 0.47 0.48 0.50 0.54 2512 1
theta[5] 0.51 0.00 0.03 0.45 0.49 0.51 0.53 0.57 2249 1
lp__ -23.20 0.05 1.45 -26.97 -23.91 -22.84 -22.15 -21.45 905 1
True probability of response on each dose.
p_true
[1] 0.2000000 0.2001129 0.2834784 0.4992420 0.4999993
However, under the following conditions, stan reports divergent transitions
dose_seq <- c(0, 50, 100, 200, 400)
K <- length(dose_seq)
p_true <- sigemax(dose_seq, b1 = 0.2, b2 = 0.3, b3 = 150, b4 = 5)
plot(dose_seq, p_true, ylim = c(0, 0.6), type = "l")
y <- round(p_true * 1000/K)
n <- rep(1000/K, K)
prior <- list(mu = c(0.2, 0.3, 200, 5),
sig = c(2, 2, 50, 7))
dat <- list(K = K,
y = y,
n = n,
d = dose_seq,
mu = prior$mu, sig = prior$sig)
fit2 <- rstan::sampling(mod1,
data = dat,
chains = 1,thin = 1,iter = 4000,refresh = 0,warmup = 1000,
control = list(adapt_delta = 0.9))
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b1 -1.34 0.00 0.11 -1.57 -1.41 -1.34 -1.27 -1.14 1106 1
b2 1.36 0.01 0.20 1.01 1.23 1.34 1.48 1.82 893 1
b3 161.17 0.71 21.69 116.74 145.66 163.29 177.21 198.73 930 1
b4 8.54 0.17 4.55 2.23 4.85 7.82 11.50 18.68 708 1
eta[1] -1.34 0.00 0.11 -1.57 -1.41 -1.34 -1.27 -1.14 1106 1
eta[2] -1.33 0.00 0.10 -1.54 -1.40 -1.33 -1.26 -1.13 1286 1
eta[3] -1.23 0.00 0.13 -1.46 -1.32 -1.24 -1.15 -0.94 1538 1
eta[4] -0.27 0.00 0.14 -0.55 -0.36 -0.26 -0.16 -0.01 1911 1
eta[5] 0.00 0.00 0.13 -0.26 -0.09 0.00 0.09 0.26 1534 1
theta[1] 0.21 0.00 0.02 0.17 0.20 0.21 0.22 0.24 1113 1
theta[2] 0.21 0.00 0.02 0.18 0.20 0.21 0.22 0.24 1292 1
theta[3] 0.23 0.00 0.02 0.19 0.21 0.22 0.24 0.28 1533 1
theta[4] 0.43 0.00 0.03 0.37 0.41 0.44 0.46 0.50 1912 1
theta[5] 0.50 0.00 0.03 0.44 0.48 0.50 0.52 0.57 1535 1
lp__ -22.54 0.04 1.33 -25.73 -23.23 -22.29 -21.53 -20.80 1113 1
While the warning can be addressed via increasing adapt_delta
to 0.99, I am left wondering if there is a better modelling approach here? By better, I mean more robust. This is important because I do not have much insight into the true shape of the curve and therefore in practice cannot be as directive with the priors as I have been here.
My sense is that part of the problem relates to the fact that in this latter model, the parameters are on quite different scales and looking at the pairs plots suggests this might be worth looking into more. An obvious fix is to prescale the dose so that the sampling geometry is a little more friendly but then I think my parameter estimates will be mush with no way to transform them back to a useful space. Additionally, perhaps, adopting a beta likelihood under a means and sample size parameterisation might be more appropriate here? This would necessitate (0, 1) constraints on b1 and b2 but it would give direct access to the parameter estimates on their original scale, but obviously won’t change the fact that the parameters will still be on very different scales. Or do I simply need more data points? Or am a hand wringing needlessly?
Again, I am simply interested in whether anyone can suggest a better (more robust) way regardless of whether that is a reparameterisation or a change in the structure of the priors or something else.