# Sigmoid Emax reparameterisation

I am interested to know if there is a more suitable parameterisation of the following implementation. I am modelling a dose response using a (sigmoid) emax with the response being binomial (proportion of successes) as follows:

data {
int<lower=0> K;  // number of arms
int n[K];       // n on each arm
int y[K];       // responders on each arm
vector[K] d;
vector[4] mu;
vector[4] sig;
}
transformed data{
}
parameters {
real b1; // lb
real b2; // ub
real<lower=0> b3;            // ed50
real b4;                     // slope steepness hill par
}
transformed parameters{
vector[K] eta;
for(i in 1:K){
eta[i] = b1 + b2 * pow(d[i], b4) * inv(pow(b3, b4) + pow(d[i], b4));
}
}
model {
target += normal_lpdf(b1 | mu[1], sig[1])
+ normal_lpdf(b2 | mu[2], sig[2])
+ normal_lpdf(b3 | mu[3], sig[3]) - normal_lccdf(0 | mu[3], sig[4])
+ normal_lpdf(b4 | mu[4], sig[4])
+ binomial_logit_lpmf(y | n, eta);
}
generated quantities{
vector[K] theta;
theta = inv_logit(eta);
}


Notwithstanding the fact that the b1 and b2 will be moved onto log odds scale and b4 loses some of its interpretation, it runs ok and seems to recover the parameters acceptably.

sigemax <- function(x, b1, b2, b3, b4){
theta <- b1 + (b2 * x^b4)/(x^b4 + b3^b4)
return(theta)
}

dose_seq <-  c(0.000, 0.125, 0.250, 0.500, 1.000)
K <- length(dose_seq)
p_true <- sigemax(dose_seq, b1 = 0.2, b2 = 0.3, b3 = 0.275, b4 = 10)
plot(dose_seq, p_true, ylim = c(0, 0.6), type = "l")
y <- round(p_true * 1000/K)
n <- rep(1000/K, K)
prior <- list(mu = c(0.2, 0.3, 0.5, 5),
sig = c(2, 2, 1, 3))
dat <- list(K = K,
y = y,
n = n,
d = dose_seq,
mu = prior$mu, sig = prior$sig)

fit1 <- rstan::sampling(mod1,
data = dat,
chains = 1,thin = 1,iter = 4000,refresh = 0,warmup = 1000,
control = list(adapt_delta = 0.9))


Fit with the theta values corresponding to the p_true values that underlie the data.

           mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
b1        -1.37    0.00 0.13  -1.64  -1.45  -1.37  -1.28  -1.13  1216    1
b2         1.42    0.00 0.18   1.09   1.29   1.41   1.53   1.79  1307    1
b3         0.30    0.00 0.04   0.24   0.27   0.29   0.32   0.40  1154    1
b4         5.84    0.06 2.08   2.50   4.34   5.59   7.03  10.77  1281    1
eta[1]    -1.37    0.00 0.13  -1.64  -1.45  -1.37  -1.28  -1.13  1216    1
eta[2]    -1.34    0.00 0.12  -1.58  -1.42  -1.34  -1.25  -1.11  1456    1
eta[3]    -0.94    0.00 0.15  -1.24  -1.04  -0.94  -0.83  -0.64  2106    1
eta[4]    -0.07    0.00 0.11  -0.30  -0.14  -0.07   0.00   0.14  2508    1
eta[5]     0.04    0.00 0.11  -0.18  -0.04   0.03   0.11   0.27  2243    1
theta[1]   0.20    0.00 0.02   0.16   0.19   0.20   0.22   0.24  1207    1
theta[2]   0.21    0.00 0.02   0.17   0.19   0.21   0.22   0.25  1412    1
theta[3]   0.28    0.00 0.03   0.23   0.26   0.28   0.30   0.35  2100    1
theta[4]   0.48    0.00 0.03   0.43   0.47   0.48   0.50   0.54  2512    1
theta[5]   0.51    0.00 0.03   0.45   0.49   0.51   0.53   0.57  2249    1
lp__     -23.20    0.05 1.45 -26.97 -23.91 -22.84 -22.15 -21.45   905    1


True probability of response on each dose.

p_true
[1] 0.2000000 0.2001129 0.2834784 0.4992420 0.4999993


However, under the following conditions, stan reports divergent transitions

dose_seq <- c(0, 50, 100, 200, 400)
K <- length(dose_seq)
p_true <- sigemax(dose_seq, b1 = 0.2, b2 = 0.3, b3 = 150, b4 = 5)
plot(dose_seq, p_true, ylim = c(0, 0.6), type = "l")
y <- round(p_true * 1000/K)
n <- rep(1000/K, K)
prior <- list(mu = c(0.2, 0.3, 200, 5),
sig = c(2, 2, 50, 7))
dat <- list(K = K,
y = y,
n = n,
d = dose_seq,
mu = prior$mu, sig = prior$sig)

fit2 <- rstan::sampling(mod1,
data = dat,
chains = 1,thin = 1,iter = 4000,refresh = 0,warmup = 1000,
control = list(adapt_delta = 0.9))

           mean se_mean    sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
b1        -1.34    0.00  0.11  -1.57  -1.41  -1.34  -1.27  -1.14  1106    1
b2         1.36    0.01  0.20   1.01   1.23   1.34   1.48   1.82   893    1
b3       161.17    0.71 21.69 116.74 145.66 163.29 177.21 198.73   930    1
b4         8.54    0.17  4.55   2.23   4.85   7.82  11.50  18.68   708    1
eta[1]    -1.34    0.00  0.11  -1.57  -1.41  -1.34  -1.27  -1.14  1106    1
eta[2]    -1.33    0.00  0.10  -1.54  -1.40  -1.33  -1.26  -1.13  1286    1
eta[3]    -1.23    0.00  0.13  -1.46  -1.32  -1.24  -1.15  -0.94  1538    1
eta[4]    -0.27    0.00  0.14  -0.55  -0.36  -0.26  -0.16  -0.01  1911    1
eta[5]     0.00    0.00  0.13  -0.26  -0.09   0.00   0.09   0.26  1534    1
theta[1]   0.21    0.00  0.02   0.17   0.20   0.21   0.22   0.24  1113    1
theta[2]   0.21    0.00  0.02   0.18   0.20   0.21   0.22   0.24  1292    1
theta[3]   0.23    0.00  0.02   0.19   0.21   0.22   0.24   0.28  1533    1
theta[4]   0.43    0.00  0.03   0.37   0.41   0.44   0.46   0.50  1912    1
theta[5]   0.50    0.00  0.03   0.44   0.48   0.50   0.52   0.57  1535    1
lp__     -22.54    0.04  1.33 -25.73 -23.23 -22.29 -21.53 -20.80  1113    1


While the warning can be addressed via increasing adapt_delta to 0.99, I am left wondering if there is a better modelling approach here? By better, I mean more robust. This is important because I do not have much insight into the true shape of the curve and therefore in practice cannot be as directive with the priors as I have been here.

My sense is that part of the problem relates to the fact that in this latter model, the parameters are on quite different scales and looking at the pairs plots suggests this might be worth looking into more. An obvious fix is to prescale the dose so that the sampling geometry is a little more friendly but then I think my parameter estimates will be mush with no way to transform them back to a useful space. Additionally, perhaps, adopting a beta likelihood under a means and sample size parameterisation might be more appropriate here? This would necessitate (0, 1) constraints on b1 and b2 but it would give direct access to the parameter estimates on their original scale, but obviously wonâ€™t change the fact that the parameters will still be on very different scales. Or do I simply need more data points? Or am a hand wringing needlessly?

Again, I am simply interested in whether anyone can suggest a better (more robust) way regardless of whether that is a reparameterisation or a change in the structure of the priors or something else.

1 Like

The issue is that b4 shows bimodality. It either wants to be positive or negative. If you restrict the range to positive or negative it works well.

Iâ€™ve reparameterized for a bit better numerical precision and speed but really it comes down to restricting the hill slope to either negative or positive. Go ahead and try it.

y = E_0 + E_{max} \nu

where \nu = (1 + \tau)^{-1} and \tau = \exp(-h \log(d) - \log(\text{ED}_{50})

data {
int<lower=0> K;  // number of arms
int n[K];       // n on each arm
int y[K];       // responders on each arm
vector[K] d;
vector[4] mu;
vector[4] sig;
}
parameters {
real b1; // lb
real b2; // ub
real<lower=0> b3;       // ed50
real<upper=0> b4;       // slope steepness hill par
// or
// real<lower=0> b4;       // slope steepness hill par
}
transformed parameters{
vector[K - 1] tau = -b4 * (log(d[2:K]) - log(b3));
vector[K - 1] nu = -log1p_exp(tau);
vector[K] eta = b1 + b2 * append_row(0, exp(nu));
}
model {
target += normal_lpdf(b1 | mu[1], sig[1])
+ normal_lpdf(b2 | mu[2], sig[2])
+ normal_lpdf(b3 | mu[3], sig[3]) - normal_lccdf(0 | mu[3], sig[4])
+ normal_lpdf(b4 | mu[4], sig[4])
+ binomial_logit_lpmf(y | n, eta);
}
generated quantities{
vector[K] theta;
theta = inv_logit(eta);
}


Thanks very much @spinkney. The sampling seems much quicker. I understand the rationale for the not using the first dose in log(d[2:K]) - log(b3), and the subsequent append_row(0, exp(nu)), however, when I fit the following data the b1 estimate doesnâ€™t align with the underlying data generating parameters. In the code below spinkey1.stan contains the code you provided and in spinkey2.stan I modify the transformed parameters block as follows:

transformed parameters{
vector[K] tau = -b4 * (log1p(d[1:K]) - log(b3));
vector[K] nu = -log1p_exp(tau);
vector[K] eta = b1 + b2 * exp(nu);
}


the log1p possibly isnâ€™t ideal, but no other alternative is coming to mind at the moment.

mod1 <- rstan::stan_model(file = "inst/stan/spinkey1.stan", auto_write = TRUE)
mod2 <- rstan::stan_model(file = "inst/stan/spinkey2.stan", auto_write = TRUE)
sigemax <- function(x, b1, b2, b3, b4){
theta <- b1 + (b2 * x^b4)/(x^b4 + b3^b4)
return(theta)
}
dose_seq <- c(0, 50, 100, 200, 400)
p_true <- sigemax(dose_seq, b1 = 0.2, b2 = 0.3, b3 = 200, b4 = 7)
y <- as.integer(round(p_true * 200))
n <- rep(200, 5)
dat <- list(K = 5,
y = y,
n = n,
d = dose_seq,
mu = c(0, 0, 200, 5),
sig = c(2, 2, 50, 5)
)
#
fit1 <- rstan::sampling(mod1,
data = dat,
chains = 1,thin = 1,iter = 3000,refresh = 0,warmup = 1000,
control = list(adapt_delta = 0.9))
fit2 <- rstan::sampling(mod2,
data = dat,
chains = 1,thin = 1,iter = 3000,refresh = 0,warmup = 1000,
control = list(adapt_delta = 0.9))

post1 <- rstan::extract(fit1, pars = "theta")$theta post2 <- rstan::extract(fit2, pars = "theta")$theta
post <- list(post1, post2)


and plotting theta obtained from the results with the data plotted on top:

par(mfrow = c(1, 2))
par(las=1, bty="l", cex.axis=1, cex.lab=1, cex.main=1,
xaxs="r", yaxs="r", mar = c(5, 5, 1, 5))

for(j in 1:2){
# Predicted response by arm from first and last look
theta <- post[[j]]
plot(1, type="n", main = "", xlab="Dollars", ylab="Pr(evt)",
xlim=range(dose_seq), ylim=c(0, 1), axes = F)
axis(1, at = dose_seq, cex.axis=0.7)
axis(2, at = pretty(c(0, 1)), cex.axis=0.7)

reps <- 50
idx <- sample(1:nrow(theta), size = reps, replace = F)
lc <- colormap::colormap(colormap=simtools::nom_colors, nshades=reps)
for(i in 1:reps){
lines(dose_seq, theta[idx[i],], col = lc[i])
}
points(dose_seq, dat$y/dat$n, lwd = 2)
}
par(mfrow = c(1, 1))


How did you determine that b4 is bimodal?

spinkey1.stan (959 Bytes) spinkey2.stan (941 Bytes)

1 Like

Youâ€™re right, b1 doesnâ€™t work as originally written. You need this instead assuming the first dose == 0. You need the ternary operator because

\begin{align} y &= b_1 + b_2 \bigg(1 + \frac{1}{d^{h}\text{ED}_{50}}\bigg)^{-1} \\ \text{if $h < 0$ and $d \to 0$ then } & \\ &= b_1 + b_2 \bigg(1 + \frac{0}{\text{ED}_{50}} \bigg)^{-1} \text{ and} \\ &= b_1 + b_2 \\ \text{otherwise, if} \; h > 0 \text{ and } & \\ \lim_{d \to 0} \bigg(1 + \frac{1}{d^{h}\text{ED}_{50}}\bigg)^{-1} &\to 0 \text{ then } \\ y &= b_1 \end{align}

Lastly, if h = 0 then y = b_1 + b_2 \bigg(1 + \frac{1}{\text{ED}_{50}}\bigg)^{-1} which I havenâ€™t handled in the code so assuming h \ne 0,

transformed parameters{
vector[K - 1] tau = -b4 * (log(d[2:K]) - log(b3));
vector[K - 1] nu = -log1p_exp(tau);
// if b4
vector[K] eta = b4 > 0 ? append_row(b1, b1 + b2 * exp(nu)) : append_row(b1 + b2, b1 + b2 * exp(nu));
}


And now I get (I couldnâ€™t find the simtools package so colors differ)

I was examining the pairs plot. When I set real<upper=0> b4; and examine the pairs plot for all the bâ€™s, pairs(fit1, pars = c("b1", "b2", "b3", "b4"))

However, when I remove that restriction, I can get nice bimodality. You may have to run a couple of times as it doesnâ€™t happen every time. See

2 Likes

Thanks again @spinkney, that looks better. I have marked as solution.

1 Like