The following is intended to model a monotonically increasing binomial response. I am purposely not assuming any parametric form for the profile of the response. There is a horseshoe prior on q[2:N]
to shrink small values to zero. The x
are equally spaced and the zero lower bound b
are summed to give a non-decreasing eta
. I am most likely stuck with the small number of unique x
values.
data{
int<lower=1> N;
int y[N];
int n[N];
vector[N] x;
}
transformed data{
matrix[N,N] S;
for(i in 1:N){
for(j in 1:N){
if(i>=j) S[i,j]=1;
else S[i,j]=0;
}
}
}
parameters{
real v0;
vector<lower=0>[N-1] b;
vector<lower=0>[N-1] lam;
real<lower=0> tau;
real<lower=0> c2;
}
transformed parameters{
vector[N] eta;
vector[N] q;
vector[N-1] lamhat;
for(i in 1:(N-1)){
lamhat[i] = c2 * lam[i]/sqrt(c2 + pow(tau, 2) * pow(lam[i], 2));
}
q[1] = v0;
q[2:N] = b * tau .* lamhat;
eta = S * q;
}
model{
y ~ binomial_logit(n, eta);
v0 ~ normal(0, 5);
b ~ normal(0, 1);
lam ~ cauchy(0, 1);
c2 ~ inv_gamma(2, 4);
tau ~ cauchy(0, 0.25);
}
generated quantities{
vector[N] theta;
theta = inv_logit(eta);
}
Using the following (totally arbitrary) data, the model samples ok and the parameter values appear to be mostly sensible (or at least what I would expect from this data). However, the predicted values for \theta_j = \text{inv_logit} (\eta_j) seem not to capture the flat portions of the curve well. Additionally, I do not have a lot of experience with these types of priors so I am not certain that it is correctly implemented.
y1 <- c(50,50,50,70,70,70)
x1 <- seq_along(y1)-1
n1 <- rep(100, length(x1))
dat <- list(N = length(x1), y = y1, n = n1, x = x1)
mod1 <- rstan::stan_model(file = "inst/stan/mono1.stan", auto_write = TRUE);
fit1 <- rstan::sampling(mod1,
data = dat,
chains = 1,thin = 1,iter = 6000,refresh = 1000,warmup = 1000,
control = list(adapt_delta = 0.97, max_treedepth = 12))
post <- rstan::extract(fit1)
pred <- sapply(1:dat$N, function(j){
c(mu = mean(post$theta[,j]), quantile(post$theta[, j], probs = c(0.025, 0.975)))
})
par(las=1, bty="l", cex.axis=1, cex.lab=1, cex.main=1,
xaxs="i", yaxs="i", mar = c(5, 5, 3, 5))
plot(x1, y1/n1, type = "l", ylim = c(0.3, 0.9))
lines(x1, pred["mu", ], col = 2)
lines(x1, pred["2.5%", ], lty = 2, col = 2)
lines(x1, pred["97.5%", ], lty = 2, col = 2)
points(x1, pred["mu", ], col = 2)
With increasing sample size, e.g. 300 for each arm (see below), the model dose a better job of recovering the true values, but it is still not ideal.
I am wondering if anyone can suggest a better alternative or perhaps sees an error in the implementation? I have had a look into monotonic splines, but so far I haven’t found anything that works any better.