Modelling monotonic binomial response

The following is intended to model a monotonically increasing binomial response. I am deliberately not assuming any parametric form for the profile of the response. A horseshoe prior is placed on q[2:N] to shrink small increments towards zero. The x are equally spaced, and the non-negative increments b are cumulatively summed (via the lower-triangular matrix S) to give a non-decreasing eta. I am most likely stuck with the small number of unique x values.

data{
  int<lower=1> N;
  int y[N];          // successes
  int n[N];          // trials
  vector[N] x;       // equally spaced x values
}
transformed data{
  matrix[N,N] S;     // lower-triangular cumulative-sum matrix
  for(i in 1:N){
    for(j in 1:N){
      if(i>=j) S[i,j]=1;
      else S[i,j]=0;
    }
  }
}
parameters{
  real v0;                   // baseline log-odds
  vector<lower=0>[N-1] b;    // non-negative increments
  vector<lower=0>[N-1] lam;  // local horseshoe scales
  real<lower=0> tau;         // global scale
  real<lower=0> c2;          // slab scale
}
transformed parameters{
  vector[N] eta;
  vector[N] q;
  vector[N-1] lamhat;
  for(i in 1:(N-1)){
    lamhat[i] = c2 * lam[i]/sqrt(c2 + pow(tau, 2) * pow(lam[i], 2));
  }
  q[1] = v0;
  q[2:N] = b * tau .* lamhat;
  eta = S * q;               // cumulative sum => non-decreasing
}
model{
  y ~ binomial_logit(n, eta);
  v0 ~ normal(0, 5);
  b ~ normal(0, 1);          // half-normal, since b is lower-bounded at 0
  lam ~ cauchy(0, 1);        // half-Cauchy
  c2 ~ inv_gamma(2, 4);
  tau ~ cauchy(0, 0.25);     // half-Cauchy
}
generated quantities{
  vector[N] theta;           // response on the probability scale
  theta = inv_logit(eta);
}

Using the following (totally arbitrary) data, the model samples OK and the parameter values appear mostly sensible (or at least what I would expect from this data). However, the predicted values \theta_j = \text{inv_logit}(\eta_j) do not seem to capture the flat portions of the curve well. Additionally, I do not have much experience with these types of priors, so I am not certain that the horseshoe is correctly implemented.

y1 <- c(50,50,50,70,70,70)
x1 <- seq_along(y1)-1
n1 <- rep(100, length(x1))
dat <- list(N = length(x1), y = y1, n = n1, x = x1)
mod1 <- rstan::stan_model(file = "inst/stan/mono1.stan", auto_write = TRUE)
fit1 <- rstan::sampling(mod1,
                          data = dat,
                          chains = 1, thin = 1, iter = 6000, refresh = 1000, warmup = 1000,
                          control = list(adapt_delta = 0.97, max_treedepth = 12))

post <- rstan::extract(fit1)
pred <- sapply(1:dat$N, function(j){
  c(mu = mean(post$theta[, j]), quantile(post$theta[, j], probs = c(0.025, 0.975)))
})
par(las=1, bty="l", cex.axis=1, cex.lab=1, cex.main=1,
      xaxs="i", yaxs="i", mar = c(5, 5, 3, 5))
plot(x1, y1/n1, type = "l", ylim = c(0.3, 0.9))
lines(x1, pred["mu", ], col = 2)
lines(x1, pred["2.5%", ], lty = 2, col = 2)
lines(x1, pred["97.5%", ], lty = 2, col = 2)
points(x1, pred["mu", ], col = 2)


With an increased sample size, e.g. 300 for each arm (see below), the model does a better job of recovering the true values, but it is still not ideal.
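For reference, one way to construct that larger dataset, assuming the same underlying proportions (0.5 and 0.7) as y1/n1 above with 300 trials per arm:

n2 <- rep(300, 6)
y2 <- c(150, 150, 150, 210, 210, 210)  # same proportions, 0.5 and 0.7
x2 <- seq_along(y2) - 1
dat2 <- list(N = length(x2), y = y2, n = n2, x = x2)
# refit with the same settings, e.g.
# fit2 <- rstan::sampling(mod1, data = dat2, chains = 1, iter = 6000,
#                         warmup = 1000, control = list(adapt_delta = 0.97))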


Can anyone suggest a better alternative, or perhaps spot an error in the implementation? I have looked into monotonic splines, but so far I haven't found anything that works any better.


Hi, sorry for not responding earlier.

I can offer a few quick observations, but not a complete solution, sorry.

  1. The issue would be much easier to resolve if you worked with a simpler model that has only the monotonic effect and nothing else.

  2. Maybe this is just a mismatch between the model and the data: the model assumes a monotonically increasing probability, so it has to fit imperfectly to data that has non-increasing segments. It is usually sensible to create simulated data that matches the model exactly (draw all parameters from the prior, compute the transformed parameters, then draw observations).

  3. If you haven’t read it already, the brms vignette on monotonic effects might be helpful. Note that brms uses the simplex type to define the “steps” of the monotonic increase.
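As a sketch of point 2, here is a prior-predictive simulation in R that mirrors the Stan program above (parameter names follow the model; abs() is used to draw from the half-Normal/half-Cauchy priors implied by the lower bounds):

# Draw one dataset from the prior of the model above.
set.seed(1)
N <- 6
n <- rep(100, N)
tau <- abs(rcauchy(1, 0, 0.25))          # half-Cauchy global scale
lam <- abs(rcauchy(N - 1, 0, 1))         # half-Cauchy local scales
c2  <- 1 / rgamma(1, shape = 2, rate = 4)  # inv_gamma(2, 4) slab scale
lamhat <- c2 * lam / sqrt(c2 + tau^2 * lam^2)  # as in the Stan code
v0 <- rnorm(1, 0, 5)                     # baseline log-odds
b  <- abs(rnorm(N - 1, 0, 1))            # half-normal increments
q  <- c(v0, b * tau * lamhat)
eta <- cumsum(q)                         # same as S %*% q
y_sim <- rbinom(N, n, plogis(eta))       # simulated observations

Fitting the model to several such draws should recover the generating parameters; if it does not, the problem is the implementation rather than the data.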

Best of luck with your model!