Monotonic splines in brms?

Is there a way to force monotonicity upon a spline in a brms model? Due to data sparsity at either end of a continuous covariate, we are seeing up/down ticks in the fitted spline, which we believe reflect overfitting / poor fitting, and we would like to guard against this. Any help would be much appreciated.

Thanks!

  • Operating System: Mac OS Big Sur
  • brms Version: 2.14.4

“monotonicity upon a spline in a brms model”

brms uses the mgcv library for splines. Monotonicity requires sampling from a multivariate normal distribution with inequality constraints. Algorithms exist for this, but they are not implemented in Stan yet.
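For context, this is what an ordinary (unconstrained) spline in brms looks like; s() passes through to mgcv, and there is no monotonicity option to set. A minimal sketch, assuming a data frame d with columns y and x:

library(brms)
# d is a hypothetical data frame with columns y and x; s() exposes mgcv
# arguments (k, bs, ...), but none of them enforce monotonicity
fit <- brm(y ~ s(x), data = d, family = gaussian())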

Now, if we move to Gaussian processes, there is a great paper by @avehtari

and as shown in the post Curve fitting -- including convexity constraint - #13 by maxbiostat,
it is possible to enforce monotonicity in Stan.
Although this road is not paved with the most comfort, it is one on which to gain new learning experiences.


Hello,

I have been coming back to this post over and over. I think brms still doesn’t do it (please correct me if I’m wrong!), so I tried to put something together based on this post and (too many) interactions with ChatGPT. I got working code using R & Stan that fits a model with monotonic splines conditional on a grouping variable. The idea is that each I-spline basis function is nondecreasing, so constraining the basis coefficients to be nonnegative (here via a softplus transform, log1p_exp) guarantees a monotone fit. It’s definitely out of my comfort zone, so I would appreciate any feedback, especially on how to speed it up. The toy example doesn’t take long to run, but it takes a day to fit my real dataset with 4000 observations and a few extra linear predictors (on a Mac M4).

Thanks in advance for your input!



# R/STAN SCRIPT TO FIT MONOTONE SPLINES BY GROUP
#   Response variable: Beta distribution 
#   monotone spline | group


# --- R packages --- ####
library(tidyverse)
library(cmdstanr)
library(parallel)
library(posterior)
library(splines2)   # for iSpline()


# --- Simulate data --- ####

# Parameters
N <- 400        # number of observations
J <- 4          # number of groups
K <- 10         # spline basis dimension
set.seed(123)

# Simulate group variable
z <- sample(1:J, N, replace = TRUE)
# Simulate predictor x for spline
x <- runif(N, 0, 10)
# Simulate predictor x2 for linear 
x2 <- rnorm(N, 0, 1)


# --- Create monotone I-spline basis --- ####

# take K-1 quantiles of x, then drop the two boundary quantiles below,
# leaving K-3 internal knots; with cubic splines (degree = 3) this
# yields K basis functions
# iSpline: nonnegative combinations of the basis are monotonically increasing
knots <- quantile(x, probs = seq(0, 1, length.out = K-1))
Xbasis <- iSpline(x, knots = knots[-c(1,K-1)], degree = 3, intercept = FALSE)  # N x K matrix
K_basis <- ncol(Xbasis)  # actual number of basis functions
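
# A quick sanity check (an addition, not strictly needed): any nonnegative
# combination of the I-spline columns should be nondecreasing in x
ord <- order(x)
w <- runif(K_basis, 0, 1)                    # arbitrary nonnegative weights
curve_check <- as.vector(Xbasis %*% w)[ord]  # evaluate the combination along x
stopifnot(all(diff(curve_check) >= -1e-12))  # nondecreasing up to numerical noise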

# Replicate Xbasis into an array for Stan
# (one basis matrix per group)
X_array <- array(NA, dim = c(J, N, K_basis))  # use the actual basis dimension
for (j in 1:J) {
  X_array[j,,] <- Xbasis  # same basis for all groups
}

# True parameters for the simulated data
alpha_true <- runif(J, -1, 1)
beta_true <- lapply(1:J, function(j) runif(K_basis, 0, 0.5))  # positive for monotone
beta_x2 <- c(-0.75, -0.25, 0.25, 0.75)
tau_true <- runif(J, 0.5, 1)
phi_true <- 5
# Compute eta and mu
eta <- numeric(N)
for (n in 1:N) {
  j <- z[n]
  eta[n] <- alpha_true[j] + beta_x2[j] * x2[n] + 
    as.numeric(as.vector(Xbasis[n, ]) %*% (beta_true[[j]] * tau_true[j]))
}
mu <- 1 / (1 + exp(-eta))  # inv_logit

# Simulate data
set.seed(123)
y <- rbeta(N, mu * phi_true, (1 - mu) * phi_true)
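
# Optional (an addition): quick visual check of the simulated data by group
tibble(x = x, y = y, z = factor(z)) %>%
  ggplot(aes(x, y)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~z)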

# Prepare data for Stan
stan_data <- list(
  N = N,
  J = J,
  K = K_basis,
  X = X_array,
  x2 = x2,
  z = z,
  y = y
)

# --- Write Stan model and fit --- ####

# Define model in Stan
stan_code <- "
data {
  int<lower=1> N;         // Number of observations
  int<lower=1> J;         // Number of groups
  int<lower=1> K;         // Number of spline basis functions
  array[J] matrix[N, K] X;  // spline basis | group
  vector[N] x2;             // linear predictor
  array[N] int<lower=1,upper=J> z;  // groups
  vector<lower=0,upper=1>[N] y;     // response
}

parameters {
  vector[J] alpha;                // intercept | group
  array[J] vector[K] beta_u;      // unconstrained spline coefficients | group
  array[J] real<lower=0> tau;     // smoothing scale | group
  real<lower=0> phi;              // precision
  vector[J] beta_x2;              // slope for x2 | group
}

transformed parameters {
  // softplus keeps the spline coefficients positive, so the I-spline
  // combination is monotonically increasing
  array[J] vector[K] beta_raw = log1p_exp(beta_u);
  vector[N] eta;
  for (n in 1:N) {
    int j = z[n];
    eta[n] = alpha[j] + beta_x2[j] * x2[n];   // group intercept + linear term
    for (k in 1:K)
      eta[n] += X[j,n,k] * tau[j] * beta_raw[j][k];
  }
  vector[N] mu = inv_logit(eta);   // declared after eta so eta is defined
}

model {
  alpha ~ normal(0, 5);
  tau ~ normal(0, 1);
  phi ~ gamma(2, 0.1);
  for (j in 1:J)
    beta_u[j] ~ normal(-1, 1);
  beta_x2 ~ normal(0, 2);                      // prior for the x2 slopes

  y ~ beta(mu .* phi, (1 - mu) .* phi);
}

generated quantities {
  vector[N] mu_out = inv_logit(eta);   // same values as the transformed parameter mu
}
"

# Write it to a .stan file and compile
stan_file <- "monotone_group_spline.stan"
writeLines(stan_code, con = stan_file)
mod <- cmdstan_model(stan_file)

# Fit the model
fit <- mod$sample(
  data = stan_data,
  seed = 123,
  chains = 4, parallel_chains = 4,
  iter_warmup = 1500, 
  iter_sampling = 1500
)

# View descriptive statistics of fitted parameters
print(fit$summary(c("alpha", "phi", "tau")))




# --- Extract predicted splines --- ####

# Extract posterior draws
alpha <- as_draws_df(fit$draws(c('alpha')))
beta_raw <- as_draws_df(fit$draws(c('beta_raw')))
tau <- as_draws_df(fit$draws(c('tau')))

# Create a grid for x
x_grid <- seq(min(x), max(x), length.out = 50)
X_grid <- iSpline(x_grid, knots = knots[-c(1,K-1)], degree = 3, intercept = FALSE)

# Select 100 posterior draws
set.seed(123)
npost <- 100
posterior_idx <- sample(1:nrow(alpha), npost)

# Store results
mu_draws <- vector(mode = "list", length = J)  # one matrix of draws per group
for (j in 1:J) {
  mu_j <- matrix(NA, nrow = npost, ncol = length(x_grid))
  for (i in seq_along(posterior_idx)) {
    idx <- posterior_idx[i]
    beta_j <- unlist(beta_raw[idx, paste0("beta_raw[", j, ",", 1:K_basis, "]")])
    tau_j <- unlist(tau[idx, paste0("tau[", j, "]")])
    alpha_j <- unlist(alpha[idx, paste0("alpha[", j, "]")])
    eta <- alpha_j + X_grid %*% (tau_j * beta_j)
    mu_j[i, ] <- 1 / (1 + exp(-eta))  # inverse logit
  }
  mu_draws[[j]] <- mu_j
}
# combine into a data frame
ep <- do.call(rbind, mu_draws)
colnames(ep) <- x_grid
ep <- as.data.frame(ep) %>%
  mutate(z = rep(1:J, each = npost),
         iter = rep(1:npost, J))
# convert to long format
ep <- ep %>%
  pivot_longer(cols = as.character(x_grid), 
               names_to = "x", values_to = "y") %>%
  mutate(z = factor(z),
         x = as.numeric(x))
# plot sample draws
ep %>%
  ggplot(aes(x = x, y = y, group = iter)) +
  geom_line(alpha = 0.25) +
  facet_wrap(~z)
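
# Optional (an addition, not in the original post): posterior mean and
# 90% interval per group, from the same long-format ep data frame
ep %>%
  group_by(z, x) %>%
  summarise(m = mean(y),
            lo = quantile(y, 0.05),
            hi = quantile(y, 0.95),
            .groups = "drop") %>%
  ggplot(aes(x = x, y = m)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.2) +
  geom_line() +
  facet_wrap(~z)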

Welcome to the forum

It would be better to start a new topic than to continue a 5-year-old one

Did you forget a link to some other post?

You can get about a 33% time reduction by moving the transformed parameters into the model block, switching from beta(mu .* phi, (1 - mu) .* phi) to beta_proportion(mu, phi) (see the Stan Functions Reference), and switching beta_u and beta_raw to matrices so you can use to_vector(beta_u) ~ normal(-1, 1);

I don’t quickly see whether there is a way to rearrange the nested loop to be faster, but in general loops are fast in Stan.


And do use the following C++ and Stan compiler optimization flags (these go in CmdStan's make/local file):

CXXFLAGS += -march=native -mtune=native
O = 3
STANCFLAGS=--O1
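
If you prefer to set these from R, a sketch using cmdstanr's make/local helpers (the flag values are exactly the ones above):

library(cmdstanr)
cmdstan_make_local(cpp_options = list(
  "CXXFLAGS += -march=native -mtune=native",
  "O = 3",
  "STANCFLAGS = --O1"
), append = TRUE)
rebuild_cmdstan()  # recompile CmdStan so the new flags take effect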

Combining these flags with my previously mentioned changes to the code, I see a 50% reduction in sampling time when testing with 5 times more observations.

For completeness, here's the whole modified code, with the additional removal of one nested for loop, which may help the stanc --O1 optimization:

data {
  int<lower=1> N;         // Number of observations
  int<lower=1> J;         // Number of groups
  int<lower=1> K;         // Number of spline basis functions
  array[J] matrix[N, K] X;  // spline basis | group
  vector[N] x2;             // linear predictor
  array[N] int<lower=1,upper=J> z;  // groups
  vector<lower=0,upper=1>[N] y;     // response
}

parameters {
  vector[J] alpha;                // intercept | group
  matrix[J,K] beta_u;      // unconstrained spline coefficients | group
  vector<lower=0>[J] tau;     // smoothing scale | group
  real<lower=0> phi;              // precision
  vector[J] beta_x2;              // slope for x2 | group
}
model {
  matrix[J,K] beta_raw = log1p_exp(beta_u);
  vector[N] eta = alpha[z] + beta_x2[z] .* x2;
  for (n in 1:N) {
    int j = z[n];
    eta[n] += sum(X[j][n,] .* tau[j] .* beta_raw[j]); 
  }
  vector[N] mu = inv_logit(eta);
  alpha ~ normal(0, 5);
  tau ~ normal(0, 1);
  phi ~ gamma(2, 0.1);
  to_vector(beta_u) ~ normal(-1, 1);
  beta_x2 ~ normal(0, 2);                      // prior for the x2 slopes

  y ~ beta_proportion(mu, phi);   // reuse mu computed above
}

generated quantities {
  matrix[J,K] beta_raw = log1p_exp(beta_u);
  vector[N] eta = alpha[z] + beta_x2[z] .* x2;
  for (n in 1:N) {
    int j = z[n];
    eta[n] += sum(X[j][n,] .* tau[j] .* beta_raw[j]);
  }
  vector[N] mu_out = inv_logit(eta);
}
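
If you prefer a per-model setting over the STANCFLAGS line in make/local, cmdstanr's stanc_options argument has the same effect; a sketch, with a placeholder file name for the revised model:

mod2 <- cmdstan_model("monotone_group_spline_v2.stan",  # placeholder path
                      stanc_options = list("O1"))       # same as STANCFLAGS=--O1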

Thanks for the quick reply and suggestions!
I’ve implemented them and it does seem faster.

The only thing I had to fiddle with was one of the flags:
CXXFLAGS += -march=native -mtune=native
This gave: error: PCH file was compiled for the tune CPU '' but the current translation unit is being compiled for target 'apple-m3'
Rebuilding CmdStan did not help, and I had to change to:
PRECOMPILED_HEADERS = "false",
CXXFLAGS = "-mcpu=native -mtune=native"
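
For reference, a sketch of how these settings can be written to make/local from R, assuming cmdstanr's cmdstan_make_local() helper as above:

cmdstan_make_local(cpp_options = list(
  PRECOMPILED_HEADERS = "false",
  CXXFLAGS = "-mcpu=native -mtune=native"
), append = TRUE)
rebuild_cmdstan()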

Not sure whether it is optimal, but it compiles and runs…
Cheers!

Right, here is the link to the other post that was helpful: