Thank you @jsocolar, that was incredibly helpful!
Even though it might not be the best choice to dichotomize the data, I'll still see the modeling exercise through to the end: one, for the fun of it, and two, for any other weary travelers who might want to go down the same path.
I went the route of modeling the transition between states as a smooth function, as suggested in the latter half of the comment. I chose a cumulative normal as the smoother, with its scale passed in through the data block. Visualizing the likelihood function before and after smoothing shows the discontinuities that caused the tree depth errors.
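Concretely, the smoothing replaces the hard indicator in the linear predictor with a normal CDF, which approaches the step function as the scale $s \to 0$ (here $c$ is the cut point and $s$ the smoothing scale):

$$
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\beta_0 + \beta_1 \, \Phi\!\left(\frac{x_i - c}{s}\right)\right)
$$

First, simulating data from the hard-threshold version: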
inv_logit <- function(u) 1 / (1 + exp(-u))

# Simulate a latent variable and a binary outcome whose probability
# depends on whether the latent variable exceeds a hard cut point
n <- 1e3
x_latent <- rlnorm(n, 5, 0.8)
cut_point <- 80
beta_0 <- -1
beta_1 <- 2.5
prob <- inv_logit(beta_0 + as.numeric(x_latent > cut_point) * beta_1)
y <- rbinom(n, 1, prob = prob)
# Visualize the discontinuity in the likelihood
log_lik_bern <- function(cut_point){
  prob <- inv_logit(beta_0 + as.numeric(x_latent > cut_point) * beta_1)
  lik <- ifelse(y == 1, prob, 1 - prob)
  log(lik) |> sum()
}
log_lik_cut_cdf_norm <- function(cut_point, norm_sd){
  prob <- inv_logit(beta_0 + pnorm(x_latent, cut_point, norm_sd) * beta_1)
  lik <- ifelse(y == 1, prob, 1 - prob)
  log(lik) |> sum()
}
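As a quick usage check (my addition): with a small smoothing scale, the smoothed version should nearly reproduce the hard-threshold log likelihood when evaluated at the true cut point:

log_lik_bern(80)                        # hard threshold
log_lik_cut_cdf_norm(80, norm_sd = 0.1) # smooth, small scale: nearly identical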
par(mfrow = c(1, 2))
cut_seq <- seq(60, 100, length.out = 800)
plot(cut_seq, vapply(cut_seq, log_lik_bern, numeric(1)), type = 'l', lwd = 2,
     xlab = 'Cutpoint', ylab = 'Log Lik', main = 'Original Log Likelihood (Discontinuous)')
scales <- c(0.1, 0.5, 1, 5, 10, 50)
plot(cut_seq, vapply(cut_seq, \(x) log_lik_cut_cdf_norm(x, scales[1]), numeric(1)),
     type = 'l', lwd = 2, col = 1, xlab = 'Cutpoint',
     ylab = '', main = 'Log Likelihood with Smoothing')
for (i in 2:length(scales)){
  lines(cut_seq, vapply(cut_seq, \(x) log_lik_cut_cdf_norm(x, scales[i]), numeric(1)),
        lwd = 2, col = i)
}
legend('bottomright', legend = paste0('smoothing scale = ', scales), col = 1:6, lwd = 2)
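As a further sanity check (my addition, not strictly needed): since the smoothed log likelihood is continuous, it can be maximized directly with optimize(), and with a small smoothing scale the maximizer should land near the true cut point of 80:

opt <- optimize(\(cp) log_lik_cut_cdf_norm(cp, norm_sd = 0.5),
                interval = c(60, 100), maximum = TRUE)
opt$maximum  # should be close to 80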
The Stan code that uses the smooth transition:
data {
  int<lower=0> N;                  // Number of observations
  vector[N] x_latent;              // Latent variable
  array[N] int<lower=0,upper=1> y; // Binary outcome
  real<lower=0> cut_point_scale;   // Scale of the smooth transition
}
parameters {
  real cut_point; // Cut point for latent variable
  real beta_0;    // Intercept
  real beta_1;    // Coefficient for latent variable
}
model {
  vector[N] alpha;
  for (i in 1:N){
    alpha[i] = beta_0 + normal_cdf(x_latent[i] | cut_point, cut_point_scale) * beta_1;
  }
  // Priors
  cut_point ~ normal(80, 5);
  beta_0 ~ normal(0, 5);
  beta_1 ~ normal(0, 5);
  // Likelihood
  y ~ bernoulli_logit(alpha);
}
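As an aside, I believe the loop can be replaced with Stan's elementwise Phi() (the standard normal CDF), which avoids the scalar normal_cdf call; a minimal sketch of the equivalent line:

alpha = beta_0 + Phi((x_latent - cut_point) / cut_point_scale) * beta_1;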
And the final R code to compile and fit the model:
library(cmdstanr)

model <- cmdstan_model('dynamic_cutpoints.stan')
stan_data <- list(
  N = n,
  x_latent = x_latent,
  y = y,
  cut_point_scale = 0.5
)
fit <- model$sample(data = stan_data)
fit$summary()
It ran very fast and recovered the parameters nicely, with beautiful $\hat{R}$ values.
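For anyone replicating this, the recovery check can be restricted to the parameters of interest via cmdstanr's variables argument:

fit$summary(variables = c('cut_point', 'beta_0', 'beta_1'))
# posterior means should land near the simulated truth:
# cut_point near 80, beta_0 near -1, beta_1 near 2.5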