Hi Everyone,
I’m trying to model when a signal, or covariate in this case, is “turned on”. You can think of a noisy process that, once it crosses an unknown threshold, shifts the probability of an event taking place.
The DGP can be thought of as:
\mu = \beta_0 + \beta_1 \, I(x > c)
y \sim \text{Bernoulli}(\text{logit}^{-1}(\mu))
Here is some R code for the data generating process:
inv_logit <- function(u) 1 / (1 + exp(-u))
n <- 2e3
x_latent <- rlnorm(n, 5, 0.8) # noisy signal / covariate
cut_point <- 80 # true threshold
beta_0 <- -1
beta_1 <- 2.5 # shift in log-odds once the signal exceeds the threshold
prob <- inv_logit(beta_0 + as.numeric(x_latent > cut_point) * beta_1)
y <- rbinom(n, 1, prob = prob)
library(cmdstanr)
model <- cmdstan_model('dynamic_cutpoints.stan')
stan_data <- list(
  N = n,
  x_latent = x_latent,
  y = y
)
fit <- model$sample(data = stan_data)
And the accompanying Stan code:
data {
  int<lower=0> N;                  // Number of observations
  vector[N] x_latent;              // Latent variable
  array[N] int<lower=0,upper=1> y; // Binary outcome
}
parameters {
  real cut_point; // Cut point for latent variable
  real beta_0;    // Intercept
  real beta_1;    // Coefficient for latent variable
}
model {
  // Define the linear predictor: the indicator switches the beta_1 effect on
  vector[N] alpha;
  for (i in 1:N) {
    alpha[i] = beta_0 + (x_latent[i] > cut_point) * beta_1;
  }
  // Priors
  cut_point ~ normal(80, 5);
  beta_0 ~ normal(0, 5);
  beta_1 ~ normal(0, 5);
  // Likelihood
  y ~ bernoulli_logit(alpha);
}
I’m getting a warning that almost all of the transitions hit the maximum tree depth, even though the true parameters are recovered nicely. There is also the issue of high \hat{R} values, implying poor mixing, confirmed by the pairs plot:
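My working hypothesis (happy to be corrected) is that the indicator makes the log-likelihood piecewise constant in cut_point: moving c anywhere between two adjacent observed values of x flips no indicators, so

\frac{\partial}{\partial c}\, I(x > c) = 0 \quad \text{for almost every } c,

and the only gradient information about cut_point comes from the prior. With an essentially flat direction in the posterior, the HMC trajectories keep extending until they hit the tree depth limit.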
I’ve poked around and haven’t found related topics or literature. I’m wondering if anyone has used this technique, or knows of an alternative parameterization that Stan might like better.
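One direction I’ve been sketching (untested, and tau below is a softness constant I made up, not part of the model above) is to swap the hard indicator for a steep inverse logit, so the likelihood becomes differentiable in cut_point:

data {
  int<lower=0> N;
  vector[N] x_latent;
  array[N] int<lower=0,upper=1> y;
}
parameters {
  real cut_point;
  real beta_0;
  real beta_1;
}
model {
  // Smooth stand-in for I(x > cut_point); recovers the step as tau -> 0.
  // tau is a hand-picked softness scale here, not an estimated parameter.
  real tau = 1;
  vector[N] alpha = beta_0 + beta_1 * inv_logit((x_latent - cut_point) / tau);
  cut_point ~ normal(80, 5);
  beta_0 ~ normal(0, 5);
  beta_1 ~ normal(0, 5);
  y ~ bernoulli_logit(alpha);
}

The smaller tau gets, the closer this is to the original step, but presumably the geometry problems come back, so there seems to be a trade-off in choosing (or maybe learning) tau. Does that seem like a sensible direction, or is there a more standard trick?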
Thanks!