Modeling Cutpoint for Noisy Covariate

Hi Everyone,

I’m trying to model when a signal, or covariate in this case, is “turned on”. You can think of it as a noisy process that can be used to predict the probability of an event taking place.

The DGP can be thought of as:
\mu = \text{logit}^{-1}(\beta_0 + \beta_1 \, I(x > c))
y \sim \text{bernoulli}(\mu)

Here is some R code for the data generating process:

library(cmdstanr)

inv_logit <- function(u) 1 / (1 + exp(-u))

n <- 2e3
x_latent <- rlnorm(n, 5, 0.8)  # noisy covariate

# True parameter values
cut_point <- 80
beta_0 <- -1
beta_1 <- 2.5

# The signal is "on" once x_latent exceeds the cutpoint
prob <- inv_logit(beta_0 + as.numeric(x_latent > cut_point) * beta_1)
y <- rbinom(n, 1, prob = prob)

model <- cmdstan_model('dynamic_cutpoints.stan')

stan_data <- list(
  N = n,
  x_latent = x_latent,
  y = y
)

fit <- model$sample(stan_data)

And the accompanying Stan code:

data {
  int<lower=0> N;           // Number of observations
  vector[N] x_latent;         // Latent variable
  array[N] int<lower=0,upper=1> y; // Binary outcome
}

parameters {
  real cut_point;           // Cut point for latent variable
  real beta_0;              // Intercept
  real beta_1;              // Coefficient for latent variable
}

model {
  vector[N] alpha;

  // Define the linear predictor
  for (i in 1:N) {
    alpha[i] = beta_0 + (x_latent[i] > cut_point) * beta_1;
  }

  // Likelihood
  y ~ bernoulli_logit(alpha);

  cut_point ~ normal(80,5);  // Prior for cut_point
  beta_0 ~ normal(0, 5);       // Prior for beta_0
  beta_1 ~ normal(0, 5);       // Prior for beta_1
}

I’m getting a warning that almost all of the transitions are hitting the maximum tree depth. The true parameters are still recovered nicely, though.

There is also the issue of high \hat{R} values, implying poor mixing, which the pairs plot confirms.
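For anyone following along, these diagnostics can be pulled with cmdstanr (plus bayesplot for the pairs plot, assuming it is installed), along these lines:

# Counts of divergences and max-treedepth hits, plus E-BFMI, per chain
fit$diagnostic_summary()

# Per-parameter rhat and effective sample sizes
fit$summary(c("cut_point", "beta_0", "beta_1"))

# Pairs plot of the posterior draws
bayesplot::mcmc_pairs(fit$draws(c("cut_point", "beta_0", "beta_1")))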

I’ve poked around and haven’t found related topics or literature. Wondering if anyone has used this technique or knows an alternative parameterization that Stan might like better.

Thanks!

You won’t find many DGPs like that, and if the underlying continuous variable is noisy, dichotomization will turn small quantitative errors into giant qualitative errors.
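To put numbers on that using the DGP above: under a hard threshold, two covariate values separated by a trivial 0.2 get wildly different predicted probabilities.

inv_logit <- function(u) 1 / (1 + exp(-u))
inv_logit(-1 + 2.5 * (79.9 > 80))  # ~0.27, just below the cutpoint of 80
inv_logit(-1 + 2.5 * (80.1 > 80))  # ~0.82, just above it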


Thanks for the reply Frank!

I’ve banged the mantra “never dichotomize your continuous variables” into my head; I just wasn’t sure whether there is any leeway when the majority of the information a variable contains is noise. But the way you described it is very helpful.

Thanks again!


Purely from a computational perspective, the problem is that the likelihood function is discontinuous whenever the value of the cutpoint parameter passes the location of one of the datapoints in x_latent. There may be some super slick trick to compute this, but absent that, there are two options for making this more tractable.

One is to brute-force marginalize over the discrete possibility that the cutpoint is between any particular two values of the (ordered) x_latents. This could be effective if the number of unique values of x_latent is not too large (but I would think you’d be ok at least up to hundreds or low thousands of x_latents, maybe more).
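For concreteness, here is a rough sketch of what that marginalization might look like in Stan. It reuses the normal(80, 5) cutpoint prior from the original model, drops the prior mass in the tails beyond the observed x range for brevity, and exploits the fact that, for a cutpoint between two adjacent sorted x values, the Bernoulli log likelihood depends only on the success counts on each side (the cumulative-count bookkeeping is my own addition, not code from the thread):

data {
  int<lower=0> N;
  vector[N] x_latent;
  array[N] int<lower=0, upper=1> y;
}

transformed data {
  int K = N - 1;                                 // interior bins between sorted x values
  array[N] int ord = sort_indices_asc(x_latent);
  vector[N] x_sorted = x_latent[ord];
  array[N] int s_cum;                            // successes among the k smallest x values
  s_cum[1] = y[ord[1]];
  for (i in 2:N) s_cum[i] = s_cum[i - 1] + y[ord[i]];
  int S = s_cum[N];                              // total successes
  vector[K] log_prior_bin;                       // log prior mass of the cutpoint in each bin
  for (k in 1:K) {
    log_prior_bin[k] = log_diff_exp(normal_lcdf(x_sorted[k + 1] | 80, 5),
                                    normal_lcdf(x_sorted[k] | 80, 5));
  }
}

parameters {
  real beta_0;
  real beta_1;
}

model {
  real ll1_off = log_inv_logit(beta_0);            // y = 1 below the cutpoint
  real ll0_off = log1m_inv_logit(beta_0);          // y = 0 below the cutpoint
  real ll1_on  = log_inv_logit(beta_0 + beta_1);   // y = 1 above the cutpoint
  real ll0_on  = log1m_inv_logit(beta_0 + beta_1); // y = 0 above the cutpoint
  vector[K] lp;
  for (k in 1:K) {
    // k observations fall below the cutpoint, N - k above
    lp[k] = log_prior_bin[k]
            + s_cum[k] * ll1_off + (k - s_cum[k]) * ll0_off
            + (S - s_cum[k]) * ll1_on + (N - k - (S - s_cum[k])) * ll0_on;
  }
  target += log_sum_exp(lp);  // marginalize the cutpoint out of the posterior

  beta_0 ~ normal(0, 5);
  beta_1 ~ normal(0, 5);
}

If the cutpoint location itself is of interest, softmax(lp) in generated quantities would recover its posterior over the bins.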

An alternative possibility is to suppose that the likelihood transitions smoothly as cutpoint slides across the location of an x_latent, for example by assuming that there’s a true latent value x_latent_latent that determines whether the signal is on or off, which is noisily measured with the x_latent passed as data. The larger you allow that noise term to be, the smoother the likelihood gets.


Thank you @jsocolar, that was incredibly helpful!

Even though it might not be the best choice to dichotomize the data, I’ll still see the modeling exercise through to the end: one, for the fun of it, and two, for any other weary travelers who might want to go down the same path.

I went the route of modeling the transition between states as a smooth function, as suggested in the latter half of the comment. I chose a cumulative normal as the smoother, with its scale passed in through the data block. (This is the same as the latent-variable formulation: if the observed x_latent is the true trigger variable plus normal measurement noise, marginalizing over that noise turns the step function into exactly this normal CDF.) If we visualize the likelihood function before and after smoothing, we can see the discontinuities that caused the tree depth warnings:

inv_logit <- function(u) 1 / (1 + exp(-u))

n <- 1e3
x_latent <- rlnorm(n, 5, 0.8)

cut_point <- 80
beta_0 <- -1
beta_1 <- 2.5

prob <- inv_logit(beta_0 + as.numeric(x_latent > cut_point) * beta_1)
y <- rbinom(n, 1, prob=prob)

# Visualize the discontinuity in the likelihood

# Log likelihood with the hard threshold (a step function in cut_point)
log_lik_bern <- function(cut_point){
  prob <- inv_logit(beta_0 + as.numeric(x_latent > cut_point) * beta_1)
  lik <- ifelse(y == 1, prob, 1 - prob)
  log(lik) |> sum()
}

# Log likelihood with the step replaced by a normal CDF of the given scale
log_lik_cut_cdf_norm <- function(cut_point, norm_sd){
  prob <- inv_logit(beta_0 + pnorm(x_latent, cut_point, norm_sd) * beta_1)
  lik <- ifelse(y == 1, prob, 1 - prob)
  log(lik) |> sum()
}

par(mfrow=c(1,2))
cut_seq <- seq(60, 100, length.out = 800)
plot(cut_seq, vapply(cut_seq, log_lik_bern, numeric(1)), type = 'l', lwd = 2,
     xlab = 'Cutpoint', ylab = 'Log Lik', main = 'Original Log Likelihood (Discontinuous)')
plot(cut_seq, vapply(cut_seq, \(x) log_lik_cut_cdf_norm(x, 0.1), numeric(1)),
     type = 'l', lwd = 2, col = 1, xlab = 'Cutpoint', ylab = '',
     main = 'Log Likelihood with smoothing')
col <- 1
for (s in c(0.5, 1, 5, 10, 50)){
  lines(cut_seq, vapply(cut_seq, \(x) log_lik_cut_cdf_norm(x, s), numeric(1)),
        lwd = 2, col = col <- col + 1)
}
legend('bottomright', legend = paste0('smoothing scale = ', c(0.1, 0.5, 1, 5, 10, 50)),
       col = 1:6, lwd = 2)

The Stan code that uses the smooth transition:

data {
  int<lower=0> N;           // Number of observations
  vector[N] x_latent;         // Latent variable
  array[N] int<lower=0,upper=1> y; // Binary outcome
  real<lower=0> cut_point_scale;
}

parameters {
  real cut_point;           // Cut point for latent variable
  real beta_0;              // Intercept
  real beta_1;              // Coefficient for latent variable
}

model {
  vector[N] alpha;
  
  for (i in 1:N){
    alpha[i] = beta_0 + normal_cdf(x_latent[i] | cut_point, cut_point_scale) * beta_1;
  }
  
  // Likelihood
  y ~ bernoulli_logit(alpha);

  cut_point ~ normal(80,5);  // Prior for cut_point
  beta_0 ~ normal(0, 5);       // Prior for beta_0
  beta_1 ~ normal(0, 5);       // Prior for beta_1
}
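As an aside, since normal_cdf(x | mu, sigma) is just Phi((x - mu) / sigma) and Phi is vectorized, the loop could be dropped; a sketch of the equivalent model block:

model {
  // Same linear predictor, vectorized
  vector[N] alpha = beta_0
                    + beta_1 * Phi((x_latent - cut_point) / cut_point_scale);

  // Likelihood
  y ~ bernoulli_logit(alpha);

  cut_point ~ normal(80, 5);  // Prior for cut_point
  beta_0 ~ normal(0, 5);      // Prior for beta_0
  beta_1 ~ normal(0, 5);      // Prior for beta_1
}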

And the final R code:

model <- cmdstan_model('dynamic_cutpoints.stan')

stan_data <- list(
  N = n,
  x_latent = x_latent,
  y = y,
  cut_point_scale = 0.5
)

fit <- model$sample(stan_data)

fit$summary()

It ran very fast and recovered the parameters nicely, with beautiful \hat{R} values.
