Implementing a simple threshold logistic regression model in Stan

I am trying to implement a simple threshold logistic model where the data have been generated using this data generation process:

logit(P(y_i = 1)) = -1.3 + 2.6 \times I(x_i > -0.6), where x_i \sim N(0, 1)

In the model, \tau is an unknown parameter reflecting the threshold (whose true value is -0.6), and \beta_0 and \beta_1 are the intercept and the effect of passing the threshold, respectively. The model doesn't fit well, and I'm trying to figure out why. I've tried increasing the tree depth and the number of chains, but neither has any effect. The chains are generally in the right area, but they are not mixing well. Is there any obvious way I can modify the model to get better results?
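Written out, the model being estimated is:

logit(P(y_i = 1)) = \beta_0 + \beta_1 \times I(x_i > \tau)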

[Trace plot: chains sit in roughly the right region but are not mixing]

Here is the Stan code:

data {
  int<lower=1> N;               // number of observations
  real x[N];                    // covariate
  int<lower=0,upper=1> y[N];    // binary outcome
}

parameters {
  real tau;                // threshold (changepoint)
  real beta0;              // overall intercept for outcome model
  real beta1;              // effect of passing the threshold
}

transformed parameters {
  vector[N] yhat;          // linear predictor on the logit scale

  for (i in 1:N)
    yhat[i] = beta0 + beta1 * (x[i] > tau);
}

model {
  // priors
  beta0 ~ normal(0, 5);
  beta1 ~ normal(0, 5);
  tau ~ normal(0, 3);

  // outcome model
  y ~ bernoulli_logit(yhat);
}


Here is the R code in which I generate the simulated data (using the simstudy package) and fit the model, first with the chngpt package and then with the Stan model:

library(simstudy)
library(chngpt)
library(rstan)

#--- Generate simulated data

d1 <- defData(varname = "V1", formula = 0, variance = 1, dist = "normal")
d1 <- defData(d1, varname = "L1", formula = "-1.3 + 1.6 * (V1 > -.6)", 
              link= "logit", dist = "binary")
dd <- genData(1000, d1)

#--- Fit using chngpt package (MLE)

fit <- chngptm(formula.1 = L1 ~ 1, formula.2 = ~ V1, data = dd, type="step", family="binomial")
summary(fit)

#--- Fit with stan

rt <- stanc("KSG/changepoint.stan")
sm <- stan_model(stanc_ret = rt, verbose=FALSE)

N <- nrow(dd)
y <- dd[, L1]
x <- dd[, V1] 

studydata <- list(N=N, x=x, y=y)
fit <-  sampling(sm, data = studydata, iter = 4000, warmup = 1000, 
                 cores = 4L, chains = 4, control = list(max_treedepth = 10))


Does it help at all to limit the range of tau to the range of x?

That seems like a good idea - though, looking at the trace plot from the original fit, it doesn't seem like the sampling is straying far from the support of x. I did go ahead and try it, and the result was pretty much unchanged - so that may be necessary but not sufficient.
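For completeness, the constraint is a one-line change in the parameters block, since Stan allows bounds that depend on data:

parameters {
  real<lower=min(x), upper=max(x)> tau;   // keep tau within the observed range of x
  real beta0;
  real beta1;
}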


Ah, after working at this myself a ton, reducing it to a minimal example, and finding that the pathology persisted, it belatedly occurred to me to search the forums, from which I in turn belatedly realized I should search the Stan User's Guide, which indeed has a section on Changepoint models.

As you'll see there, the problem with the intuitive implementation of the changepoint model, as you've done it, is that it implies a latent discrete parameter (it took me a bit to understand how, since nothing in the parameters section is discrete, but the conditional involving the changepoint implies a discretization), which gradient-based MCMC algorithms really can't handle: the likelihood is a step function of the changepoint, so its gradient carries no information about where the changepoint lies. But happily, mathy folks have worked out a trick to “marginalize out” the discreteness.
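Concretely, the trick is to treat the changepoint as one of S candidate cut values with a uniform prior and sum it out of the likelihood:

p(y | \beta) = \sum_{s=1}^{S} (1/S) \prod_{n=1}^{N} p(y_n | cut_s, \beta)

or, on the log scale that Stan accumulates,

log p(y | \beta) = log_sum_exp_s ( -log S + \sum_{n=1}^{N} log p(y_n | cut_s, \beta) )

which leaves only continuous parameters for the sampler.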

So take a look there and have a go at implementing the recommendations. If you have any trouble, feel free to post back here. You should also post back whatever you come up with if you get something working, as the example from the manual is a little overcomplicated in my opinion, and I think others would find it useful to see the solution for your simpler scenario (though an example with a Gaussian outcome would be even more ideal!).

And to possibly help you quickly implement, here’s what I came up with for an even more minimal example that’s also easy to play with:

data {
	int n;
	real x[n];
	real y[n];
	real known_measurement_noise ;
	int assume_changepoint_at_zero ;
	real changepoint_confidence ;
}
parameters {
	real baseline_level;
	real change_magnitude;
	real<lower=min(x),upper=max(x)> changepoint;
}
transformed parameters{
	vector[n] f; //latent function
	for (i in 1:n){
		f[i] =
			( // ridiculously non-standard whitespace style, I know...
				baseline_level
				+  change_magnitude
				* (
					x[i]
					> (
						changepoint*(1-assume_changepoint_at_zero)
					)
				)
			) ;
	}
}
model {
	// priors
	baseline_level ~ std_normal() ;
	change_magnitude ~ std_normal() ;
	changepoint ~ normal(0,1.0/changepoint_confidence) ;
	// outcome model
	y ~ normal(f,known_measurement_noise) ;
}

And fairly minimal R code to play:

library(tidyverse)
library(cmdstanr)
library(bayesplot)

#--- Generate simulated data
n = 1e3
known_measurement_noise = 0.1
baseline_level = 0
change_magnitude = 1
set.seed(123)
dat = tibble::tibble(
	# x: the covariate (modeled as noiselessly-measured/manipulated)
	x = runif(n,-1,1)
	# f: noiseless latent (unobserved) step-function
	, f = (baseline_level + change_magnitude*(x>0) )
	# y: observed (noisy) outcome
	, y = rnorm(n, f, known_measurement_noise)
)

#quick viz:
ggplot(dat) + geom_point(aes(x=x,y=y),alpha = .1)

#--- Fit with stan

mod <- cmdstan_model("changepoint.stan")
fit = mod$sample(
	data = list(
		n = nrow(dat)
		, x = dat$x
		, y = dat$y
		, known_measurement_noise = known_measurement_noise
		, assume_changepoint_at_zero = F
		, changepoint_confidence = 1 #*IF* not assuming zero changepoint, how confident is the prior on the changepoint?
	)
	, chains = parallel::detectCores()/2-1
	, parallel_chains = parallel::detectCores()/2-1
	, refresh = 100
	, seed = 123
	, iter_warmup = 1e3
	, iter_sampling = 1e3
)

# Check the fit ----
fit$cmdstan_diagnose()
print(fit)

bayesplot::mcmc_trace(fit$draws(variables=c('baseline_level')))
bayesplot::mcmc_trace(fit$draws(variables=c('change_magnitude')))
bayesplot::mcmc_trace(fit$draws(variables=c('changepoint')))

#visualize the posterior on the latent function ----
draws =
	(
		fit$draws(
			variables = 'f'
		)
		%>% posterior::as_draws_df()
		%>% tibble::as_tibble()
		%>% dplyr::select(-.iteration)
		%>% tidyr::pivot_longer(
			cols = starts_with('f')
			, names_to = 'x'
		)
		%>% dplyr::mutate(
			x = str_replace(x,fixed('f['),'')
			, x = str_replace(x,fixed(']'),'')
			, x = dat$x[as.numeric(x)]
			, .chain = factor(.chain)
		)
	)


(
	draws
	%>% dplyr::group_by(
		x
		, .chain
	)
	%>% dplyr::summarise(
		med = mean(value)
		, lo50 = quantile(value,.25)
		, hi50 = quantile(value,.75)
		, lo95 = quantile(value,.025)
		, hi95 = quantile(value,.975)
		, .groups = 'drop'
	)
	%>% ggplot()
	+ facet_grid(.chain~.)
	+ geom_point(
		data = dat
		, mapping = aes(x = x, y = y)
		, alpha = .1
	)
	+ geom_line(
		data =
			(
				draws
				%>% dplyr::nest_by(.chain,.draw)
				%>% dplyr::group_by(.chain)
				%>% dplyr::slice_sample(n=10)
				%>% tidyr::unnest(cols=data)
			)
		, mapping = aes(
			x = x
			, y = value
			, group = .draw
		)
		, alpha = .1
	)
	+ geom_ribbon(
		mapping = aes(
			x = x
			, ymin = lo95
			, ymax = hi95
			, group = .chain
			, fill = .chain
		)
		, alpha = .5
	)

)

Thanks for pointing me in the right direction - it has been quite fruitful.

Here is an implementation of the changepoint model for a binary outcome. First, the Stan model (which follows directly from the link referenced above):

data {
  int<lower=1> N;               // number of observations
  real x[N];                    // covariate

  int<lower=1> S;               // number of candidate thresholds
  real cuts[S];                 // candidate threshold values

  int<lower=0,upper=1> D[N];    // binary outcome
}
transformed data {
  real log_unif = -log(S);      // log of the uniform prior over the S candidate thresholds
}
parameters {
  real r;                       // logit of P(D = 1) when x is below the threshold
  real l;                       // logit of P(D = 1) when x is at or above the threshold
}
transformed parameters {
  // lp[s] accumulates the log joint density of the data and threshold s:
  // the uniform log prior plus the log likelihood conditional on cuts[s]
  vector[S] lp = rep_vector(log_unif, S);
  for (s in 1:S)
    for (n in 1:N)
      lp[s] = lp[s] + bernoulli_logit_lpmf(D[n] | x[n] < cuts[s] ? r : l);
}
model {
  // priors
  r ~ student_t(3, 0, 2.5);
  l ~ student_t(3, 0, 2.5);

  // marginalize over the discrete threshold
  target += log_sum_exp(lp);
}
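If you want Stan to report the posterior over the thresholds directly (rather than post-processing lp in R, as I do below), appending something like this generated quantities block should work; cut_prob and s_sim are just names I've chosen here:

generated quantities {
  // per-draw posterior probabilities of the candidate thresholds
  vector[S] cut_prob = softmax(lp);
  // a posterior draw of the threshold index itself
  int<lower=1, upper=S> s_sim = categorical_logit_rng(lp);
}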

And here is a little simulation to try out the code:

library(simstudy)
library(data.table)
library(chngpt)
library(rstan)
library(bayesplot)
library(ggplot2)
library(glue)

# Generate simulated data

d1 <- defData(varname = "V1", formula = 0, variance = 1, dist = "normal")
d1 <- defData(d1, varname = "L1", formula = "-1 + 2 * (V1 > -0.7)", dist = "binary", link = "logit")
dd <- genData(300, d1)

# Fit using chngpt package (MLE)

fit <- chngptm(formula.1 = L1 ~ 1, formula.2 = ~ V1, data = dd, type="step", family="binomial")
summary(fit)

# Fit with stan

rt <- stanc("KSG/changebin.stan")
sm <- stan_model(stanc_ret = rt, verbose=FALSE)

N <- nrow(dd)
D <- dd[, L1]
x <- dd[, V1] 
cuts <- seq(round(min(x), 1), round(max(x), 1), by = .1)   # grid of candidate thresholds over the range of x
S <- length(cuts)

studydata <- list(N=N, x=x,  S=S, cuts=cuts, D=D)
fit <-  sampling(sm, data = studydata, iter = 3000, warmup = 500, 
                 cores = 4L, chains = 4, control = list(adapt_delta = 0.8))

# Diagnostics

posterior <- as.array(fit) 
lp <- log_posterior(fit)
np <- nuts_params(fit)

color_scheme_set("mix-brightblue-gray")

p <- mcmc_trace(posterior, pars = c("r","l"), 
                facet_args = list(nrow = 2), np = np) + 
  xlab("Post-warmup iteration")
p

# estimate probability of each threshold

qs <- numeric(S)
for(i in 1:S) {
  lp.i <- glue("lp[{i}]")
  # unnormalized posterior weight of threshold i, summed across draws;
  # with larger N, subtract max(lp) before exponentiating to avoid underflow
  qs[i] <- sum(exp(rstan::extract(fit, pars = lp.i)[[1]]))
}

ps <- (qs/sum(qs))
dps <- data.table(cuts, ps)

ggplot(data = dps, aes(cuts, y = log(ps))) +
  geom_line(color = "grey60") +
  geom_point(size = .5)
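As a quick sanity check, the highest-probability cut, cuts[which.max(ps)], should land near the true threshold of -0.7 used in the simulation.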

At some point in the future, I'll be posting a slightly more complex model on my blog. Thanks again, Mike, for your help.