Implementing a simple threshold logistic regression model in Stan

I am trying to implement a simple threshold logistic model where the data have been generated using this data generation process:

logit(P(y_i = 1)) = -1.3 + 2.6 \times I(x_i > -0.6), where x_i \sim N(0, 1)

In the model, \tau is an unknown parameter reflecting the threshold (whose true value is -0.6), and \beta_0 and \beta_1 are the intercept and the effect of passing the threshold, respectively. The model doesn't fit well, and I'm trying to figure out why. I've tried increasing the tree depth and the number of chains, but neither has any effect. The chains are generally in the right area, but they are not mixing well. Is there any obvious way I can modify the model to get better results?
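Written out, the model being estimated is:

logit(P(y_i = 1)) = \beta_0 + \beta_1 \times I(x_i > \tau)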

[Trace plot: chains sit in roughly the right region but are not mixing]

Here is the Stan code:

data {
  int<lower=1> N;               // number of observations
  real x[N];                    // covariate
  int<lower=0,upper=1> y[N];    // binary outcome
}

parameters {
  real tau;                // threshold (changepoint)
  real beta0;              // overall intercept for outcome model
  real beta1;              // effect of passing the threshold
}

transformed parameters {
  vector[N] yhat;          // linear predictor on the logit scale

  for (i in 1:N)
    yhat[i] = beta0 + beta1 * (x[i] > tau);
}

model {
  // priors
  beta0 ~ normal(0, 5);
  beta1 ~ normal(0, 5);
  tau ~ normal(0, 3);

  // outcome model
  y ~ bernoulli_logit(yhat);
}


Here is the R code in which I generate the simulated data (using the simstudy package) and fit the model, first with the chngpt package and then with the Stan model:

library(simstudy)
library(chngpt)
library(rstan)

#--- Generate simulated data

d1 <- defData(varname = "V1", formula = 0, variance = 1, dist = "normal")
d1 <- defData(d1, varname = "L1", formula = "-1.3 + 1.6 * (V1 > -.6)", 
              link= "logit", dist = "binary")
dd <- genData(1000, d1)

#--- Fit using chngpt package (MLE)

fit <- chngptm(formula.1 = L1 ~ 1, formula.2 = ~ V1, data = dd, type="step", family="binomial")
summary(fit)

#--- Fit with stan

rt <- stanc("KSG/changepoint.stan")
sm <- stan_model(stanc_ret = rt, verbose=FALSE)

N <- nrow(dd)
y <- dd[, L1]
x <- dd[, V1] 

studydata <- list(N=N, x=x, y=y)
fit <-  sampling(sm, data = studydata, iter = 4000, warmup = 1000, 
                 cores = 4L, chains = 4, control = list(max_treedepth = 10))


Does it help at all to limit the range of tau to the range of x?

That seems like a good idea - though, looking at the trace plot from the original fit, it doesn't seem like the sampling is straying far from the support of x. I did go ahead and try it, and the result was pretty much unchanged - so that may be necessary but not sufficient.
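For completeness, the constraint is a one-line change in the parameters block, since Stan allows bounds that depend on data:

parameters {
  real<lower=min(x), upper=max(x)> tau;   // keep tau within the observed range of x
  real beta0;
  real beta1;
}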


Ah, after working at this myself a ton, reducing it to a minimal example, and finding that the pathology persisted, it belatedly occurred to me to search the forums, from which I in turn belatedly realized I should search the Stan User's Guide, which indeed has a section on Changepoint models.

As you'll see there, the problem with the intuitive implementation of the changepoint model, as you've done it, is that it implies a latent discrete parameter (it took me a bit to understand how, since nothing in the parameters section is discrete, but the conditional involving the changepoint implies a discretization), which gradient-based MCMC algorithms really can't handle: the likelihood is a step function of the changepoint, so its gradient carries no information about where the changepoint lies. But happily, mathy folks have worked out a trick to “marginalize out” the discreteness.
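Concretely, the trick is to treat the changepoint as one of S candidate cut values with a uniform prior and sum it out of the likelihood:

p(y | \beta) = \sum_{s=1}^{S} (1/S) \prod_{n=1}^{N} p(y_n | cut_s, \beta)

or, on the log scale that Stan accumulates,

log p(y | \beta) = log_sum_exp_s ( -log S + \sum_{n=1}^{N} log p(y_n | cut_s, \beta) )

which leaves only continuous parameters for the sampler.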

So take a look there and have a go at implementing the recommendations. If you have any trouble, feel free to post back here. You should also post back whatever you come up with if you get something working, as the example from the manual is a little overcomplicated in my opinion, and I think others would find it useful to see the solution for your simpler scenario (though an example with a Gaussian outcome would be even more ideal!).

And to possibly help you quickly implement, here’s what I came up with for an even more minimal example that’s also easy to play with:

data {
	int n;
	real x[n];
	real y[n];
	real known_measurement_noise ;
	int assume_changepoint_at_zero ;
	real changepoint_confidence ;
}
parameters {
	real baseline_level;
	real change_magnitude;
	real<lower=min(x),upper=max(x)> changepoint;
}
transformed parameters{
	vector[n] f; //latent function
	for (i in 1:n){
		f[i] =
			( // ridiculously non-standard whitespace style, I know...
				baseline_level
				+  change_magnitude
				* (
					x[i]
					> (
						changepoint*(1-assume_changepoint_at_zero)
					)
				)
			) ;
	}
}
model {
	// priors
	baseline_level ~ std_normal() ;
	change_magnitude ~ std_normal() ;
	changepoint ~ normal(0,1.0/changepoint_confidence) ;
	// outcome model
	y ~ normal(f,known_measurement_noise) ;
}

And fairly minimal R code to play:

library(tidyverse)
library(cmdstanr)
library(bayesplot)

#--- Generate simulated data
n = 1e3
known_measurement_noise = 0.1
baseline_level = 0
change_magnitude = 1
set.seed(123)
dat = tibble::tibble(
	# x: the covariate (modeled as noiselessly-measured/manipulated)
	x = runif(n,-1,1)
	# f: noiseless latent (unobserved) step-function
	, f = (baseline_level + change_magnitude*(x>0) )
	# y: observed (noisy) outcome
	, y = rnorm(n, f, known_measurement_noise)
)

#quick viz:
ggplot(dat) + geom_point(aes(x=x,y=y),alpha = .1)

#--- Fit with stan

mod <- cmdstan_model("changepoint.stan")
fit = mod$sample(
	data = list(
		n = nrow(dat)
		, x = dat$x
		, y = dat$y
		, known_measurement_noise = known_measurement_noise
		, assume_changepoint_at_zero = F
		, changepoint_confidence = 1 #*IF* not assuming zero changepoint, how confident is the prior on the changepoint?
	)
	, chains = parallel::detectCores()/2-1
	, parallel_chains = parallel::detectCores()/2-1
	, refresh = 100
	, seed = 123
	, iter_warmup = 1e3
	, iter_sampling = 1e3
)

# Check the fit ----
fit$cmdstan_diagnose()
print(fit)

bayesplot::mcmc_trace(fit$draws(variables=c('baseline_level')))
bayesplot::mcmc_trace(fit$draws(variables=c('change_magnitude')))
bayesplot::mcmc_trace(fit$draws(variables=c('changepoint')))

#visualize the posterior on the latent function ----
draws =
	(
		fit$draws(
			variables = 'f'
		)
		%>% posterior::as_draws_df()
		%>% tibble::as_tibble()
		%>% dplyr::select(-.iteration)
		%>% tidyr::pivot_longer(
			cols = starts_with('f')
			, names_to = 'x'
		)
		%>% dplyr::mutate(
			x = str_replace(x,fixed('f['),'')
			, x = str_replace(x,fixed(']'),'')
			, x = dat$x[as.numeric(x)]
			, .chain = factor(.chain)
		)
	)


(
	draws
	%>% dplyr::group_by(
		x
		, .chain
	)
	%>% dplyr::summarise(
		med = mean(value)
		, lo50 = quantile(value,.25)
		, hi50 = quantile(value,.75)
		, lo95 = quantile(value,.025)
		, hi95 = quantile(value,.975)
		, .groups = 'drop'
	)
	%>% ggplot()
	+ facet_grid(.chain~.)
	+ geom_point(
		data = dat
		, mapping = aes(x = x, y = y)
		, alpha = .1
	)
	+ geom_line(
		data =
			(
				draws
				%>% dplyr::nest_by(.chain,.draw)
				%>% dplyr::group_by(.chain)
				%>% dplyr::slice_sample(n=10)
				%>% tidyr::unnest(cols=data)
			)
		, mapping = aes(
			x = x
			, y = value
			, group = .draw
		)
		, alpha = .1
	)
	+ geom_ribbon(
		mapping = aes(
			x = x
			, ymin = lo95
			, ymax = hi95
			, group = .chain
			, fill = .chain
		)
		, alpha = .5
	)

)

Thanks for pointing me in the right direction - it has been quite fruitful.

Here is an implementation of the changepoint model for a binary outcome. First, the Stan model (which follows directly from the link referenced above):

data {
  int<lower=1> N;               // number of observations
  real x[N];                    // covariate

  int<lower=1> S;               // number of candidate thresholds
  real cuts[S];                 // candidate threshold values

  int<lower=0,upper=1> D[N];    // binary outcome
}
transformed data {
  real log_unif = -log(S);      // log of the uniform prior over the S candidate thresholds
}
parameters {
  real r;                       // logit of P(D = 1) when x is below the threshold
  real l;                       // logit of P(D = 1) when x is at or above the threshold
}
transformed parameters {
  // lp[s] accumulates the log joint density of the data and threshold s:
  // the uniform log prior plus the log likelihood conditional on cuts[s]
  vector[S] lp = rep_vector(log_unif, S);
  for (s in 1:S)
    for (n in 1:N)
      lp[s] = lp[s] + bernoulli_logit_lpmf(D[n] | x[n] < cuts[s] ? r : l);
}
model {
  // priors
  r ~ student_t(3, 0, 2.5);
  l ~ student_t(3, 0, 2.5);

  // marginalize over the discrete threshold
  target += log_sum_exp(lp);
}
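If you want Stan to report the posterior over the thresholds directly (rather than post-processing lp in R, as I do below), appending something like this generated quantities block should work; cut_prob and s_sim are just names I've chosen here:

generated quantities {
  // per-draw posterior probabilities of the candidate thresholds
  vector[S] cut_prob = softmax(lp);
  // a posterior draw of the threshold index itself
  int<lower=1, upper=S> s_sim = categorical_logit_rng(lp);
}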

And here is a little simulation to try out the code:

library(simstudy)
library(data.table)
library(chngpt)
library(rstan)
library(bayesplot)
library(ggplot2)
library(glue)

# Generate simulated data

d1 <- defData(varname = "V1", formula = 0, variance = 1, dist = "normal")
d1 <- defData(d1, varname = "L1", formula = "-1 + 2 * (V1 > -0.7)", dist = "binary", link = "logit")
dd <- genData(300, d1)

# Fit using chngpt package (MLE)

fit <- chngptm(formula.1 = L1 ~ 1, formula.2 = ~ V1, data = dd, type="step", family="binomial")
summary(fit)

# Fit with stan

rt <- stanc("KSG/changebin.stan")
sm <- stan_model(stanc_ret = rt, verbose=FALSE)

N <- nrow(dd)
D <- dd[, L1]
x <- dd[, V1] 
cuts <- seq(round(min(x), 1), round(max(x), 1), by = .1)   # grid of candidate thresholds over the range of x
S <- length(cuts)

studydata <- list(N=N, x=x,  S=S, cuts=cuts, D=D)
fit <-  sampling(sm, data = studydata, iter = 3000, warmup = 500, 
                 cores = 4L, chains = 4, control = list(adapt_delta = 0.8))

# Diagnostics

posterior <- as.array(fit) 
lp <- log_posterior(fit)
np <- nuts_params(fit)

color_scheme_set("mix-brightblue-gray")

p <- mcmc_trace(posterior, pars = c("r","l"), 
                facet_args = list(nrow = 2), np = np) + 
  xlab("Post-warmup iteration")
p

# estimate probability of each threshold

qs <- numeric(S)
for(i in 1:S) {
  lp.i <- glue("lp[{i}]")
  # unnormalized posterior weight of threshold i, summed across draws;
  # with larger N, subtract max(lp) before exponentiating to avoid underflow
  qs[i] <- sum(exp(rstan::extract(fit, pars = lp.i)[[1]]))
}

ps <- (qs/sum(qs))
dps <- data.table(cuts, ps)

ggplot(data = dps, aes(cuts, y = log(ps))) +
  geom_line(color = "grey60") +
  geom_point(size = .5)
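As a quick sanity check, the highest-probability cut, cuts[which.max(ps)], should land near the true threshold of -0.7 used in the simulation.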

At some point in the future, I'll be posting a slightly more complex model on my blog. Thanks again, Mike, for your help.