Binary response with a "switch" threshold

fabio · March 21, 2024, 8:27am

Dear Stan community,

I am working on N measures of a binary response y (0s and 1s), measured with an environamental variable x (temperature). I suppose the existence of a \bar x that acts like a “switch temperature”.
But I do not know how to model this behavior to infer \bar x.

y \sim 0; x < \bar x

y \sim 1; x \geq \bar x

Which kind of model could I use? I found Modifying Survival Models To Accommodate Thresholding Behavior by @betanalpha . Is this kind of model an option? Is there something simplier to start with?

danielparthier · March 21, 2024, 9:27am

Hello Fabio,

wouldn’t a simple bernoulli work with:

\beta \sim N(0,1) \\ \bar{x} \sim Gamma(2,2/10) \\ \theta = \beta(x-\bar{x}) \\ y \sim BernoulliLogit(\theta)

Obviously the prior for \bar{x} is something you might know better. At least that’s the simplest I could think of which might describe to a degree what you want.

danielparthier · March 21, 2024, 2:01pm

Have you also tried using a \beta parameter which would quantify the strength of the switch? Could be interesting to compare these two models. But I reckon it depends on how much data during the switch you have.

fabio · March 21, 2024, 2:57pm

dear @danielparthier
the model without the \beta parameter gives:

variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
 lp__     -1084.58 -1084.31 0.71 0.31 -1085.92 -1084.08 1.01     1823       NA
 x_bar       23.56    23.57 0.12 0.12    23.37    23.75 1.01     1374     1727

while the model with the ‘strength factor’ gives:

 variable    mean  median   sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__     -927.64 -927.32 1.03 0.73 -929.85 -926.67 1.00     1860     2128
 x_bar       0.78    0.77 1.07 1.06   -1.00    2.54 1.00     1441     1488
 beta        0.01    0.01 0.00 0.00    0.00    0.01 1.00     2768     2390

In the first case lp__ is NA and I don’t know how to interpret it. In the second case the value of \beta is very small as \bar x. What do you think?

danielparthier · March 21, 2024, 3:56pm

I just thought about it a bit more and and what I wrote before would cancel itself out. So the model should actually just be (as so often) a linear regression:
\theta = (x-\bar{x}) \\ y \sim BernoulliLogit(\theta + \beta \theta)
Otherwise we kick out the intercept and a \beta=0 would explain best a more or less even distribution of 0s and 1s (p=0.5). I am not sure about your data input but if you really have \bar{x}=23.56 I am assuming that you didn’t standardise the predictor (temperature). Then it might be a good idea to either adjust the prior for \bar{x} or perform standardisation. Standardisation might make your choice of priors easier if you don’t have a concrete idea.

fabio · March 25, 2024, 9:06am

Thanks @danielparthier , I think that with with your help I’m getting closer. Defining x^*=\frac{x-M}{S} where M is the mean and S the standard deviation of the environmental measures, and following your advice, I defined the model using \hat x as the standardized value for the threshold:

\theta = x^*-\hat x\\ y \sim BernoulliLogit(\theta + \beta \theta);

and calculating in the generated quantities \bar x=\hat x * S + M. The output of this regression model is:

variable    mean  median   sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__     -570.75 -570.48 0.98 0.74 -572.73 -569.80 1.00     1814     2324
 x_bar       0.07    0.07 0.04 0.04    0.01    0.13 1.00     3290     2574
 beta        1.02    1.02 0.10 0.10    0.86    1.19 1.00     3441     2787

which, using the transformation written above, leads to a \bar x=24.1 \pm 0.20 °C, that sounds good with the idea we had before starting our experiment. So, so far, so good.

Now with such results I would expand the model to other environmental variables and think about variable importance and model selection. Do you know how to translate this survival model with a threshold to brms or stanarms?

danielparthier · March 25, 2024, 5:01pm

You might be interested in the nlf functionality of brms. The parts which you have now can be set up as nlf() and the additional variables can be added by using a model matrix in the lf() for \hat{x} and \beta. I haven’t used brms for such things in a long time, but the example code in the docs looks similar to what you need.
How many variables do you have?

Ven_Popov · March 25, 2024, 5:32pm

This should do it.

# generate some fake data for testing
x <- rlogis(1000, 0, 1)                  # temperature
theta <- 0.5                             # threshold
y <- rbinom(1000, 1, plogis(x, theta))   # responses
data <- data.frame(x = x, y = y)

# logistic regression model
fit <- brm(bf(y ~ beta * (x - theta), beta ~ 1, theta ~ 1, nl = T), 
           data = data, 
           family = bernoulli(link = "logit"))

The point here is that the logistic regression in brms assumes that the threshold is 0. So you just need to shift your predictor variable in a non-linear formula, and then predict the coefficients.

(note that I used a slightly different notation in which \theta is the threshold, as this is a more typical usage, at least in my field. But you can name it whatever you want)

fabio · April 24, 2024, 9:55am

It’s been a while, and I would like to thank both @Ven_Popov and @danielparthier for all the help given to me.
With a complex model involving more variables, brms just crashed my R session. I tried multiple times but i did not got any result at all, so I gave up and I stick to cmdstanr again. Maybe I was defining the formula in bf(..., nl= TRUE) not correctly… but I was running out of time and so I quit.

The model suggested by @danielparthier:

\theta = \left( x-\bar{x}\right) \\ y \sim BernoullLogit(\theta + \beta\theta)

gave me credible thresholds for all the variables.
So: i can mark the question as solved . As always I post here the code for everyone interested or for future reference:

data{
    int<lower=1> N;
    array[N] int<lower=0, upper=1> y;
    vector[N] x;
}

transformed data {
    vector[N] x_std;
    x_std = (x - mean(x)) / sd(x);
}

parameters{
    real x_bar;
    real beta;
}
transformed parameters {
    vector[N] theta = x_std - x_bar;
}

model{
    y ~ bernoulli_logit((1 + beta) * theta);
    x_bar ~ std_normal();
    beta ~ std_normal();
}

generated quantities {
    vector[N] log_lik;
    real threshold = x_bar * sd(x) + mean(x);
}

Topic		Replies	Views
Implementing a simple threshold logistic regression model in stan Modeling	5	1403	November 9, 2020
Estimate an indicator variable within a linear regression Modeling	8	1697	June 15, 2018
Measurement error broken stick model in brms brms specification , brms	1	502	May 16, 2021
Non-linear models using family = Beta() Modeling rstan , brms	22	2231	June 23, 2021
Bernoulli regression Modeling	2	376	July 25, 2019

Binary response with a "switch" threshold

Related topics