Translating robust logistic regression from rstan to brms

jrrae · August 16, 2018, 4:47pm

I am interested in fitting a “robust” logistic regression model from Doing Bayesian Data Analysis (2nd Edition). This model treats data as a mixture of two sources: (1) guessing and (2) a logistic function of predictors. For guessing, the y values are thought to come from y ~ Bernoulli(m=1/2).

Using some data I found in this vignette, I think I’ve been able to implement the model in a simple case using rstan:

library("rstan")

# Prepare data 
url <- "http://stat.columbia.edu/~gelman/arm/examples/arsenic/wells.dat"
wells <- read.table(url)
X <- model.matrix(~ arsenic, wells)
standata <- list(y = wells$switch, arsenic = wells$arsenic, N = nrow(wells))

# Model
guess.model <- '
data {                          
  int<lower=0> N;                
  int<lower=0,upper=1> y[N];     
  vector[N] arsenic;            
}
parameters {
  real alpha;                   
  real b_arsenic;                
  real<lower=0,upper=1> guess;
}
transformed parameters { 
  vector[N] mu;
  for ( i in 1:N) {
    mu[i] = .5*guess + (1-guess)*inv_logit(alpha + b_arsenic * arsenic[i]);
  }
}
model {
  y ~ bernoulli(mu);
  alpha ~ normal(0,1);         
  b_arsenic ~ normal(0,1);     
  guess ~ beta(1,9);
}
'

# Fit model
fit <- stan(model_code = guess.model, data = standata)
print(fit, pars = c("alpha","b_arsenic","guess"))

However, I’m unsure how to implement the guessing parameter in brms. Could anyone point me in the right direction?

Operating System: Windows 10
Interface Version: 3.2.1

paul.buerkner · August 17, 2018, 7:27am

You can learn more about that in vignette("brms_nonlinear")

Solomon · January 11, 2019, 2:52am

Many thanks to jrrae for posing this question. I’ve been working through Kruschke’s book and was initially baffled by this model, too (chapter 21.3). In case any others are curious, I was able to follow Paul’s lead and fit the model like so.

fit <-
  brm(data = my_data, 
      family = binomial(link = "identity"),
      bf(male | trials(1) ~ a * .5 + (1 - a) * 1 / (1 + exp(-1 * (b0 + b1 * weight_z))),
         a + b0 + b1  ~ 1,
         nl = TRUE),
      prior = c(prior(normal(0, 2), nlpar = "b0"),
                prior(normal(0, 2), nlpar = "b1"),
                prior(beta(1, 9),   nlpar = "a")))

Much like Paul cautioned in his vignette("brms_nonlinear"), the model yielded a number of divergent transitions which might well be alleviated with better priors. But the results match Kruschke’s pretty well.

Note. My model is based on the data and parameters from Kruschke’s text, not those in the vignette jrrae cited, above.

andrewgelman · March 7, 2019, 4:27pm

If you want to fit a robust logistic regression, I recommend that instead of this unnatural (to me) mixture model, you use the robit model, which is like logistic regression but using the Student-t rather than the logistic distribution.

Solomon · March 9, 2019, 10:10pm

Thank you for the suggestion, @andrewgelman! It appears one can find the details of that model from Gelman and Hill (2007), section 6.6. There we read:

Logistic regression can be conveniently “robustified” by generalizing the latent-data formulation (5.4):

y_i = \bigg \{ \begin{matrix} 1 \text{ if } z_i > 0\\ 0 \text{ if } z_i < 0 \end{matrix}
z_i = X_i \beta + \epsilon_i,

to give the latent errors \epsilon a t distribution:

\epsilon_i \sim t_\nu \bigg ( \frac{\nu - 2}{\nu} \bigg ),

with the degrees-of-freedom parameter \nu > 2 estimated from the data and the t distribution scaled so that its standard deviation equals 1.

That model currently outstrips my brms chops. Bonus Stan Forums points for anyone who can show the relevant brms code.

paul.buerkner · March 10, 2019, 9:01am

We can surely write that model in brms. However, I am unsure about why we compute
(\nu - 2) / \nu? If we just wanted the errors to be (standard) t-distributed, wouldn’t we just write

\epsilon \sim t(\nu, 0, 1)

paul.buerkner · March 10, 2019, 9:16am

Ah I see, it is to fix the SD to 1. That makes sense: (\nu - 2) / \nu is the adjusted scale parameter as var(\epsilon) = \nu / (\nu - 2). That is, we would write

\epsilon \sim t(\nu, 0, (\nu - 2) / \nu)

@andrewgelman Can you confirm this real quick?

Suppose, for now, that the above is correct. We could translate this to brms as follows (not tested, so please check carefully):

library(brms)

stan_inv_robit <- "
  real inv_robit(real y, real nu) {
    student_cdf(y, nu, 0, (nu - 2) / nu);
  }
"
stanvar_inv_robit <- stanvar(scode = stan_inv_robit, block = "functions")

bform <- bf(
  male | trials(1) ~ inv_robit(eta, nu),
  eta ~ <predictors>,
  nu ~ 1,
  nl = TRUE,
  family = bernoulli("identity")
)

bprior <- prior(normal(0, 5), nlpar = "eta") +
  prior(gamma(2, 0.1), nlpar = "nu", lb = 2)

fit <- brm(
  bform, prior = bprior,
  stanvars = stanvar_inv_robit,
  ...
)

andrewgelman · March 10, 2019, 10:33pm

Hi, yes, I guess this would have a problem if nu is less than 2, so you’d want to constrain nu to be at least 2, or even to constrain it to be at least 3. I don’t know how to ping Aki on this so I’ll just cc him, also Jonah and Ben in case this ever came up in rstanarm.
See you

Andrew

paul.buerkner · March 11, 2019, 7:22am

Thanks Andrew! Through prior(gamma(2, 0.1), nlpar = "nu", lb = 2) brms ensures that nu has a lower bound (lb) of 2. You can ping people via @“name” for instance @avehtari for Aki.

josh_fiechter · July 26, 2020, 3:35pm

Hi everyone. I’d like to conduct a hierarchical nonlinear robit regression. Should I append an nlf statement for eta to the initial formula? E.g.:

bform <- bf(y ~ inv_robit(eta, nu),
           nu ~ 1,
           nl = T,
           family = bernoulli('identity')) +
  nlf(eta ~ par1 * x ^ par2,
      par1 + par2 ~ 1 + (1|group))

Topic		Replies	Views
Robit regression not robust brms	16	2295	March 20, 2021
How to translate a Stan nonlinear fit to brms brms	4	487	January 23, 2020
Updating Gaussian multiple regression code and building a robust model Modeling rstan	4	432	May 12, 2021
Log-Linear model with `brms`? brms	0	537	March 29, 2022
Custom Item-Response Theory model in brms Modeling specification	1	446	August 24, 2020

Translating robust logistic regression from rstan to brms

Related topics