Model overdispersion as a function of a covariate for binary data

Hey
I would like a way to model the overdispersion in binary data.
This overdispersion varies as a function of a covariate.
More specifically, I would like to implement the following model, preferably in brms:

Y_i \sim \text{Bernoulli}(\text{sigmoid}(\eta_i))
\eta_i \sim \text{Normal}(\mu_i, \sigma_i)
\mu_i = \beta_0 + \beta_1x_{i}
\log(\sigma_i) = \gamma_0 + \gamma_1z_{i}

Is this possible in brms? I tried to look into the non-linear brms formula options, but I got confused and did not manage to implement it (if it is possible at all).
I also looked at using an observation-level random effect, but I don't see a way to vary the overdispersion as a function of a covariate.

Does somebody have a clue how to implement this?

Thanks in advance!

This depends on the details of your data, but can you reformulate things in terms of successes over trials? Then you could use a Beta-Binomial model and relate phi to the covariate.

See also this.
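For example, a minimal sketch of that approach, assuming a reasonably recent brms version where beta_binomial() is a built-in family (successes, n, x, and z are placeholder column names):

library(brms)

# Distributional Beta-Binomial model: the mean depends on x,
# the overdispersion parameter phi depends on z (log link by default).
bb_formula <- bf(
  successes | trials(n) ~ x,  # mean model
  phi ~ z                     # overdispersion model
)

fit_bb <- brm(bb_formula, data = d, family = beta_binomial())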

Hey, thanks for your answer!

In my case that does not seem possible. Each unit / row in my data frame stands on its own and also has unique values for the covariates. There is no logical way to group them.

It is possible to achieve what you suggest in brms via nonlinear formula syntax.

However, before venturing down this road, understand that observation level random effects are not generally identifiable when the outcome is binary (because the model doesn’t have a good way to rule out the possibility that the random effect variance is huge, leading to a set of observation-level probabilities that are all essentially zero or one irrespective of the linear predictor). I’m not 100% sure that this non-identifiability would persist if there is a strong covariate-based predictor for the variance, but I’m afraid it might persist…

The key intuition here is that if you see a response consisting of 1s and 0s in equal proportion, then in an intercept-only model with the intercept estimated as zero there is absolutely no information in the likelihood to inform the size of an observation-level random effect variance. The data stream would look identical regardless of the size of the random effect variance.
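A quick simulation sketch of this point (numbers and names are illustrative): the marginal stream of 0s and 1s looks the same no matter how large the observation-level SD is.

set.seed(1)
n <- 1e5

# Intercept-only model with intercept 0 and two very different
# observation-level random-effect SDs on the logit scale.
y_small_sd <- rbinom(n, 1, plogis(rnorm(n, mean = 0, sd = 0.1)))
y_large_sd <- rbinom(n, 1, plogis(rnorm(n, mean = 0, sd = 10)))

# Both are ~50% ones, so the likelihood cannot tell the two SDs apart.
mean(y_small_sd)
mean(y_large_sd)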

Edit: A more complete way to think about this is that (logit-normal)-bernoulli, or indeed anything-bernoulli, marginalizes to bernoulli. Therefore, if a model such as yours is to be identifiable (other than via the prior), the only way it can possibly be identifiable is if the logit-linear predictor fails to capture the true covariate relationships, whereas marginalizing over Gaussian noise added to the logit-linear predictor yields something better. That is, marginalizing over Gaussian noise added to a logit-linear predictor will yield a predictor that is not necessarily logit-linear in the covariates.

Your model will be degenerate unless there is enough power to resolve which functional form fits best, between one that is logit-linear in the covariates and the universe of similar forms that are arrived at by marginalizing over Gaussian noise added to the linear predictor. Even in this case, there is no important difference in meaning between calling such a model “overdispersed” or working with the marginalized form where there is no overdispersion term and where relationships that are not logit-linear yield exact predictions of the probability.
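A small numerical sketch of that marginalization argument (values are illustrative): averaging the success probability over Gaussian noise on the logit scale still gives a single Bernoulli probability per observation, just one that is no longer logit-linear in the covariates and is pulled towards 0.5.

# Marginal success probability after integrating out Gaussian noise
# with SD sigma added to a logit-linear predictor mu.
marginal_p <- function(mu, sigma) {
  integrate(function(e) plogis(mu + e) * dnorm(e, sd = sigma),
            lower = -Inf, upper = Inf)$value
}

mu <- 2
plogis(mu)         # probability without overdispersion, ~0.88
marginal_p(mu, 2)  # marginal probability with sigma = 2, closer to 0.5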

Thanks for your reply and insights!
I wanted to try regardless :)
I tried for a long time with non-linear formulas to implement this and in the end came up with the following in brms:

# each row is its own group for the olre: d$obs <- seq_len(nrow(d))
nl_formula <- bf(y ~ m + olre * exp(s), nl = TRUE) +
  lf(m ~ x) +
  lf(olre ~ 0 + (1 | obs)) +
  lf(s ~ z)

priors <- c(
  prior(normal(0, 1), class = "b", nlpar = "m"),
  prior(constant(1), class = "sd", nlpar = "olre", group = "obs"),
  prior(normal(0, 1), class = "b", nlpar = "s")
)

fit <- brm(nl_formula, data = d, family = bernoulli(), prior = priors)

So I fixed the standard deviation of the observation-level random effect (olre) to 1 and then scaled it by exp(s), which is estimated in the model as a function of z.

The model does converge, but the results are not as expected. So I have to check why this is!
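One way to start checking is to look at the implied overdispersion SD, sigma_i = exp(gamma_0 + gamma_1 * z_i), directly from the posterior draws. A sketch, assuming the parameter names follow the usual brms convention for the nlpar s (verify with variables(fit)):

draws <- as_draws_df(fit)

# Implied SD of the latent logit-scale noise as a function of z,
# averaged over the posterior, for a grid of z values.
z_grid <- seq(min(d$z), max(d$z), length.out = 50)
sigma_mean <- sapply(z_grid, function(z)
  mean(exp(draws$b_s_Intercept + draws$b_s_z * z)))
plot(z_grid, sigma_mean, type = "l")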

But I am not sure if my approach is even valid?