Force brms to run even with invalid response variables

Ryan_Giordano · September 3, 2025, 10:04pm

Is there a way to force brm to ignore invalid response values when running logistic regression (or in general)?

library(brms)

data_df <-
  data.frame(y=c(0.1, 0.2, 0.3, 1.1), x=c(1, 2, 3, 0))

logit_post <- brm(y ~ x - 1,
                  family = bernoulli(link = "logit"),
                  data = data_df)

# Error: Family 'bernoulli' requires responses to contain only two different values.

For example, the above posterior is perfectly well-defined, and I would like brms to blithely put the y I gave it into the standata object and run posterior sampling as it normally would had had the y been binary.

It’s beyond the scope of the present question to explain why I want to do this, but I would ask that respondents give me the benefit of the doubt that I know what I’m doing and have good reasons.

Please also provide the following information in addition to your question:

Operating System: Ubuntu 22.04
brms Version: 2.22.0

jsocolar · September 3, 2025, 10:22pm

bernoulli_lpmf(0.1 | theta)literally isn’t defined. See Binary Distributions from the stan functions reference.

If brms did this, you’d get a bunch of errors from Stan about invalid arguments to bernoulli_lpmf.

What behavior do you want from this model?

Ryan_Giordano · September 3, 2025, 11:05pm

Sounds like you’re saying that due to type restrictions in Stan, this is not possible as implemented, which is understandable and fair.

Formally, the log probability of a y, x pair with parameter \\theta in a GLM with natural link function is given (up to a constant) by y \\theta^T x - A(\\theta^T x) where A(\\cdot) is the log partition function of the corresponding exponential family. This can produce valid posteriors over \\theta even when y is not in its nominal domain — formally this would require modification of the exponential family carrier density, but Stan doesn’t need to know about that, and could simply sample from a posterior whose log joint has the corresponding gradients. But I can see how brms might not be implemented in a way that would allow this.

Ryan_Giordano · September 3, 2025, 11:08pm

Sorry, there seems to be a discord bug and I can’t edit the math errors in my last post. The equations should be, respectively, y \theta^T x - A(\theta^T x), A(\cdot), and \theta. (I don’t see a way to preview my latex rendering so I’ll just leave it out the dollar signs.)

jsocolar · September 4, 2025, 12:30am

The issue isn’t so much that Stan is strongly typed as that brms understands family = bernoulli to indicate a regression in which the response is distributed as a Bernoulli random variable.

Note that brms supports user-defined custom families, so if you want to define a family with a continuous response whose lpdf for observation y and parameter theta is as you desire, that would be straightforward to do.

Something like the following, which is untested and was created in part by ChatGPT

library(brms)

# Custom family: parameter is mu in (0,1), with a logit link to the linear predictor.
bern_pseudo_logit <- custom_family(
  "bern_pseudo_logit",
  dpars  = "mu",
  links  = "logit",   # linear predictor eta; mu = inv_logit(eta)
  type   = "real"     # allow y to be any real (pseudo-likelihood)
)

# Stan function: lpdf for one observation.
# Uses an equivalent, numerically stable form in terms of mu:
#   y*log(mu) + (1 - y)*log(1 - mu)
stan_funs <- "
functions {
  real bern_pseudo_logit_lpdf(real y, real mu) {
    // mu is in (0,1) because brms applies the inverse-logit to the linear predictor.
    return y * log(mu) + (1 - y) * log1m(mu);
  }
  // No RNG on purpose: not a proper density over R, so posterior_predict isn't defined.
}
"

stanvars <- stanvar(scode = stan_funs, block = "functions")

# ---- Example usage ----
set.seed(1)
N  <- 100
x  <- rnorm(N)
b0 <- 0.3
b1 <- 1.2
eta <- b0 + b1 * x
p   <- plogis(eta)
y   <- p + rnorm(N, 0, 0.05)  # real-valued "responses" to demonstrate pseudo-lik

dat <- data.frame(y, x)

fit <- brm(
  bf(y ~ 1 + x),
  data     = dat,
  family   = bern_pseudo_logit,
  stanvars = stanvars,
  prior = c(
    prior(normal(0, 2.5), class = "b"),
    prior(normal(0, 2.5), class = "Intercept")
  ),
  chains = 4, iter = 2000, seed = 123
)

print(fit)

Ryan_Giordano · September 4, 2025, 3:07am

Thank you, on the face of it this looks like it will do the trick! I’ll respond once I’ve tried it.

One quick question — does the brms automatic prior selection interact at all with the family selection? That is, if one does not specify the prior argument in your example, will you get the same priors as if you had specified family=bernoulli?

Ryan_Giordano · September 4, 2025, 4:30am

The following slight modification of the above code appears to run:

data_df <-
  data.frame(y=c(0.1, 0.2, 0.3, 1.1), x=c(1, 2, 3, 0), w=rep(1, 4))

bern_pseudo_logit <- custom_family(
  "bern_pseudo_logit",
  dpars  = "mu",
  links  = "logit",   # linear predictor eta; mu = inv_logit(eta)
  type   = "real"     # allow y to be any real (pseudo-likelihood)
)

stan_funs <- "
  real bern_pseudo_logit_lpdf(real y, real mu) {
    return y * log(mu) + (1 - y) * log1m(mu);
  }
"

stanvars <- stanvar(scode = stan_funs, block = "functions")

brm(y ~ x - 1,
    data=data_df,
    family=bern_pseudo_logit,
    stanvars=stanvars)

I’ll do some tests to make sure it’s computing the correct posterior and return to this thread. But thanks for the quick suggestion!

avehtari · September 4, 2025, 8:41am

Also known as continuous Bernoulli? A thread and a blog post at Brms custom family continuous bernoulli

Ryan_Giordano · September 4, 2025, 1:55pm

Nice connection! It happens that this is not what I mean in this case, since I do not want to use the continuous Bernoulli normalizing constant for y.

Maybe I should elaborate a little on the difference in case anyone else reads this. I do not mean to model the response y as being continuous-valued in [0,1], since if I did then my assertion that the carrier density H(y) in the exponential family doesn’t depend on the parameter would not be true. (I think this is described well in the chain of posts you linked to.) I actually mean for y to be dichotomous, but with values that are different for each observation (because they depend on the regressors), though with the same link function. In other words, in my model, each dichotomous observation’s mean function is modeled the same way by the GLM, though its domain is in fact different from observation to observation as a function of the regressors.

Again, going into why I would do this is beyond the scope of this thread, but this is a simple and clean way of approximating a regressor-aligned mean shift in binary data, without bothering to construct the corresponding randomized binary dataset.

jsocolar · September 4, 2025, 2:40pm

In general no, with some exceptions. The prior depends on the parameter class (e.g. intercepts, coefficients, random effect covariances, etc), but not on the family. However, there are cases where priors on shape parameters for response distributions depend on the family, even when they are treated as being of the same internal brms parameter class. See for example Updated default prior for phi to inv_gamma(0.4, 0.3) by jan-joh · Pull Request #1822 · paul-buerkner/brms · GitHub

Edit: and there could well be other cases of parameter classes whose default priors depend on the family, that I’m not explictly aware of.

saudiwin · September 4, 2025, 3:46pm

I’m not sure I follow what it is you’re doing, but Stan imposes tight restrictions on GLMs like Bernoulli in terms of what data input it accepts.

What I’d recommend is writing out a custom log-likelihood (which isn’t hard in Stan) that you can then overload as needed.

saudiwin · September 4, 2025, 3:47pm

You could always add it to brms as a custom family or just model it in “pure” Stan.

Topic		Replies	Views
Custom family works in Stan but not in brms brms	2	538	June 11, 2022
Error: Family 'binomial' requires numeric responses Modeling brms	2	824	September 11, 2022
Brms custom family continuous bernoulli Events techniques , specification , brms	2	568	February 20, 2023
Error: Argument 'mi' is not supported for family 'bernoulli(logit)' brms rstan	4	1361	January 27, 2020
Logistic_normal() family link function throws 'response category' error Modeling brms	1	63	August 5, 2025

Force brms to run even with invalid response variables

Related topics