"The noise parameter is built into the Bernoulli formulation."

Andy and colleagues wrote the following in a 2000 paper in Applied Statistics:

When there is this logistic regression parameterization in the Stan manual:

data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  real beta;
}
model {
  y ~ bernoulli_logit(alpha + beta * x);
}

to what extent is it safe to assume that the model is equivalent to a latent-variable parameterization with an implied residual \epsilon_i for each observation, where the residuals follow a logistic distribution with variance π^2/3, as also described in the Austin and Merlo tutorial? @Bob_Carpenter @andrewgelman

I would say that it is safe. I think it is easiest to understand if you look at the cdf of the logistic distribution (see the Wikipedia article). When you set mu=0 and s=1, the cdf becomes the inverse logit function, and the variance of the logistic distribution is \pi^2/3. Then you have something like

\begin{align} P(y_i = 1 | B) &= P(X_i B + \epsilon_i > 0) \\ &= P(\epsilon_i > -X_i B) \\ & = 1 - 1/(1 + e^{-(-X_i B)}) \\ &= 1/(1 + e^{-X_iB}) \end{align}

In Stan,

y ~ bernoulli_logit(u);

is equivalent to

y ~ bernoulli(inv_logit(u));

which in turn would be equivalent to

z ~ logistic(u, 1);
y = z > 0;

if Stan supported discrete parameters like z. You could try it in BUGS or JAGS if you want to convince yourself they give the same answer.
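For anyone without BUGS or JAGS at hand, a quick simulation can serve as a sanity check. The following is a minimal Python sketch, not Stan code: it assumes the latent variable z has a logistic distribution with location u and scale 1, and the helper names (inv_logit, draw_logistic, simulate) are made up for this illustration. It compares the fraction of draws with z > 0 against inv_logit(u), and the sample variance of z against pi^2/3.

```python
import math
import random


def inv_logit(u):
    """Inverse logit, which is also the cdf of the standard logistic."""
    return 1.0 / (1.0 + math.exp(-u))


def draw_logistic(mu, rng):
    """Draw from logistic(mu, 1) by inverting the cdf."""
    p = rng.random()
    while p == 0.0:  # avoid log(0); random() returns values in [0, 1)
        p = rng.random()
    return mu + math.log(p / (1.0 - p))


def simulate(u, n_draws=200_000, seed=1):
    """Return (fraction of draws with z > 0, sample variance of z)."""
    rng = random.Random(seed)
    z = [draw_logistic(u, rng) for _ in range(n_draws)]
    frac_positive = sum(1 for zi in z if zi > 0) / n_draws
    mean_z = sum(z) / n_draws
    var_z = sum((zi - mean_z) ** 2 for zi in z) / (n_draws - 1)
    return frac_positive, var_z


if __name__ == "__main__":
    for u in (-1.5, 0.0, 0.7):
        frac, var = simulate(u)
        print(f"u={u:+.1f}  Pr[z>0] ~ {frac:.4f}  inv_logit(u) = {inv_logit(u):.4f}  "
              f"var(z) ~ {var:.3f}  pi^2/3 = {math.pi ** 2 / 3:.3f}")
```

The two columns in each pair should agree up to Monte Carlo error, and the variance does not depend on u because u only shifts the location.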

Formally, we only need to show that

Z \sim \textrm{logistic}(u, 1)

implies

Pr[Z > 0] = \textrm{logit}^{-1}(u),

which is straightforward because \textrm{logit}^{-1} is the cdf of the standard logistic distribution (and u just shifts it).
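Spelled out, and assuming Z has a logistic distribution with location u and scale 1 so that its cdf is F_Z(z) = \textrm{logit}^{-1}(z - u), the steps are roughly:

\begin{align} \Pr[Z > 0] &= 1 - F_Z(0) \\ &= 1 - \textrm{logit}^{-1}(-u) \\ &= 1 - \frac{1}{1 + e^{u}} \\ &= \frac{e^{u}}{1 + e^{u}} \\ &= \frac{1}{1 + e^{-u}} = \textrm{logit}^{-1}(u) \end{align}

The second-to-last step is just the symmetry of the logistic distribution around its location.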

Awesome. Thanks, @Bob_Carpenter. Your foundational contributions to making Stan a go-to option are much appreciated.

For others in the forum, I’ll link to the Stan documentation on the logistic function: 19.8 Logistic distribution | Stan Functions Reference

And, therefore, Stan follows convention by setting the sigma to 1 as part of the standard logistic:
y ~ logistic(mu, sigma);

And then to confirm, that’s what the manual implies when it says that the noise parameter (sigma, in other words) is built into the bernoulli_logit function, correct?