I’ve been working through Exercise 15.5 of Regression and Other Stories, which asks you to take a logit model and convert it to a probit model.
I found that when fitting probit models with more than a few predictors, the sampler fails unless I set the prior explicitly.
Is there a better default weakly informative prior?
I’m running rstanarm version 2.21.1 on Ubuntu 20.04 (via WSL2).
To demonstrate, here’s an example that fits a random binomial outcome on 5 noise predictors with 1000 data points:
N <- 1000
fake_data <- data.frame(a = rbinom(N, 1, 0.5),
                        b = rbinom(N, 1, 0.5),
                        c = rbinom(N, 1, 0.5),
                        d = rbinom(N, 1, 0.5),
                        e = rbinom(N, 1, 0.5),
                        y = rbinom(N, 1, 0.5))
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          data = fake_data)
Most of the time when I run this I get an error like:
SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can't start sampling from this initial value.
...
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done
Error in check_stanfit(stanfit) :
Invalid stanfit object produced please report bug
It seems that the more predictors there are, the more likely this is to happen.
I haven’t run into this problem with logistic regression (i.e. using link="logit" above).
One workaround is to set a flat prior, which converges:
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          prior = NULL,
                          data = fake_data)
Another way to get it to converge is to set a slightly tighter prior (which gives very similar estimates):
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          prior = rstanarm::normal(scale = 1, autoscale = TRUE),
                          data = fake_data)
I don’t understand Stan’s internals, but my guess is that with more predictors there’s a greater chance of an initial value landing far from its true value (here, 0).
I suspect that because the normal distribution’s tail probabilities go to zero much faster than the logistic distribution’s, link="probit" runs into numerical underflow that link="logit" avoids.
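To check that hunch in plain R (using R’s own pnorm/plogis rather than Stan’s internals), here is a small sketch showing that the normal CDF underflows to exactly zero at moderately extreme linear predictors while the logistic CDF stays positive, which would produce precisely the “log probability evaluates to log(0)” failure:

```r
# Tail probability of each link's CDF at an extreme linear predictor.
eta <- -40
pnorm(eta)        # 0: underflows below the smallest representable double
plogis(eta)       # ~4.2e-18: still strictly positive
log(pnorm(eta))   # -Inf, i.e. log(0) -- the failure Stan reports
log(plogis(eta))  # ~ -40, perfectly finite

# Computing directly on the log scale avoids the underflow entirely:
pnorm(eta, log.p = TRUE)   # ~ -804, finite
```

So a wide prior that lets initial coefficients produce large linear predictors can drive the probit likelihood to exactly zero, while the logit likelihood merely becomes small.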
Is there a better heuristic for a weakly informative prior for probit regression?
In one example I found that prior=rstanarm::normal(scale=0.5, autoscale=TRUE)
works and gives results similar to a flat prior, but I don’t know whether that’s a reasonable thing to do in general.