I’ve been working through Exercise 15.5 of Regression and Other Stories, which asks to take a logit model and convert it to a probit.
I've found that when fitting models with more than a few predictors, the sampler fails to initialize unless I set the prior explicitly.
Is there a better default weakly informative prior?
rstanarm version 2.21.1 on Ubuntu-20.04 (via WSL2).
To demonstrate, here’s an example where we try to fit a random binomial outcome on 5 noise predictors with 1000 data points:
N <- 1000
fake_data <- data.frame(
  a = rbinom(N, 1, 0.5),
  b = rbinom(N, 1, 0.5),
  c = rbinom(N, 1, 0.5),
  d = rbinom(N, 1, 0.5),
  e = rbinom(N, 1, 0.5),
  y = rbinom(N, 1, 0.5)
)
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          data = fake_data)
Most of the time running this I will get an error like:
SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
...
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
"Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done
Error in check_stanfit(stanfit) :
  Invalid stanfit object produced please report bug
It seems that the more predictors there are the more likely this is to happen.
I haven’t come across this problem when working with logistic regression (using a “logit” link function in the above).
One way I can work around this is by setting a flat prior, with which the model converges:
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          prior = NULL,
                          data = fake_data)
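As a rough sanity check (my assumption being that with a flat prior the posterior should sit near the maximum-likelihood fit), the estimates can be compared against base R's glm():

```r
# Maximum-likelihood probit fit on the same fake_data for comparison;
# with prior = NULL the stan_glm estimates should be close to these.
mle <- glm(y ~ a + b + c + d + e,
           family = binomial(link = "probit"),
           data = fake_data)
coef(mle)  # all coefficients should be near 0 for pure noise predictors
```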
Another way to get it to converge is by setting a slightly tighter prior (which gives very similar estimates):
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          prior = rstanarm::normal(scale = 1, autoscale = TRUE),
                          data = fake_data)
I don’t understand how Stan works, but I’m guessing that as we add more predictors there’s a greater chance that the random initial value for one of the coefficients lands far from its actual value (in this case 0).
I suspect that because the tails of the normal distribution go to zero much faster than those of the logistic distribution, this leads to numerical underflow with link="probit" that doesn’t occur with link="logit".
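A small numerical illustration of what I mean (the specific value -40 is just an arbitrary point far out in the tail): the probit link's normal CDF underflows to exactly 0 in double precision long before the logistic CDF does, so a naive log of it is -Inf, which matches the "log probability evaluates to log(0)" message.

```r
# Tail probability at a linear predictor of -40 under each link:
pnorm(-40)        # underflows to 0 (true value ~1e-350, below double range)
plogis(-40)       # ~4e-18, still representable
log(pnorm(-40))   # -Inf  -> the kind of log(0) Stan complains about
log(plogis(-40))  # about -40, perfectly finite
```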
Is there a better heuristic for a weakly informative prior for probit regression?
In one example I’ve found that prior=rstanarm::normal(scale=0.5, autoscale=TRUE) seems to work and gives results similar to a flat prior, but I don’t know whether that’s a reasonable thing to do in general.
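For reference, when comparing candidate priors I've been inspecting what rstanarm actually applies after autoscaling (this assumes a successfully fitted model object from one of the calls above):

```r
# Show the specified and adjusted (autoscaled) priors for a fitted model:
rstanarm::prior_summary(fit)
```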