I’ve been working through Exercise 15.5 of *Regression and Other Stories*, which asks to take a logit model and convert it to a probit.

I found that when fitting models with more than a few predictors, the sampler will often fail unless I set the prior myself.

Is there a better default weakly informative prior?

I’m running `rstanarm` version 2.21.1 on Ubuntu 20.04 (via WSL2).

To demonstrate, here’s an example that fits a random binomial outcome on 5 noise predictors with 1000 data points:

```r
N <- 1000
fake_data <- data.frame(a = rbinom(N, 1, 0.5),
                        b = rbinom(N, 1, 0.5),
                        c = rbinom(N, 1, 0.5),
                        d = rbinom(N, 1, 0.5),
                        e = rbinom(N, 1, 0.5),
                        y = rbinom(N, 1, 0.5))
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          data = fake_data)
```

Most of the time, running this produces an error like:

```
SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can't start sampling from this initial value.
...
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done
Error in check_stanfit(stanfit) :
Invalid stanfit object produced please report bug
```

It seems that the more predictors there are, the more likely this is to happen.

I haven’t come across this problem when working with logistic regression (using a “logit” link function in the above).

One way I can work around this is by setting a flat prior, with which the model seems to converge:

```r
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          prior = NULL,
                          data = fake_data)
```

Another way to get it to converge is by setting a slightly tighter prior (which gives very similar estimates):

```r
fit <- rstanarm::stan_glm(y ~ a + b + c + d + e,
                          family = binomial(link = "probit"),
                          prior = rstanarm::normal(scale = 1, autoscale = TRUE),
                          data = fake_data)
```

I don’t understand Stan’s internals, but I’m guessing that as we add more predictors, there’s a greater chance of an initial value landing far from its actual value (in this case 0).

I suspect that because the tail probabilities of the normal distribution go to zero much faster than those of the logistic distribution, this leads to numerical errors with `link="probit"` that don’t occur with `link="logit"`.
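To illustrate the suspicion, here’s a base-R sketch (no `rstanarm` needed): the normal CDF underflows to exactly zero at moderately extreme linear predictor values, so taking its log naively gives `-Inf`, while the logistic CDF is still representable there:

```r
# An extreme linear predictor value, the kind a wide prior or a wild
# initialization can easily produce with several predictors.
eta <- -40

pnorm(eta)   # underflows to exactly 0 (probit likelihood)
plogis(eta)  # ~4.2e-18, still representable (logit likelihood)

log(pnorm(eta))   # -Inf: "Log probability evaluates to log(0)"
log(plogis(eta))  # about -40, finite

# R can compute the log-CDF stably when asked; the point is only that the
# naive probit likelihood hits exact zero far earlier than the logit one.
pnorm(eta, log.p = TRUE)  # about -804.6, computed without underflow
```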

Is there a better heuristic for a weakly informative prior for probit regression?

I’ve found in an example that `prior=rstanarm::normal(scale=0.5, autoscale=TRUE)` seems to work and gives results similar to a flat prior, but I don’t know whether that’s a reasonable thing to do in general.
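One back-of-the-envelope justification for a smaller scale (my own reasoning, not an official `rstanarm` recommendation): *Regression and Other Stories* converts logit coefficients to the probit scale by dividing by roughly 1.6, since the logistic distribution has standard deviation π/√3 ≈ 1.8. Shrinking the default prior scale of 2.5 by the same factor suggests something around 1.5:

```r
# Standard deviation of the logistic distribution -- the reason logit
# coefficients come out larger than probit coefficients on the same data.
pi / sqrt(3)  # ~1.81

# ROS uses ~1.6 as a practical logit-to-probit divisor; applying it to
# rstanarm's default coefficient prior scale of 2.5 gives:
2.5 / 1.6  # ~1.56
```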