Hello all,

As part of my general efforts to understand how initialization (among many other things) really works under the hood in Stan, I wrote the following toy application (partially derived from the vignette of the edstan package):

```
library(rstan)
library(edstan)
set.seed(1)
## Begin code from edstan vignette
# Set parameters for the simulated data
I <- 20
J <- 500
sigma <- .8
lambda <- c(-10*.05, .05, .5, -.025)
w_2 <- rnorm(J, 10, 5)
w_3 <- rbinom(J, 1, .5)
W <- cbind(1, w_2, w_3, w_2*w_3)
beta_free <- seq(from = -1, to = 1, length.out = I-1)
# Calculate or sample remaining variables and parameters
N <- I*J
ii <- rep(1:I, times = J)
jj <- rep(1:J, each = I)
beta <- c(beta_free, -1 * sum(beta_free))
rasch_theta <- rnorm(J, W %*% matrix(lambda), sigma)
rasch_eta <- (rasch_theta[jj] - beta[ii])
rasch_y <- rbinom(N, size = 1, prob = plogis(rasch_eta))
# Assemble the data list using an edstan function
sim_rasch_list <- irt_data(y = rasch_y, ii = ii, jj = jj,
covariates = as.data.frame(W),
formula = NULL)
## End vignette code
twopl_code <- "
data {
int<lower=1> I; // # questions
int<lower=1> J; // # persons
int<lower=1> N; // # observations
int<lower=1, upper=I> ii[N]; // question for n
int<lower=1, upper=J> jj[N]; // person for n
int<lower=0, upper=1> y[N]; // correctness for n
int<lower=-1, upper=1> polarity[I];
}
parameters {
vector[I] diff;
vector[I] disc_free;
vector<upper=0>[I] disc_neg;
vector<lower=0>[I] disc_pos;
vector[J] theta_raw;
}
transformed parameters {
vector[J] theta;
real mean_theta_raw;
real<lower=0> sd_theta_raw;
vector[I] disc;
// Identify location and scale
mean_theta_raw = mean(theta_raw);
sd_theta_raw = sd(theta_raw);
theta = (theta_raw - mean_theta_raw) ./ sd_theta_raw;
for (i in 1:I) {
if (polarity[i] < 0) {
disc[i] = disc_neg[i];
}
if (polarity[i] == 0) {
disc[i] = disc_free[i];
}
if (polarity[i] > 0) {
disc[i] = disc_pos[i];
}
}
}
model {
diff ~ normal(0, 1);
disc_free ~ normal(0, 1);
disc_neg ~ normal(0, 1);
disc_pos ~ normal(0, 1);
theta_raw ~ normal(0, 1);
//y ~ bernoulli_logit((disc[ii] .* theta[jj] - diff[ii]) * 1.7);
y ~ bernoulli(Phi_approx(disc[ii] .* theta[jj] - diff[ii]));
}
"
sim_twopl_list <- within(sim_rasch_list, {
polarity = rep(c(1, -1, 0, 0), length.out = I)
y = ifelse(polarity[ii] < 0, 1 - y, y)
})
sim_twopl_fit <- stan(model_code = twopl_code,
data = sim_twopl_list,
chains = 2,
iter = 1000,
seed = 3)
print(sim_twopl_fit, c("lp__", "disc"))
```

If I run the code as written above, it prints out a number of `Rejecting initial value`

warnings before it finally initializes properly. If, however, I substitute the commented out `y ~ bernoulli_logit((disc[ii] .* theta[jj] - diff[ii]) * 1.7)`

in the Stan code for the line that follows it, initialization proceeds without any problems. (This also happens with different random seeds.) Since I have encountered initialization problems with similar models that also employ a normal (instead of logistic) link function, I am wondering whether this is a general pattern and, if so, why is the logistic superior? Again, I am posing this question mainly to enhance my general understanding of initialization in Stan (which has been a continuing source of frustration for me), not because I care about this particular model. Any insight into these matters would be much appreciated. Thanks!

Devin