I am seeking some advice on avoiding initialization failure.

I am working on a modified version of an item response theory model that, for identification purposes, constrains the signs of some parameters. Specifically, certain elements of a parameter vector `beta`

(which is similar to the discrimination parameters in a conventional IRT model) are constrained to be positive in order to identify the polarity of the latent scale. To impose this constraint, I use the trick of defining two versions of the vector, `beta_pos`

and `beta_rest`

, the first of which has positive support, and assigning each of the constrained elements of `beta`

a value from `beta_pos`

instead of `beta_rest`

. Full code is attached, but here are the relevant sections:

```
vector<lower=0>[L] beta_pos[1]; // positive spatial coefficients
vector[L] beta_rest[1]; // unconstrained spatial coefficients
```

and

```
for (l in 1:L) {
for (d in 1:1) {
if (beta_sign[l, d] > 0) { // contrained to be positive
beta[d][l] = beta_pos[d][l];
}
if (beta_sign[l, d] == 0) { // unconstrained
beta[d][l] = beta_rest[d][l];
}
}
}
```

`beta_sign`

is a matrix that indicates whether each element of `beta`

is constrained to be positive (=+1) or unconstrained (=0). (d indexes latent dimensions, of which there is just 1 in this version of the model.)

Sampling from the model works fine as long as only a single element of beta is constrained. Imposing multiple constraints, however, results in a failed initialization. I have tried reducing `init_r`

as low as 0.01, but to no avail. I have not yet tried specifying initial values, in part because I am not sure which of the parameters I should do so for.

Any thoughts on why this parameterization does not work when more than one constraint is imposed?

Thank you for your help.

Devin

conjointIRT-1D.stan (2.2 KB)

Long-term the easiest way to handle this is to write models that always initialize successfully when the *internal* parameters are on -1 to 1 (so log(beta_pos) is on -1 to 1. Not sure what that means in your case since I don’t see the full model or the full error message.

Thanks!

I’m not sure exactly what you mean—perhaps more information on my end will help? To clarify, the error message I received was

```
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.
```

repeated 100 times. The full model code was attached to the original email, but I’m pasting it below:

```
data {
int<lower=1> L; // num. attributes
int<lower=1> I; // num. respondents
int<lower=1> N; // num. observations
int<lower=1, upper=I> ii[N]; // respondent for obs n
int<lower=0, upper=1> choice[N]; // response (0/1)
matrix<lower=0, upper=1>[N, L] XX1; // attributes of profile 1
matrix<lower=0, upper=1>[N, L] XX2; // attributes of profile 2
int<lower=-1, upper=1> beta_sign[L, 1]; // sign constraints on betas
}
transformed data {
matrix<lower=-1,upper=1>[N, L] XX1_minus_XX2; // attribute differences
matrix<lower=0,upper=2>[N, L] XX1_plus_XX2; // attribute sums
XX1_minus_XX2 = XX1 - XX2;
XX1_plus_XX2 = XX1 + XX2;
}
parameters {
vector[L] gamma; // function of valence and spatial coefficients
vector<lower=0>[L] beta_pos[1]; // positive spatial coefficients
vector[L] beta_rest[1]; // unconstrained spatial coefficients
vector[I] theta_raw; // ideal point (pre-normalized)
}
transformed parameters {
vector[N] disc; // discrimination parameter
vector[N] diff; // difficulty parameter
vector[I] theta; // ideal points (normalized)
vector[L] beta[1]; // spatial coefficients
// Identify location and scale
for (i in 1:I) {
theta[i] = (theta_raw[i] - mean(theta_raw)) / sd(theta_raw);
}
// Identify polarity
for (l in 1:L) {
for (d in 1:1) {
if (beta_sign[l, d] > 0) { // contrained to be positive
beta[d][l] = beta_pos[d][l];
}
if (beta_sign[l, d] == 0) { // unconstrained
beta[d][l] = beta_rest[d][l];
}
}
}
// Convert spatial and valence terms into IRT item parameters
for (n in 1:N) {
disc[n] = 2 * XX1_minus_XX2[n, 1:L] * beta[1][1:L];
diff[n] = XX1_minus_XX2[n, 1:L] *
(beta[1][1:L] * XX1_plus_XX2[n, 1:L] * beta[1][1:L] - gamma[1:L]);
}
}
model {
vector[N] zz;
theta_raw ~ normal(0, 1);
for (d in 1:1) {
beta_pos[d] ~ normal(0, 1);
beta_rest[d] ~ normal(0, 1);
}
gamma ~ normal(0, 1);
for (n in 1:N) {
zz[n] = disc[n] * theta[ii[n]] - diff[n];
}
choice ~ bernoulli(Phi_approx(zz));
}
```

Mind if I just give troubleshooting directions?

- Your model is constructing a program to calculate a target log pmf and its gradient w.r.t. parameters.
- The error message says that at the initial values you calculate the log density (not the gradient) to be log(0) so the algorithm can’t figure out where to go next.
- This (below) is the only statement affecting the log density:

- Internally it’s equivalent to:

```
if (choice == 0)
target += log(1 - Phi_approx(zz))
if (choice == 1)
target += log(Phi_approx(zz))
```

- So a good guess is that
`Phi_approx(zz)`

might be zero resulting in a log(0) or `1 - Phi_approx(zz)`

might be zero with the same results.
- For that to be true
`zz`

must have a large magnitude and either negative sign (so it underflows to zero) or a positive sign so it overflows to 1.
- At this point your model has some data involved and a bunch of indexing so to catch the problem I’d insert this code right before the call to the Bernoulli:

```
for (i in 1:N) {print("i: ", i, ", zz[, "i, "]: ", zz[i], "\n")}
```

- When the algorithm starts you should get a nice printout of all the values that contribute to the log-likelihood (and therefore to the log(0) and you can just pipe it to a file and look at it or search for huge values. You can even add an
`if`

statement to only print when the conditions for making the model fail are met (as in #6 above).
- This procedure is always the same. Your model might have a bug I’m missing, but working backwards from the explicit or implicit
`target +=`

statements will always get you there.

3 Likes

To follow up on this thread, I was able to solve this initialization problem by using the `bernoulli_logit`

function, which appears to avoid the numeric overflow that occurs when the CDF transformation (either normal or logistic) is done manually, as in `Phi_approx(zz)`

.

1 Like