I am seeking some advice on avoiding initialization failure.
I am working on a modified version of an item response theory model that, for identification purposes, constrains the signs of some parameters. Specifically, certain elements of a parameter vector beta (which is similar to the discrimination parameters in a conventional IRT model) are constrained to be positive in order to identify the polarity of the latent scale. To impose this constraint, I use the trick of defining two versions of the vector, beta_pos and beta_rest, the first of which has positive support, and assigning each of the constrained elements of beta a value from beta_pos instead of beta_rest. Full code is attached, but here are the relevant sections:
vector<lower=0>[L] beta_pos[1]; // positive spatial coefficients
vector[L] beta_rest[1]; // unconstrained spatial coefficients
and
for (l in 1:L) {
  for (d in 1:1) {
    if (beta_sign[l, d] > 0) { // constrained to be positive
      beta[d][l] = beta_pos[d][l];
    }
    if (beta_sign[l, d] == 0) { // unconstrained
      beta[d][l] = beta_rest[d][l];
    }
  }
}
beta_sign is a matrix that indicates whether each element of beta is constrained to be positive (= +1) or unconstrained (= 0). (d indexes latent dimensions, of which there is just one in this version of the model.)
Sampling from the model works fine as long as only a single element of beta is constrained. Imposing multiple constraints, however, results in a failed initialization. I have tried reducing init_r to as low as 0.01, but to no avail. I have not yet tried specifying initial values, in part because I am not sure which of the parameters I should do so for.
Any thoughts on why this parameterization does not work when more than one constraint is imposed?
Thank you for your help.
Devin
conjointIRT-1D.stan (2.2 KB)
Long-term, the easiest way to handle this is to write models that always initialize successfully when the internal parameters are on -1 to 1 (so log(beta_pos) is on -1 to 1). Not sure what that means in your case since I don’t see the full model or the full error message.
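To unpack that a little: Stan samples constrained parameters on an unconstrained scale, and the init_r range applies to that scale. For a positive-constrained declaration like the one in this model, the same declaration annotated with what happens at initialization (the numbers below assume the default init_r = 2) looks like this:

parameters {
  // Stan represents this internally as log(beta_pos): with the default
  // init_r = 2, each unconstrained element starts uniform on (-2, 2),
  // so beta_pos itself starts somewhere in roughly (0.14, 7.4).
  // "Initializes successfully on -1 to 1" means the log density should
  // be finite whenever beta_pos is in about (exp(-1), exp(1)) = (0.37, 2.7).
  vector<lower=0>[L] beta_pos[1];
}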
Thanks!
I’m not sure exactly what you mean; perhaps more information on my end will help? To clarify, the error message I received was
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.
repeated 100 times. The full model code was attached to the original email, but I’m pasting it below:
data {
  int<lower=1> L;                          // num. attributes
  int<lower=1> I;                          // num. respondents
  int<lower=1> N;                          // num. observations
  int<lower=1, upper=I> ii[N];             // respondent for obs n
  int<lower=0, upper=1> choice[N];         // response (0/1)
  matrix<lower=0, upper=1>[N, L] XX1;      // attributes of profile 1
  matrix<lower=0, upper=1>[N, L] XX2;      // attributes of profile 2
  int<lower=-1, upper=1> beta_sign[L, 1];  // sign constraints on betas
}
transformed data {
  matrix<lower=-1, upper=1>[N, L] XX1_minus_XX2;  // attribute differences
  matrix<lower=0, upper=2>[N, L] XX1_plus_XX2;    // attribute sums
  XX1_minus_XX2 = XX1 - XX2;
  XX1_plus_XX2 = XX1 + XX2;
}
parameters {
  vector[L] gamma;                 // function of valence and spatial coefficients
  vector<lower=0>[L] beta_pos[1];  // positive spatial coefficients
  vector[L] beta_rest[1];          // unconstrained spatial coefficients
  vector[I] theta_raw;             // ideal point (pre-normalized)
}
transformed parameters {
  vector[N] disc;     // discrimination parameter
  vector[N] diff;     // difficulty parameter
  vector[I] theta;    // ideal points (normalized)
  vector[L] beta[1];  // spatial coefficients
  // Identify location and scale
  for (i in 1:I) {
    theta[i] = (theta_raw[i] - mean(theta_raw)) / sd(theta_raw);
  }
  // Identify polarity
  for (l in 1:L) {
    for (d in 1:1) {
      if (beta_sign[l, d] > 0) { // constrained to be positive
        beta[d][l] = beta_pos[d][l];
      }
      if (beta_sign[l, d] == 0) { // unconstrained
        beta[d][l] = beta_rest[d][l];
      }
    }
  }
  // Convert spatial and valence terms into IRT item parameters
  for (n in 1:N) {
    disc[n] = 2 * XX1_minus_XX2[n, 1:L] * beta[1][1:L];
    diff[n] = XX1_minus_XX2[n, 1:L]
              * (beta[1][1:L] * XX1_plus_XX2[n, 1:L] * beta[1][1:L] - gamma[1:L]);
  }
}
model {
  vector[N] zz;
  theta_raw ~ normal(0, 1);
  for (d in 1:1) {
    beta_pos[d] ~ normal(0, 1);
    beta_rest[d] ~ normal(0, 1);
  }
  gamma ~ normal(0, 1);
  for (n in 1:N) {
    zz[n] = disc[n] * theta[ii[n]] - diff[n];
  }
  choice ~ bernoulli(Phi_approx(zz));
}
Mind if I just give troubleshooting directions?
1. Your model is constructing a program to calculate a target log pmf and its gradient w.r.t. parameters.
2. The error message says that at the initial values you calculate the log density (not the gradient) to be log(0), so the algorithm can’t figure out where to go next.
3. This (below) is the only statement affecting the log density:
choice ~ bernoulli(Phi_approx(zz));
4. Internally it’s equivalent to:
for (n in 1:N) {
  if (choice[n] == 0)
    target += log(1 - Phi_approx(zz[n]));
  if (choice[n] == 1)
    target += log(Phi_approx(zz[n]));
}
5. So a good guess is that Phi_approx(zz) might be zero, resulting in a log(0), or 1 - Phi_approx(zz) might be zero, with the same result.
6. For that to be true, zz must have a large magnitude and either a negative sign (so Phi_approx(zz) underflows to zero) or a positive sign (so it rounds to 1 and 1 - Phi_approx(zz) is zero).
7. At this point your model has some data involved and a bunch of indexing, so to catch the problem I’d insert this code right before the call to the Bernoulli:
for (i in 1:N) {
  print("i: ", i, ", zz[", i, "]: ", zz[i]);
}
8. When the algorithm starts you should get a nice printout of all the values that contribute to the log likelihood (and therefore to the log(0)), and you can just pipe it to a file and look at it or search for huge values. You can even add an if statement to only print when the conditions for making the model fail are met (as in #6 above); see the sketch after this list.
9. This procedure is always the same. Your model might have a bug I’m missing, but working backwards from the explicit or implicit target += statements will always get you there.
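A minimal sketch of the conditional print from #8, assuming a cutoff of 10 for "huge" (the threshold is arbitrary; pick whatever magnitude pushes Phi_approx to 0 or 1 in your runs):

// Insert right before the bernoulli statement in the model block.
// Prints only the observations whose linear predictor is extreme enough
// to make Phi_approx(zz[i]) round to 0 or 1.
for (i in 1:N) {
  if (fabs(zz[i]) > 10) {
    print("i: ", i, ", zz[", i, "]: ", zz[i]);
  }
}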
To follow up on this thread, I was able to solve this initialization problem by using the bernoulli_logit function, which appears to avoid the numeric overflow that occurs when the CDF transformation (either normal or logistic) is done manually, as in Phi_approx(zz).
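For anyone who lands here later, the fix amounts to a one-line change to the sampling statement. Note that it also swaps the approximate probit link for a logit link, so the coefficients end up on a slightly different scale:

// before: the probability is computed explicitly and can round to 0 or 1
choice ~ bernoulli(Phi_approx(zz));

// after: bernoulli_logit works on the log-odds scale internally,
// so large-magnitude zz no longer produces log(0)
choice ~ bernoulli_logit(zz);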