Identification in latent variable model

Hi everyone,

I have what I think is a relatively straightforward IRT model that I’m having difficulty getting to converge. I identify the model by ordering two parameters zeta_j to avoid reflection invariance, and by fixing the scale by placing a standard normal prior on zeta. I nevertheless end up with a small number of chains that don’t converge (maybe 1 out of 8 chains). I’ve tried a bunch of strategies, including setting redundant identification constraints, dropping the constant lamda and thus letting the zeta_j and theta_i scale themselves, setting hard constraints by fixing two zeta_j to constants (e.g. -1 and +1), and tightening up priors (maybe not enough?). When I initialize the parameters in reasonable ways, I don’t have any problems however. I’m alright with fitting the model with just these initializations, but if there’s a better way, it’d be great to know! Thanks!

IRT.stan (1.4 KB)

Hi, what do you exactly mean by “not converge” - do you get divergent transitions? Or do the chains explore a completely different part of the parameter space than the others?

It is also possible that the model is fine, but the data do not contain enough information to identify the model (while different datasets could). Have you tested the model with simulated data?

You might also try to specify your model with brms - it should support some IRT models (not sure if yours is covered though). The advantage being that brms is well tested so if the model does not converge, you can be quite certain that it is not an implementation problem. Also if you can express a similar model in brms, you can use the Stan code brms generates as a starting point.

Also, unrelated to your problems, some minor points to improve your Stan code:

instead of

  for(t in 1:n_users)
    theta[t] ~ normal(theta_mu[group[t]], theta_sigma[group[t]]);

you can have simply:

theta ~ normal(theta_mu[group], theta_sigma[group]);

as Stan lets you index by an integer array

and instead of

  for(i in 1:n_users) Y[i, 1:n_domains] ~ neg_binomial_2(C[i, 1:n_domains], omega[i]);

you can use

  for(i in 1:n_users) Y[i, ] ~ neg_binomial_2(C[i, ], omega[i]);

as indexing by the whole range is implicit and you can avoid repeating 1:n_domains.

This doesn’t look at all like an ordinary IRT model. You might want to also check out all the tutorials on IRT in our case studies and in the recent StanCon 2018 case studies.

I don’t understand why you think you can order two parameters. Do you know they should always be strictly ordered that way in the posterior?

The real problem you’re having with convergence is that you’re using the centered parameterization for a hierarchical model when you should probably using the non-centered parameterization. See the manual or the various case studies.

Also, you can use append_row to create zeta in a one-liner.

vector[n_domains] zeta = append_row(zeta_pair, zeta_raw);

We also have a neg_binomial_2_log that takes the scale on the log scale so you don’t have to exponentiate C.

Thanks Martin and Bob!

Bob, I order those two parameters to identify the model because multiplying the parameters zeta and theta by -1 will result in the same relative distances between items (zeta and theta just represent spatial positions on a unidimensional scale) and I’ll get a bimodal posterior. I’m essentially arbitrarily choosing the direction of the scale, such that one item zeta[1] in the latent space that I’m trying to model is always larger than zeta[2]. I’ve tried identifying the model in alternative ways, such as fixing zeta[1] to +1 and zeta[2] to -1, and constraining the sign of one (or more) of the zeta. (http://www.stat.columbia.edu/~gelman/research/published/171.pdf, pp. 176-178) In any case, I’ve tried a variety of ways of doing this, and I still sometimes get a chain that explores the “wrong” part of the parameter space and doesn’t break out even with long warm-ups. With reasonable initializations, it all works as I suspect it should and the results make a lot of sense.

The .stan file is the centered version, but I have a non-centered version, which unfortunately still runs into the same problem.

Martin, I’m going to trying simulating the DGP to make sure I’ve coded this correctly. I usually do that when coding a model in Stan, but I skipped it this time, and perhaps I shouldn’t have. I’ll respond to this thread with the results.

Martin and Bob, thanks also for pointing out the ways to clean up the code. I’ll implement those changes to clean this up.

I see. So that’s more like the ideal point models than like the IRT models, where you can usually assume that people with higher ability are more likely to get a question right.

You need to non-center the gamma distribution parameters, too, which is a bit trickier. Or use a different kind of prior there.