Identification problems: Unable to recover parameters in Multinomial IRT

I noticed a mistake in my code (I was constraining my parameters to have a lower bound) and a couple of redundancies, which I have now fixed (see code below). However, the problem persists: there is a huge difference between the true values of the simulated parameter \theta and the estimates Stan recovers.

I have some candidate explanations for the model's poor performance, along with the attempts I have made to address them:

  1. Identification. So far, I deal with identification by setting a very tight prior on \theta (my parameter of interest) and pinning the first value of \psi to zero (see this post). I also deal with reflection invariance as suggested in this post.

  2. Priors on the other two parameters are too vague. I don’t think this is the case, as I have played around with them and the results do not change.

  3. NUTS sampler and multimodality. This is where my lack of understanding of the topic will show, but I hope I can make the observation clear: when I check the fit of the model, the R_hat for my unconstrained parameters is well above 1 (values close to 1 indicate convergence), while this is not the case for the constrained parameters and generated quantities.
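To make point 1 concrete, here is a minimal sketch (hypothetical numbers, not taken from my data) showing why reflection invariance arises in this likelihood: flipping the signs of both \theta and \beta leaves every multinomial cell probability unchanged, so the data cannot distinguish the two modes.

```python
# Hypothetical illustration: the multinomial-IRT likelihood is invariant
# to flipping the signs of both theta and beta (reflection invariance).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
J = 5
psi = rng.normal(0, 1, J)    # word fixed effects
beta = rng.normal(0, 1, J)   # word discriminations
theta_i = 0.7                # position of one document

p_orig = softmax(psi + theta_i * beta)
p_flip = softmax(psi + (-theta_i) * (-beta))  # reflected parameters

print(np.allclose(p_orig, p_flip))  # True: identical cell probabilities
```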

I also provide a trace plot; it surprises me that the posterior distributions of some of the \psi are so wide:
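The wide, possibly bimodal traces would be consistent with point 3: if different chains settle in mirrored modes, each chain's trace can look healthy while the pooled R_hat explodes. A quick sketch with made-up draws (plain, non-split R_hat for brevity):

```python
# Hypothetical illustration: two chains stuck in mirrored modes (+2 and -2)
# each mix well individually, but their pooled R_hat is far above 1.
import numpy as np

rng = np.random.default_rng(1)
chain_a = rng.normal(+2.0, 0.1, 1000)  # chain in the +theta mode
chain_b = rng.normal(-2.0, 0.1, 1000)  # chain in the reflected mode

def rhat(chains):
    # Basic potential scale reduction factor (Gelman-Rubin, non-split)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

print(rhat(np.stack([chain_a, chain_b])))  # far above 1: mirrored modes
print(rhat(np.stack([chain_a, chain_a])))  # ~1: chains agree
```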

This is the model:

data {
  int<lower=1> I; // Number of documents
  int<lower=1> J; // Number of features
  int<lower=1> N; // Number of counts (I * J in most cases)
  int<lower=0> counts[N];       // Long vector of counts
  int<lower=1,upper=I> docs[N]; // Long document index [1 1 1 ... I I I]
  int<lower=1,upper=J> word[N]; // Long feature  index [1 ... J 1 ... J] 
  int<lower=1,upper=I> neg_doc; 
  int<lower=1,upper=I> pos_doc; // Index for reference documents such that theta[neg_doc] < theta[pos_doc]
}
parameters {
  vector[I] theta; // Ideological direction of document
  vector[J] beta;  // Ideological direction of word
  vector[J-1] psi;   // Word fixed effects
}
transformed parameters{ // Establish first word as a baseline (Lowe, 2016)
  vector[J] psi_constrained;
  psi_constrained[1]   = 0;
  psi_constrained[2:J] = psi;
}
model {
  psi ~ normal(0,5);  // Weakly informative prior
  beta ~ normal(0,3);  // Regularization as in quanteda 
  theta ~ normal(0,1); // Scale identification with unit normal prior

  for (i in 1:I) {
    // Counts are assumed doc-major, matching the `docs` index above
    counts[(J*(i-1)+1):(J*i)] ~ multinomial(softmax(psi_constrained + theta[i] * beta)); // https://mc-stan.org/docs/2_18/functions-reference/softmax.html
  }
}
generated quantities {
  // Sign identification (reflection invariance):
  // flip signs if needed so that theta_fix[neg_doc] < theta_fix[pos_doc]
  vector[I] theta_fix = theta[neg_doc] < theta[pos_doc] ? theta : -theta;
  vector[J] beta_fix  = theta[neg_doc] < theta[pos_doc] ? beta  : -beta;
}
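For reference, this is a hedged sketch (in numpy, with made-up sizes and a hypothetical `fix_reflection` helper, not my actual simulation script) of the data-generating process the model implies, together with the per-draw sign fix that the generated quantities block performs:

```python
# Hypothetical sketch of the implied data-generating process plus the
# post-hoc reflection fix. Sizes and seed are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
I, J = 10, 50                      # documents, features
theta = rng.normal(0, 1, I)        # document positions
beta = rng.normal(0, 1, J)         # word discriminations
psi = rng.normal(0, 1, J)
psi[0] = 0.0                       # baseline word pinned to zero

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# I x J count matrix, same doc-major layout as the long `counts` vector
counts = np.stack([
    rng.multinomial(500, softmax(psi + theta[i] * beta)) for i in range(I)
])

# Reference documents, known a priori in the real application
neg_doc, pos_doc = np.argmin(theta), np.argmax(theta)

def fix_reflection(theta_draw, beta_draw):
    # Mirror of the generated quantities: flip signs if needed so that
    # theta[neg_doc] < theta[pos_doc]
    if theta_draw[neg_doc] < theta_draw[pos_doc]:
        return theta_draw, beta_draw
    return -theta_draw, -beta_draw
```

Applying `fix_reflection` to each posterior draw before summarizing should collapse the two mirrored modes onto one orientation.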