I noticed a mistake in my code (constraining my parameters to have a lower bound) and a couple of redundancies that I had to change (see code below). However, my problem still persists, ie. there is a huge difference between the actual values of the simulated parameter \theta and the estimates found by Stan.
I have some candidates to explain the bad performance of the model and the attempts at solving them:
-
Identification. So far, I deal with identification by setting a very tight prior for \theta (my parameter of interest) and pinning down the first value of \psi to zero (see this post). I also deal with reflection invariance as mentioned in this post.
-
Priors of the other two parameters are too vague. I don’t think this is the case, as I have played around with them and results do not seem to change.
-
NUTS sampler and multimodality. This is where my lack of understanding of the topic will manifest, but I hope I can make it clear. My observation comes from the fact that when I check the fit of the model, the R_hat for my unconstrained parameters is way above zero, while this is not the case for constrained and generated quantities.
I provide also a plot of the trace, it surprises me that the distributions of some of the \psi is so wide:
This is the model:
data {
int<lower=1> I; // Number of documents
int<lower=1> J; // Number of features
int<lower=1> N; // Number of counts (I * J in most cases)
int<lower=0> counts[N]; // Long vector of counts
int<lower=1,upper=I> docs[N]; // Long document index [1 1 1 ... I I I]
int<lower=1,upper=J> word[N]; // Long feature index [1 ... J 1 ... J]
int<lower=1,upper=I> neg_doc;
int<lower=1,upper=I> pos_doc; // Index for reference documents such that theta[neg_doc] < theta[pos_doc]
}
parameters {
vector[I] theta; // Ideological direction of document
vector[J] beta; // Ideological direction of word
vector[J-1] psi; // Word fixed effects
}
transformed parameters{ // Establish first word as a baseline (Lowe, 2016)
vector[J] psi_constrained;
psi_constrained[1] = 0;
psi_constrained[2:J] = psi;
}
model {
psi ~ normal(0,5); // Non-informative prior
beta ~ normal(0,3); // Regularization as in quanteda
theta ~ normal(0,1); // Scale identification with unit normal prior
for (i in 1:I){
counts[J*(i-1)+1:J*i] ~ multinomial(softmax(psi_constrained + theta[i] * beta)); // https://mc-stan.org/docs/2_18/functions-reference/softmax.html);
}
}
generated quantities {
// Scale identification (reflection invariance)
vector[I] theta_fix = theta[pos_doc] < theta[neg_doc] ? theta : theta * -1;
vector[J] beta_fix = theta[pos_doc] < theta[neg_doc] ? beta : beta * -1;
}