I am trying to estimate the following multinomial IRT model:
\left[ y_{i1}, \dots, y_{iJ} \right] \sim \mathrm{Multinomial}(\pi_{i1}, \dots, \pi_{iJ},\ y_{i+})
where y_{ij} is the count of word j in document i and y_{i+} is the total length of document i. The event probabilities are given by
\pi_{ij} = \frac{\exp(\psi_{j} + \theta_i \beta_{j})}{\sum_{k=1}^{J} \exp(\psi_{k} + \theta_i \beta_{k})}
where \psi_j are word (feature) fixed effects, \beta_j are the ideological positions of words, and \theta_i are the ideological directions of documents.
These types of models, used especially in political science to recover the ideological positions of documents, are known to have identification problems. First, I address multiplicative aliasing by setting a unit normal prior on my latent parameter of interest \theta. Second, I address reflection invariance as suggested in this post. As I understand it, these two conditions should be sufficient to identify the model. However, I have tested the model with simulated data and I am getting very strange results.
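To make explicit what these two conditions are meant to fix (my own summary, not taken from any particular reference): the linear predictor is unchanged if \theta and \beta are rescaled by any c > 0, and also if both are multiplied by -1,
\psi_j + (c\,\theta_i)\left(\beta_j / c\right) = \psi_j + \theta_i \beta_j, \qquad \psi_j + (-\theta_i)(-\beta_j) = \psi_j + \theta_i \beta_j
so, as I understand it, the unit normal prior is supposed to pin down the scale, and the reference documents neg_doc and pos_doc the sign.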
This is the model I am using:
data {
  int<lower=1> I;                 // Number of documents
  int<lower=1> J;                 // Number of features
  int<lower=1> N;                 // Number of counts (I * J in most cases)
  int<lower=0> counts[N];         // Long vector of counts
  int<lower=1,upper=I> docs[N];   // Long document index [1 1 1 ... I I I]
  int<lower=1,upper=J> word[N];   // Long feature index [1 ... J 1 ... J]
  int<lower=1,upper=I> neg_doc;
  int<lower=1,upper=I> pos_doc;   // Indices of reference documents such that theta[neg_doc] < theta[pos_doc]
}
parameters {
  vector<lower=0>[I] theta;       // Ideological direction of document
  vector<lower=0>[J] beta;        // Ideological direction of word
  vector<lower=0>[J] psi;         // Feature fixed effects
}
transformed parameters { // Establish first word as a baseline (Lowe, 2016)
  vector<lower=0,upper=1>[N] Pi;  // Long vector of J-simplexes
  for (i in 1:I) {
    Pi[J*(i-1)+1:J*i] = softmax(psi + theta[i] * beta); // https://mc-stan.org/docs/2_18/functions-reference/softmax.html
  }
}
model {
  psi ~ normal(rep_vector(0,J), rep_vector(10,J));  // Non-informative prior
  beta ~ normal(rep_vector(0,J), rep_vector(3,J));  // Regularization as in quanteda
  theta ~ normal(rep_vector(0,I), rep_vector(1,I)); // Scale identification with unit normal prior
  for (i in 1:I) {
    counts[J*(i-1)+1:J*i] ~ multinomial(Pi[J*(i-1)+1:J*i]);
  }
}
generated quantities {
  // Sign identification (reflection invariance)
  vector[I] theta_fix = theta[pos_doc] < theta[neg_doc] ? theta : theta * -1;
  vector[J] beta_fix = theta[pos_doc] < theta[neg_doc] ? beta : beta * -1;
}
This is the code I am using to generate the simulated data:
import numpy as np
from scipy.special import softmax

np.random.seed(1)

I = 5   # number of documents
J = 10  # number of features (words)
N = J * I

C = np.random.poisson(lam=100, size=I)      # document lengths y_{i+}
theta = np.random.normal(size=I)            # document positions (unit normal, matching the prior)
theta = np.expand_dims(theta, axis=-1)
beta = np.random.normal(scale=3, size=J)    # word positions
beta = np.expand_dims(beta, axis=-1)
psi = np.random.normal(scale=3, size=J)     # word fixed effects
psi = np.expand_dims(psi, axis=-1)

# Row-wise softmax so that each document's probabilities sum to 1
pi = softmax(np.transpose(psi) + np.multiply(theta, np.transpose(beta)), axis=-1)

DTM = np.empty((I, J), dtype=int)           # document-term matrix of counts
for i in range(0, I):
    DTM[i, :] = np.random.multinomial(C[i], pi[i, :])
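In case it is relevant, this is roughly how I flatten the document-term matrix into the long vectors the Stan program expects and run the sampler. It is a minimal sketch assuming PyStan 2.x; the file name model.stan is a placeholder, and picking the reference documents from the true \theta is only possible because this is a simulation:

import pystan

# Flatten the document-term matrix row by row to match the
# [1 1 1 ... I I I] document index expected by the Stan program
stan_data = {
    "I": I, "J": J, "N": N,
    "counts": DTM.flatten().astype(int),
    "docs": np.repeat(np.arange(1, I + 1), J),  # 1-based document index
    "word": np.tile(np.arange(1, J + 1), I),    # 1-based feature index
    # Reference documents taken from the true theta (placeholder choice)
    "neg_doc": int(np.argmin(theta[:, 0])) + 1,
    "pos_doc": int(np.argmax(theta[:, 0])) + 1,
}

sm = pystan.StanModel(file="model.stan")  # the Stan code above saved to disk
fit = sm.sampling(data=stan_data, iter=2000, chains=4, seed=1)
print(fit)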
Finally, a comparison between the real values of \theta:
0.3190391, -0.24937038, 1.46210794, -2.06014071, -0.3224172
and the model's fit:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
theta[1] 1.79 3.2e-3 0.39 1.16 1.52 1.75 2.01 2.67 14692 1.0
theta[2] 1.78 2.8e-3 0.39 1.15 1.5 1.73 2.0 2.65 18698 1.0
theta[3] 0.1 6.8e-4 0.1 2.7e-3 0.03 0.07 0.15 0.38 23224 1.0
theta[4] 2.65 0.03 0.89 0.79 2.33 2.84 3.25 4.03 1193 1.0
theta[5] 1.38 2.4e-3 0.28 0.89 1.19 1.36 1.55 1.98 12823 1.0
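Continuing from the sketch above (so still assuming a PyStan 2.x fit object), this is how I pull out the sign-corrected estimates when comparing against the true values:

# Posterior means of the sign-corrected document positions
theta_fix_hat = fit.extract()["theta_fix"].mean(axis=0)

# Since the scale is only pinned down by the unit normal prior, I also look at the
# correlation with the true values rather than only the raw differences
true_theta = theta[:, 0]
print(np.corrcoef(true_theta, theta_fix_hat)[0, 1])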
Any help or comments are appreciated, as I am not very familiar with Bayesian inference.