Poisson count model indeterminacy problems

Currently I am trying to estimate the following model, but i seem to be having determinacy issues between my beta and gamma parameters. The overall question being answered here is i want to assess the beta value which is essentially the difficulty of an individual item on its own (like item content), and gamma, the difficulty associated with item type. So for example, items 1-4 may be multiple choice, 5-8 could be open ended, etc. The data is count data, i.e. something like number of actions taken by an examinee during an item. So one gamma value is being applied to multiple items. I have tried different constraints and summing to 0 but I cannot seem to get the parameters to match my true generating parameters. They do get pretty close but sometimes the beta and gamma values deviate more than desired. I have never had an issue with theta as it always seems to estimate closely to my true thetas. Theta is essentially my intercept, and i would like to keep beta and gamma as independent fixed effects. Does anyone have any suggestions?

data {
  int<lower=0> N; // number of persons
  int<lower=0> J; // number of items
  int<lower=0> T; // number of item types
  int<lower=0> Y[N, J]; // responses
  int<lower=1,upper=T> item_type[J]; // item type for each item, so it could be (1,1,1,1,2,2,2,2,3,3,3,3) for example
}

parameters {
  vector[N] theta_raw;
  real <lower=0> sd_theta;


  vector[T] gamma;
  
  vector[J] beta_raw;
  real<lower=0> sd_beta;

}

transformed parameters {
  vector[J] beta;
  vector[N] theta;

  beta = beta_raw * sd_beta;
  theta = theta_raw * sd_theta;

}

model {
  // Priors
  theta_raw ~ normal(0, 1);
  sd_theta ~ lognormal(0, 1);

  beta_raw ~ normal(0, 1);
  sd_beta ~ lognormal(0, 1);
  
  gamma ~ normal(0,1);

  matrix[N, J] lambdas;
  for (j in 1:J) {
    lambdas[,j] = exp(theta - beta[j] - gamma[item_type[j]]);
    target += poisson_lpmf(Y[,j] | lambdas[,j]);
  }
}

If interested the model in question is a mix between a rasch poisson count model and a faceted rasch model.

faceted rasch model
logit(P_{jik})= \theta_j - \beta_i - \gamma_k

Where P_{jik} is the probability of correct response by person j to item i at position k. Higher values in are associated with lower probabilities of a correct response. For this reason, we refer to \gamma_k as difficulty for position k. In this model, position facet is assumed to be independent of item difficulty \beta_i, That is, the position effect is considered uniform across all items administered at a particular position. I am trying to use this model and instead of item position, substitute item type, and sample from the poisson distribution as my data is count data.

The original rasch poisson counts model
\begin{equation} \lambda = \exp(\theta - \beta ) \end{equation}

The end results which is in my STAN model.
\begin{align*} \lambda_{ij} &= \exp(\theta_i - \beta_j - \gamma_{\text{{item_type}}_j}) \end{align*}

It seems strange to me that responses of different item types are all modeled as Poisson counts. Why not model them with different distributions? For instance, Bernoulli distribution for multiple-choice items, and ordered logit distribution for rating scale items.

You can then use if-else statements to switch between the distributions for different item types:

model {
  matrix[N, J] mu;
  for ( j in 1:J ){
    mu[,j] = theta - beta[j] - gamma[item_type[j]];

    if ( item_type[j] == 1 )  // multiple choice
       Y[,j] ~ bernoulli_logit(mu[,j]);
    else if ( item_type[j] == 2 )  // rating scale
       Y[,j] ~ ordered_logistic(mu[,j], kappa);
    // ...
  }
}

Thank you for the response! I should have been more clear in my original post, the data would be count data for all item types. Most likely something like # of actions, such as clicks on the screen, etc for each item.

Could you provide more descriptions of your data-generating process (DGP)? It would be nice if there is simulation code and, ideally, a Directed Acyclic Graph (DAG) sketching the generative process.

My Assumptions

I’ve imagined some DGPs based on your description, shown below in the DAG. Here R represents the response (count data in your case), A the person ability, D the item difficulty, and T the item type.

In the most basic form of IRT models, A and D are assumed to be the only two forces that directly influence the outcome R. There may be other influences on R as well, but they should only be indirect, acting through the influences of A or D. This is, for instance, illustrated in the solid arrows in the DAG below. Here, the item type T influences R only through D.

If this was your DGP for the simulation and you specified the model as shown above, it would be reasonable that the model couldn’t recover the effects of T on D. D carries all the information of T, which essentially blocks the model from seeing T.

Alternatively, if the DGP includes the dashed arrow T \rightarrow R, the IRT model could then be extended to measure the so-called differential item functioning. When the item type T is included as a predictor in the linear model, the model would estimate the direct effect of T on R, in addition to the direct effect of D on R. If the effect T \rightarrow R is non-trivial, there will be good reasons to suspect that item type T has an additional (and undesirable) influence on R, which bypasses the acceptable influence through D.

To be short, it is necessary to explicate the assumptions underlying your simulation to tackle the problem here. I’ve worked on a model[1] that has a highly similar structure to yours (if I’m understanding the problem here correctly). And even if everything mentioned above is sorted out, there can still be identifiability issues, which is very common for fitting IRT models in Stan.


  1. If interested, you can take a look at the documentation, the R script, and the Stan file for simulation and fitting. It might provide hints about the problems you’re facing. ↩︎

Thank you for the response. Here is my data generation code.

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

gen_test_data <- function(true_difficulties, true_abilities, item_type_difficulties, n_persons, item_types) {
  out <- data.frame(
    item1 = numeric(n_persons)
  )
  for (j in 1:length(true_difficulties)) {
    lambdas <- exp(true_abilities - true_difficulties[j] - item_type_difficulties[item_types[j]])
    out[[paste0("item", j)]] <- rpois(n_persons, lambdas)
  }

  return(out)
}

nrep <- 1
N <- 400  # number of persons
J <- 12 #number of items
T <- J / 4#number of item types
stan_mod = stan_model("rpcm_try.stan")

for (x in 1:nrep){
  print(paste0("working on rep: ", x))
  start_time <- Sys.time()

  seed <- x*400
  set.seed(seed)

  ability <- rnorm(N, 0, 1)  # person ability

  # Item characteristics
  item_difficulty <- rnorm(J, 0, 1)  # item difficulty
  item_type_difficulty <- rnorm(T, 0, 1) # item type difficulty

  item_type <- rep(1:T, each = 4) # item type for each item

  # Generate data
  data <- gen_test_data(item_difficulty, ability, item_type_difficulty, N, item_type)

  rpcm <- sampling(stan_mod, data = list(N = N,
                                         J = J,
                                         T = T,
                                         Y = data,
                                         item_type = item_type), iter = 3500)
  end_time <- Sys.time()
}

I want to be able to determine for any given item, how large the effect of item type was and the effect of the item difficulty itself(like item content). I could ignore the indeterminacy problems because the sum of the two already works well in my model, but then I would not be able to determine the unique sources of the total difficulty of the item. It sounds like using DIF would potentially answer this problem, and instead of traditional measures like gender, item type would be used instead.

I think you may try imposing either a sum-to-zero constraint on the item type difficulties or setting one of them as zero (reference) and parameterizing the remaining T-1 type difficulties as relative to the reference.

The code below demonstrates constraining \gamma_{T=1} to zero.

parameters {
  // ...
  vector[T-1] gamma_raw;   // constrained gamma
}
transformed parameters {
  // First gamma fixed to zero
  vector[T] gamma;
  gamma = append_row(0, gamma_raw);
}

You may need to change the simulation code accordingly to allow perfect matching of simulated parameter values to those recovered by the model. Otherwise, the recovered \gamma parameters may differ from the true values by a constant (the estimated and true parameters would be highly correlated).

Apologies here, I have confused the general assumptions of IRT and differential item functioning (DIF). DIF has nothing to do with the current context, as it involves person covariates (e.g., gender) and their interaction with items (i.e., after controlling for the item and person effects, are there any additional effect for an item, stratified by gender, that is different for the two genders).