Fitting a multi-group Rasch Model (IRT)

Jill_de_Ron · February 11, 2020, 9:57am

Dear Stan users,

I would like to fit a multi-group Rasch model in stan. So, I already fitted a simple Rasch model with one group, using a sum-to-zero constraint on the beta parameters. This seems to work fine if I compare it to the output of other R-packages (such as psychomix).

data {
  int<lower=1> I;               // number of items
  int<lower=1> J;               // number of students
  int<lower=1> N;               // total number of observations (student x items)
  int<lower=1, upper=I> ii[N];  // variable indexing the items
  int<lower=1, upper=J> jj[N];  // variable indexing the students
  int<lower=0, upper=1> y[N];   // binary variable outcome 
}

parameters {
  vector[I-1] beta_raw;            // difficulty of item
  vector[J] theta;                 // student ability
  real<lower=0> sigma;   
}

transformed parameters {
  vector[I] beta = append_row(beta_raw, -sum(beta_raw)); // add sum-to-zero constrain
}

model {
  beta_raw ~ normal(0, 3);       // prior on the beta 
  sigma ~ exponential(.1)        // hyperprior
  theta ~ normal(0, sigma);      // prior on the theta
  
  y ~ bernoulli_logit(theta[jj] - beta[ii]); 
}

I now want to extend my model into a multi-group form. I want the groups to be able to differ in mu theta and in beta parameters. And now the sum-to-zero constraint on the beta parameters should be per group. To see if I can recover the parameters I simulated data in R:

library(pcIRT)
sim_data <- simDRM(c(-0.4, 0.2, 0.2), persons = 10000)
data <- sim_data$datmat

sim_data2 <- simDRM(c(-0.8, 0.4, 0.4), persons = 10000)
data2 <- sim_data2$datmat

data_new <- rbind(data, data2)
data_new <- as.data.frame(data_new)
names(data_new) <- c("y1", "y2", "y3")
data_new$X <- 1:nrow(data_new)

# Placing the data in long format and defining a numeric identifier for items 
library(reshape)
long <- reshape::melt(data_new, id.vars = "X", variable.name = "variable", value.name = "value")
long$group <- rep(1:2, each = 10000, times = 3)
key <- 1:length(unique(long$variable))
names(key) <- unique(long$variable)
long$item.id <- key[long$variable]

stan_data <- list(I     = max(long$item.id), 
                  J     = max(long$X), 
                  N     = nrow(long), 
                  K     = 2,
                  ii    = long$item.id, 
                  jj    = long$X, 
                  kk    = long$group,
                  y     = long$value)

My stan model now looks like this:

data {
  int<lower=1> I;                // number of items
  int<lower=1> J;                // number of students
  int<lower=1> N;                // total number of observations (student x items)
  int<lower=1> K;                // number of groups
  int<lower=1, upper=I> ii[N];   // variable indexing the items
  int<lower=1, upper=J> jj[N];   // variable indexing the students
  int<lower=0, upper=K> kk[N];   // variable indexing the group membership
  int<lower=0, upper=1> y[N];    // binary variable outcome
}

parameters {
  vector[K] beta_raw[I-1];         // difficulty of item
  vector[J] theta;                 // student ability
  vector[K] mu;                    // mean theta of every group
  real<lower=0> sigma;             // hyperprior theta
}

transformed parameters {
  vector[K] beta[I];
  
  for (k in 1:K) {
    beta[k] = append_row(beta_raw[k], -sum(beta_raw[k]));
  }
}

model {
  
  for (k in 1:K) {
      beta_raw[k] ~ normal(0, 3);  // prior on the beta 
  }
  
  mu[1] ~ normal(0, 1);            // mu group 1
  mu[2] ~ normal(1, 1);            // mu group 2; to avoid label switching problem
  sigma ~ exponential(.1);         // hyperprior theta
  
  for (n in 1:N) {
    theta[jj[n]] ~ normal(mu[kk[n]], sigma);      // prior on the theta
  }
  
  for (n in 1:N) {
    y[n] ~ bernoulli_logit(theta[jj[n]] - beta[kk[n]][ii[n]]); 
  }
}

Running this program takes incredibly long (it seems to be stuck at 1 / 2000 [ 0%] (Warmup)) so I had to stop it. I tried going for a matrix where I now use vector[K] beta[I];, but then I can’t seem to get the sum-to-zero to constrain working. So, I am a bit stuck. Does anyone see what I am doing wrong?

Also, I know vectorization is preferred in Stan; but if I change the last two loops of my model block, I get an error (“Available argument signatures for operator-: Expression ill performed”)

  theta[jj] ~ normal(mu[kk], sigma);              // prior on the theta
  y ~ bernoulli_logit(theta[jj] - beta[kk][ii]);  // ERROR: Expression is ill performed

Hopefully, somebody can help me to fit my first multi-group Rasch model!

Thanks,
Jill

p-gw · February 11, 2020, 1:24pm

Dear Jill,

For debugging purposes I would suggest you reduce the amount of data you simulate and also possibly run it on a single chain. This makes it way easier to determine errors in the model code.

The most obvious issue with your multi-group model specification is the indexing of beta_raw and beta. If you swap the indexing from vector[K] beta[I] to vector[I] beta[K] you will define the intended array of length K containing a vector of length I. These group-specific item difficulty parameter vectors can then be accessed by beta[k].

Fitting your model with this change will run, however there are still divergent transitions.

In your model code you also write

mu[1] ~ normal(0, 1);            // mu group 1
mu[2] ~ normal(1, 1);            // mu group 2; to avoid label switching problem

I think this non-exchangeable prior on mu is not required here. Label switching generally refers to an issue in mixture models, which is not the case for your model as the grouping variable kk is observed. In my opinion just specifying mu ~ normal(0, 1) should be fine.

For IRT models I have made good experiences using the following optimizations:

non-centered parameterizations for the theta parameters
dropping the sum-to-zero constraint of the beta parameters and instead using the distributional assumption beta ~ normal(0, 1). The issue with sum-to-zero constraints is that a prior distribution on the unconstrained item parameters will result in a non-exchangeable prior distribution for all item parameters. There is a thread on discourse that goes into great detail about soft vs. hard constraints, how they relate to sampling efficiency and how they interact with prior distributions.

Implementing the changes described above results in this model:

data {
  int<lower=1> I;                // number of items
  int<lower=1> J;                // number of students
  int<lower=1> N;                // total number of observations (student x items)
  int<lower=1> K;                // number of groups
  int<lower=1, upper=I> ii[N];   // variable indexing the items
  int<lower=1, upper=J> jj[N];   // variable indexing the students
  int<lower=0, upper=1> y[N];    // binary variable outcome

  int<lower=0, upper=K> kk[J];   // variable indexing the group membership
}

parameters {
  vector[I] beta[K];         // difficulty of item
  vector[J] theta_norm;                 // student ability
  vector[K] mu;                    // mean theta of every group
  real<lower=0> sigma;             // hyperprior theta
}
transformed parameters {
  vector[J] theta;
  theta = mu[kk] + theta_norm * sigma;
}
model {
  for (k in 1:K) {
    beta[k] ~ normal(0, 1);  // prior on the beta
  }

  mu ~ std_normal();
  theta_norm ~ std_normal();
  sigma ~ exponential(.1);         // hyperprior theta

  for (n in 1:N) {
    y[n] ~ bernoulli_logit(theta[jj[n]] - beta[kk[jj[n]], ii[n]]);
  }
}

Please note that there is one change in the data. kk is now a vector of length J. For the simulated data i just took the appropriate subset of your long data, long[seq_len(2 * n), "group"].

library(pcIRT)
n <- 4000

sim_data <- simDRM(c(-0.4, 0.2, 0.2), persons = n)
data <- sim_data$datmat

sim_data2 <- simDRM(c(-0.8, 0.4, 0.4), persons = n)
data2 <- sim_data2$datmat

data_new <- rbind(data, data2)
data_new <- as.data.frame(data_new)
names(data_new) <- c("y1", "y2", "y3")
data_new$X <- 1:nrow(data_new)

# Placing the data in long format and defining a numeric identifier for items
library(reshape)
long <- reshape::melt(data_new, id.vars = "X", variable.name = "variable", value.name = "value")
long$group <- rep(1:2, each = n, times = 3)
key <- 1:length(unique(long$variable))
names(key) <- unique(long$variable)
long$item.id <- key[long$variable]

stan_data <- list(I     = max(long$item.id),
                  J     = max(long$X),
                  N     = nrow(long),
                  K     = 2,
                  ii    = long$item.id,
                  jj    = long$X,
                  kk    = long[seq_len(2 * n), "group"],
                  y     = long$value)

I did not do much validation of the code, but the beta parameters seem to be recovered just fine.

I hope this helps. Best regards,
Philipp

Jill_de_Ron · February 11, 2020, 2:58pm

Dear Philipp,

Thank you so much for your answer, that helps a lot!

Best,
Jill

Topic		Replies	Views
Help with understanding my results Modeling fitting-issues , specification , performance	5	1183	February 8, 2019
Syntax question: Constrain parameters by groups Modeling constraint-transform	1	434	July 10, 2021
Fitting Rasch mixture IRT model Modeling	9	1456	March 2, 2020
Using rstan to estimate Response Times IRT Model parameters with two data (matrices) Modeling cognitive-science	1	643	May 15, 2019
Using rstan to estimate Response Times IRT Model parameters Modeling techniques , cognitive-science	2	593	April 25, 2019

Fitting a multi-group Rasch Model (IRT)

Related topics