The trouble with parameter estimation in Cognitive Diagnostic Models (CDMs)

Kentaro · June 11, 2023, 1:28am

I am currently studying psychostatistics at postgraduate level.
I am using Stan to perform parameter estimation in cognitive diagnostic models (CDM), but I am having difficulty with parameter recovery in simulations and would very much appreciate your advice.

The simulation code and stan code are as follows.

rm(list = ls())
library(tidyverse)
library(rstan)
library(cmdstanr)

seed = 1234
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

set.seed(seed)
n <- 500

#### advance preparation ####
N <- n #sample size
Q <- matrix(
     c(1,0,0,0,0, 
       0,1,0,0,0, 
       0,0,1,0,0, 
       0,0,0,1,0,
       0,0,0,0,1,
       1,1,0,0,0,
       0,1,1,0,0,
       0,0,1,1,0,
       0,0,0,1,1,
       0,1,0,1,1,
       1,1,1,0,0,
       1,0,0,1,1,
       1,0,1,1,1,
       1,1,1,0,1,
       1,1,1,1,1),
     nrow=15, ncol=5, byrow=TRUE) #Qmatrix
J <- nrow(Q) #Number of items; determined by the number of rows in the Q matrix.
K <- ncol(Q) #Number of attributes, determined by the number of columns in the Q matrix.
L <- 1 #L=1 because this time there is only one ability parameter.

#### ability parameter θ ####
theta <- rnorm(N, 0, 1) #Generated from standard normal distribution
#####

#### Attribute acquisition parameters ####
#Set up following previous studies
#The aim of this simulation is to recover these values
lambda <- rep(1.5, 5)
beta <- c(-1.0, -0.5, 0.0, 0.5, 1.0)
#####

#### item parameters guess, slip ####
#Slip, Guess all fixed at 0.1
guess <- rep(0.1, J)
slip <- rep(0.1, J)

#####

#### Attribute acquisition probability ####
delta <- matrix(nrow = N, ncol = K)
for (k in 1:K) {
  delta[,k] <- exp(lambda[k] * theta - beta[k]) / (1 + exp(lambda[k] * theta - beta[k]))
}
#####

#### item response probability ####
p <- matrix(nrow = N, ncol = J)
for (j in 1:J) {
  p[,j] <- apply(delta, 1, function(x) {
    return (guess[j] + (1-slip[j]-guess[j])*prod(x ** Q[j, ]))
  }
  )
}
#####

#### Item response (observed) ####
Y <- matrix(nrow = N, ncol = J)
for (j in 1:J) {
  for (n in 1:N) {
    Y[n,j] <- rbinom(1, 1, p[n, j])
  }
}
#####

#### estimate ####
datastan <- list(N=N, J=J, K=K, Q=t(as.matrix(Q)), Y=t(as.matrix(Y)))
model <- cmdstan_model("simu_highorder_stan_0610.stan")
fit1 <- model$sample(data=datastan, iter_warmup=1000, iter_sampling=1000, chains=4, parallel_chains=4, refresh=200, seed = seed)

#### result ####
fit1$print('beta')
fit1$print('lambda')

#### result ####
fit1$print('s')
fit1$print('g')

data {
  int<lower=0> N;
  int<lower=0> J;
  int<lower=0> K;
  matrix[K,J] Q;
  array[J, N] int Y;
}

transformed data {
  vector[N] Ones = rep_vector(-1, N);
}

parameters {
  vector<lower=0>[K] lambda;
  vector[K] beta;
  
  vector<lower=0, upper=1>[J] s;
  vector<lower=0, upper=1>[J] g_base;
  vector[N] theta;
    
  real<lower=0> sigma_lam;
  real<lower=0> sigma_beta;
}

transformed parameters {
  vector<lower=0, upper=1>[J] g;
  for (j in 1:J) {
    if (g_base[j] > 1-s[j]) {
      g[j] = 1-s[j]-1e-10;
    } else {
      g[j] = g_base[j];
    }
  }
  
  matrix[N, 2] X = append_col(theta, Ones);
  matrix[K, 2] b = append_col(lambda, beta);
  
  matrix[N, K] delta;
  matrix[N, K] X_bT = X * b';
  delta = inv_logit(X_bT);

  matrix[N, J] eta;
  eta = exp(log(delta + 1e-8)*Q);

  matrix[N,J] p;
  for (n in 1:N) {
    for (j in 1:J) {
      p[n, j] = g[j] + (1 - s[j] - g[j]) * eta[n, j];
    }
  }
}

model {
  sigma_lam ~ lognormal(0, 0.5);
  sigma_beta ~ lognormal(0, 0.5);
  lambda ~ lognormal(0, sigma_lam);
  beta ~ normal(0, sigma_beta);

  theta ~ normal(0, 1);

  s ~ beta(2, 5);
  g_base ~ beta(2, 5);
  
  for (j in 1:J) {
      Y[j,] ~ bernoulli_logit(p[,j]);
  }
}

The data generation model is based on the “HO-PINC Model” from this paper.

Zhan et_al(2018)

In my simulations, the data generation model is as follows:

δ_{nk} = \frac{\exp(\lambda_{k} \theta_{n} - \beta_{k})}{1+\exp(\lambda_{k} \theta_{n} - \beta_{k})}
- θ[N]: individual ability (from a standard normal distribution)
- δ[K]: probability of mastering each attribute (k)
- β[K]: intercept in the probability of learning each attribute (k)
- λ[K]: slope with respect to θ in the probability of mastering each attribute (k)
- δ is a probability value, so it takes values between 0 and 1.
\eta_{nj} = \prod_{k=1}^K \delta_{nk}^{q_{ik}}
- q[N,J]: Q matrix values (binary variables)
p_{nj} = g_{j} + (1-s_{j}-g_{j})\eta_{nj}
- slip[J]:slip parameter in the probability of correct response for each item (j)
- guess[J]:guess parameter of the probability of correct answer for each item (j)
- slip and guess are probability values and therefore take values between 0 and 1.
- The constraint “g_{j} < 1-s_{j}” is entered.
Y_{nj} ～ Bernoulli(p_{nj})
- Y_{nj}: Individual item responses (binary variables)

In the aforementioned simulation, N=500, K=5 and J=15.

What we want to estimate this time are β and λ. The true values of these are as follows.

β=(-1.0, -0.5, 0.0, 0.5, 1.0)
λ=(1.5, 1.5, 1.5, 1.5, 1.5)

The estimation results are as follows, where β deviates significantly from the true value.The indicators suggest that there is convergence, but the estimates are far off.
I have changed the seed and run the simulation, but the result is the same, and I am trying to figure out why, but have not been able to solve the problem.

result

I am very troubled.Please give me some advice on this problem.
Any advice on modifications to the code, the theory behind it, etc. would be appreciated.
Thank you in advance.

Topic		Replies	Views
Having trouble recovering some model parameters when fitting the simulated data Modeling techniques , fitting-issues , irt	4	1141	June 11, 2022
"Bayesian Cognitive Modeling" in cmdstanr CmdStan cognitive-science	5	1209	September 6, 2021
Estimating parameters of two normal distribution, mixed at proportions Modeling	5	1433	May 10, 2019
Parameter recovery issues with normal mixture model Modeling hierarchical-model , mixture	3	641	June 16, 2022
Recovering parameters of fake model Modeling	1	659	April 27, 2022

The trouble with parameter estimation in Cognitive Diagnostic Models (CDMs)

Related topics