Hello!
For some time now I am struggling with estimation of a hierarchical model. I will briefly describe the model here and then provide my Stan code.
Let’s start with the manifest variables in the model. One of the manifest variables is a binary variable Y, representing whether a person has responded correctly to an item. The other manifest variable is a continuous variable t, representing the response time of each person to each item. Besides these two variables, we also have the auxiliary variable X, which represents the ‘time scale’ of each item. This variable X is calculated as (i-1)/k, where i is the position of the current item and k is the total number of items.
The latent variable part of the model rests upon the item response theory models for responses and response times. Usually what you would have is the latent variable theta, which represents respondents’ latent ability (accuracy) and the latent variable tau, which represents the person’s latent speed. Theta is defined so that P(Y = 1) = (e^(theta - b))/(1 + e^(theta - b)), where b is the difficulty of the item. This is the so-called Rasch model. The latent variable tau is defined so that the logarithm of t (the response times) is normally distributed with mean beta - tau and variance alpha, beta being the time intensity parameter of the item, analogous to the item difficulty parameters b.
We extend this model to a latent growth model by replacing theta and tau with their respective growth terms, so instead of theta we would have theta0 + theta1X (where theta0 is the intercept, theta1 is the slope and X is the time scale variable described in the beginning). We do the same for tau, and so tau is replaced by tau0 + tau1X.
Therefore, we have four latent variables (theta0, theta1, tau0 and tau1) for which a multivariate normal distribution is assumed.
To identify the model, we have set all the means of the latent variables to 0 and we have constrained the alpha (variance of the response times) to be the same across all items. Other item parameters (b and beta) vary across items.
For several months already I have been trying to fit this model, but I am constantly running into convergence issues. Specifically, what seems to be the problem are the variance parameters of the latent variables and the correlation matrix of the latent variables. Below I will post my Stan code, my R code that I use to generate simulated data and some traceplots which demonstrate bad convergence. I would really appreciate any help with improving the model. I have already tried most of the suggestions on the reparametrization of the model, so I am all out of ideas and I don’t know what else I could try. Currently I am trying to run the model with really large sample sizes (10000 respondents and 50 items), but I don’t have a feeling that that will actually help.
EDIT: Also what would be maybe important to note is that I use the following arguments for the estimation in Stan:
max_treedepth = 20
adapt_delta = .99
stepsize = .01
Stan code:
data {
int<lower=0> n_person; // number of examinees
int<lower=0> n_item; // number of items
int<lower = 1> Dim; // number of dimensions
int<lower = 1> N; // number of data points
int<lower = 1> jj[N]; // item id
int<lower = 1> ii[N]; // person id
int<lower=0, upper = 1> Y[N];
vector<lower=0>[N] t; // response time data
vector<lower=0, upper=1>[N] X; // time scale variable
vector[Dim] Zero; // vector of 0s for person parameter means
}
parameters {
matrix[n_person, Dim] PersParStar; // provisional variable used in reparametrization of the multivariate normal
cholesky_factor_corr[Dim] choleskyP; // Cholesky decomposition of correlation of person parameters
vector<lower=0>[Dim] sigmaPStar; // provisional variable used in reparametrization of variances
vector[n_item] b; // item difficulty parameters
vector[n_item] beta; // item time intensity parameters
real<lower=0> alphaStar; // common variance of RTs
real<lower=0> Prec1; // provisional precision variable used for the reparametrization of t-distribution for variances
real<lower=0> Prec2; // same as above
}
transformed parameters {
vector<lower=0>[Dim] sigmaP = sigmaPStar/sqrt(Prec1); //variance of person parameters
real<lower=0> alpha = alphaStar/sqrt(Prec2); // variance of response times
matrix[n_person, Dim] PersPar = PersParStar * diag_pre_multiply(sigmaP, choleskyP); //person parameters
}
model {
to_vector(PersParStar) ~ std_normal();
Prec1 ~ gamma(1.5, 1.5);
Prec2 ~ gamma(1.5, 1.5);
sigmaPStar ~ std_normal();
alphaStar ~ std_normal();
choleskyP ~ lkj_corr_cholesky(1);
b ~ normal(0, 2);
beta ~ normal(0, 2);
target += bernoulli_logit_lpmf(Y | (PersPar[ii,1] + PersPar[ii,2] .* X) - b[jj]);
target += lognormal_lpdf(t | beta[jj] - (PersPar[ii,3] + PersPar[ii,4] .* X), alpha);
}
R code for data generation:
library(tidyverse)
library(faux)
library(rethinking)
library(extraDistr)
set.seed(172)
n_item <- 20
n_person <- 500
sigmaP <- c(1, .4, 1, .5)
correlP <- c(1, .2, .5, -.2,
.2, 1, -.1, -.3,
.5, -.1, 1, -.15,
-.2, -.3, -.15, 1)
person_pars <- rnorm_multi(n = n_person,
mu = rep(0, 4),
sd = sigmaP,
r = correlP,
varnames = c("Theta0", "Theta1", "Tau0", "Tau1"))
theta0 <- person_pars$Theta0
theta1 <- person_pars$Theta1
tau0 <- person_pars$Tau0
tau1 <- person_pars$Tau1
b <- rep(seq(-1.5, 1.5, .75), 4)
beta <- rep(seq(-1, 1, .5), each = 4)
alpha <- .3
y <- list()
t <- list()
x <- list()
for (i in 1:n_item) {
x[[i]] <- (i - 1)/n_item
t[[i]] <- rlnorm(n_person, (beta[[i]] - (tau0 + (tau1*x[[i]]))), alpha)
y[[i]] <- rbinom(n_person, 1, ((exp((theta0 + (theta1*x[[i]])) - b[[i]]))/(1 + exp((theta0 + (theta1*x[[i]])) - b[[i]]))))
}
jj <- rep(1:n_item, each = n_person)
ii <- rep(1:n_person, n_item)
data <- list("n_person" = n_person,
"n_item" = n_item,
"N" = n_item*n_person,
"Dim" = 4,
"Y" = unlist(y),
"t" = unlist(t),
"jj" = jj,
"ii" = ii,
"X" = rep(unlist(x), each = n_person),
"Zero" = rep(0, 4))
saveRDS(file = "data_linspeedabil_1.rds", object = data)
Bad traceplots: