Non-compensatory IRT model

Dear All,

Does anybody have experience with specifying non-compensatory IRT models in Stan? If so, could anybody point me to example code (or snippets, especially related to the likelihood)?

Thanks a ton!

Best wishes
Christoph

I don’t know about non-compensatory IRT models (or IRT models at all), but there are some other fancier IRT models over here: https://education-stan.github.io/

Hi bbbales2,
thanks for the link! After working through some of the examples provided, I think I can narrow my problem down to identification issues.

Say you have responses to k items. In the compensatory case the likelihood of the IRT model (i.e., the probability of a correct response) is y ~ bernoulli_logit(alpha[ii] .* (ability[jj] - beta[ii])), with [ii] and [jj] being the item and person indices, respectively.

In the non-compensatory case, however, there are two latent traits (abilities) that are necessary for a correct response, and additional item parameters for each item. So, in a way you have two components in the likelihood, y ~ bernoulli_logit(alpha1[ii] .* (ability1[jj] - beta1[ii]) + alpha2[ii] .* (ability2[jj] - beta2[ii])).
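For reference, a minimal (and certainly not yet identified) sketch of how that likelihood could be written in Stan, assuming y, ii, jj and the sizes I, J, N are declared in the data block as in the models further down this thread; all declarations, constraints and names here are placeholders:

parameters {
  vector<lower=0>[I] alpha1;   // discriminations, dimension 1
  vector<lower=0>[I] alpha2;   // discriminations, dimension 2
  vector[I] beta1;             // difficulties, dimension 1
  vector[I] beta2;             // difficulties, dimension 2
  vector[J] ability1;          // latent trait, dimension 1
  vector[J] ability2;          // latent trait, dimension 2
}
model {
  // non-compensatory likelihood as written above
  y ~ bernoulli_logit(alpha1[ii] .* (ability1[jj] - beta1[ii])
                      + alpha2[ii] .* (ability2[jj] - beta2[ii]));
}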

If I am not mistaken, you could also frame the problem as a two-factor model with cross-loadings. Currently, I am working on such a model specification, but for continuous outcomes. I have a model specification which ‘runs’, but I have not managed to get it to converge yet (sampling is awful, although without divergent transitions or treedepth warnings). I think my main problem is that the model is, due to the cross-loadings (and the similarity of the factor loadings), not identified.

The likelihood of my model looks something like the following:
y = (beta1+beta2) + (lambda1 * trait1) + (lambda2 * trait2) + (e1 + e2);
in Stan I am working with
target += log_sum_exp(normal_lpdf(y | lt1, sigma1[ii]), normal_lpdf(y | lt2, sigma2[ii])), because I have an additive relationship of the components lt1 and lt2, where
lt1 = beta1[ii] + lambda1[ii] .* trait1[jj]; lt2 = beta2[ii] + lambda2[ii] .* trait2[jj].
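Written out (a sketch, with \sigma_1, \sigma_2, \sigma denoting residual scales), the two candidate formulations are a mixture-type density, \log f(y) = \log\big(\mathcal{N}(y \mid lt_1, \sigma_1) + \mathcal{N}(y \mid lt_2, \sigma_2)\big), which is what log_sum_exp of the two normal_lpdf terms computes (up to mixing weights), versus an additive decomposition of the outcome, y = lt_1 + lt_2 + e, which corresponds to y \sim \mathcal{N}(lt_1 + lt_2, \sigma).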

I am not quite sure if I need the log_sum_exp. My main question, however, relates to (hard or soft) constraints to impose on the beta1, beta2 and lambda1, lambda2 vectors. Are there any such constraints I can introduce to better separate the intercepts and factor loadings?

I hope this was not too confusing. Any hints and tips appreciated!

Best wishes
Christoph

log_sum_exp(a, b) is log(exp(a) + exp(b))

I think it would be rather unusual here. Can you write out mathematically what term you want to compute there? Then I can maybe tell if it’s right or not (or ask more questions).

What sort of constraints might you want to impose? (edit: not that it’s a bad idea – just asking so I can think better about it)

Sorry for my late response, I spent the last week trying to set up a presentable model. As of yet, I have not fully succeeded.

So the basic idea is to develop a non-compensatory response time model. The underlying model is the three-parameter lognormal model by Klein-Entink et al. (2009): log(RT) = lambda - phi*ksi + epsilon, where lambda is a time intensity parameter of an item, phi is a speed sensitivity parameter of an item, and ksi is the speed parameter of a person; epsilon is the item-specific residual variance parameter. The model is equivalent to a tau-congeneric confirmatory factor model for log response times. In the non-compensatory case, you would have two time intensity and speed sensitivity parameters per item, and two speed parameters per person. This means that the time it takes to respond to an item depends on two ability components; doing well on one dimension does not compensate for being bad at the other.

I simulated some data and set up a model where the two response time components are actual data. This works nicely and confirms the general structure of the model:

data{
int<lower = 1> I; // number of items
int<lower = 1> J; // number of examinees
int<lower = 1> N; // number of observed data points
int<lower = 1> ii[N]; // item id
int<lower = 1> jj[N]; // person id
real logrt[N]; // log response times
real logrt1[N]; // log response times component 1
real logrt2[N]; // log response times component 2
}

parameters{
vector[I] lambda1;
vector[I] lambda2;
vector<lower=0>[I] phi1;
vector<lower=0>[I] phi2;
matrix[J,2] ksi;
vector<lower=0>[I] sigma1;
vector<lower=0>[I] sigma2;
vector<lower=0>[I] sigma_e;
}
model{

lambda1 ~ normal(2,5);
lambda2 ~ normal(2,5);

phi1 ~ normal(0,1);
phi2 ~ normal(0,1);

to_vector(ksi) ~ normal(0,1);

sigma1 ~ cauchy(0,0.3);
sigma2 ~ cauchy(0,0.3);
sigma_e ~ cauchy(0,1);

logrt1 ~ normal(lambda1[ii] - phi1[ii] .* ksi[jj,1], sigma1[ii]);
logrt2 ~ normal(lambda2[ii] - phi2[ii] .* ksi[jj,2], sigma2[ii]);

for(n in 1:N) logrt[n] ~ normal(logrt1[n] + logrt2[n], sigma_e[ii[n]]);
}

It gets problematic once I specify both log response time components logrt1 and logrt2 as parameters. I have tried numerous versions of the model, each try either not converging or with very low n_eff, especially for both phi-vectors.

My best shot (as of yet) looks like this:

parameters{
matrix[J,2] ksi;
cholesky_factor_corr[2] L_Omega_ksi;

vector[I] mu_lambda;
vector[I] lambda1;
vector<lower=0>[I] phi1;
vector<lower=0>[I] phi2;
vector<lower=0>[I] sigma1;
vector<lower=0>[I] sigma2;
}

transformed parameters{
vector<lower=0>[2] tau_ksi;
vector[I] lambda2;

tau_ksi[1] = 1;
tau_ksi[2] = 1;

lambda2 = mu_lambda - lambda1;
}

model{
vector[N] lt1;
vector[N] lt2;
lt1 = lambda1[ii] - phi1[ii] .* ksi[jj,1];
lt2 = lambda2[ii] - phi2[ii] .* ksi[jj,2];

L_Omega_om ~ lkj_corr_cholesky(2);
for (j in 1:J) om[j] ~ multi_normal_cholesky(rep_vector(0,2), diag_pre_multiply(tau_om, L_Omega_om));

lambda1 ~ normal(2,1);
mu_lambda ~ normal(5,2);
phi1 ~ normal(0,2.5);
phi2 ~ normal(0,0.1);

sigma1 ~ normal(0,1);
sigma2 ~ normal(0,1);

logrt ~ normal( lt1 + lt2, sigma_e[ii]);
}

Non-centering does not help, making the model hierarchical doesn’t either. So I guess the model suffers from identification-issues, especially with regard to the phi parameters. Hence my question regarding possible constraints for identification.

Another thing I was thinking about… the response times are log-transformed, and summation does not normalize them. Would it be beneficial to not transform the response times and work with alternative transformations/distributions instead? If so, are there such alternatives? I skimmed through the paper by Martin & Williams (Outgrowing the Procrustean Bed of Normality: The Utility of Bayesian Modeling for Asymmetrical Data Analysis), but I am unsure if this might be a viable path. It is well possible that the whole undertaking is futile, because the model is not identifiable.

I hope this post adds some clarification to my issues. Any help appreciated!

Best wishes
Christoph

logrt[n] ~ normal(logrt1[n] + logrt2[n], sigma_e[ii[n]])

Those logrt1 and logrt2 look pretty non-identified (presumably there’ll be lots of correlations in the underlying parameters in the posterior and this will give the sampler trouble).

You’ll probably either have to tighten up the priors quite a bit (make stronger assumptions about what these parameters are), or change the model so you’re not estimating the two things separately.

In the second model it doesn’t look like ksi has a prior. Can you add one (it has one in the first)?

Dear All,
after quite some time I’d like to sort of reactivate this thread (I did not want to open a new thread, since the question is directly related to the specification of the model in question). We implemented a Gibbs sampler for the non-compensatory response time model that involves a convolution of two log-normal distributions. I’d like to transfer this model specification to Stan.

In an old thread in the Stan Google Group I found a function for convolving two distributions using the ODE integrator. The function was proposed by Sebastian Weber (@wds15), and it’s exactly what I need. See the function below (adapted to two lognormal distributions):

functions {
  real[] kernel(real x, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real dydx[1];
    real mu1 = theta[1];          //mean of lognormal 1
    real sigma1 = theta[2];     //sd of lognormal 1
    real mu2 = theta[3];          //mean of lognormal 2
    real sigma2 = theta[4];     //sd of lognormal 2

    //Convolve the two lognormal distributions
    dydx[1] = exp(lognormal_lpdf(x_r[1] - x | mu1, sigma1) + lognormal_lpdf(x | mu2, sigma2) );
    return dydx;
  }

  real convolve_kernel(real a, real b, real[] theta, real x) {
    int x_i[0];
    return(integrate_ode(kernel, rep_array(0., 1), a, rep_array(b, 1), theta, rep_array(x, 1), x_i)[1,1]);
  }
}

While I understand the principle of what is happening, it is unclear to me what the model would look like exactly, i.e. how to integrate this function into the model block. Of course, mu1, mu2, sigma1, and sigma2 would be the parameters of the model, and x_r is the data I have (the total response time). But what is x? So, would the model look something like this (with logrt being the data)?

model{
mu1 ~ normal(0,5);
mu2 ~ normal(0,5);
sigma1 ~ exponential(1);
sigma2 ~ exponential(1);
convolve_kernel(0, 10, {mu1, sigma1, mu2, sigma2}, logrt);
}

Any hints on how to integrate the convolution function into the model block are highly appreciated, along with any tips on how to make the convolution correct!

Thank you!

Best wishes
Christoph

Almost. I just read quickly over it, but the convolution defines the log-prob density here, right? If so, then you need to write:

target += convolve_kernel(0, 10, {mu1, sigma1, mu2, sigma2}, logrt);

instead.

I proposed the above at a time when we did not have an integrate_1d function available in Stan. So with a recent Stan version you should be able to reformulate the above using this new function.

In case you stick with the ode stuff and use stan >= 2.23 (or something like that), then you should switch to use the new variadic ODE interface which will give you ~20% speed increase… but get your model working first, of course… then worry about speed.
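For what it’s worth, a rough (untested) sketch of the same kernel with the variadic interface, assuming Stan 2.24 or later; argument names are placeholders:

functions {
  // ODE kernel: the single state component accumulates the convolution integral
  vector kernel(real x, vector y, real total_rt,
                real mu1, real sigma1, real mu2, real sigma2) {
    vector[1] dydx;
    dydx[1] = exp(lognormal_lpdf(total_rt - x | mu1, sigma1)
                  + lognormal_lpdf(x | mu2, sigma2));
    return dydx;
  }

  real convolve_kernel(real a, real b, real total_rt,
                       real mu1, real sigma1, real mu2, real sigma2) {
    // integrate the kernel from a to b; [1, 1] extracts the scalar result
    return ode_rk45(kernel, rep_vector(0.0, 1), a, {b},
                    total_rt, mu1, sigma1, mu2, sigma2)[1, 1];
  }
}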

Hi Sebastian,

Thank you very much for your help! The model compiles, but throws an error once I start sampling (I have attached the complete model to this post):

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: Exception: Exception: lognormal_lpdf: Random variable is -8.88178e-16, but must be >= 0!  
  (in 'model30e463c64dd0_conv_ncrt' at line 13)
  (in 'model30e463c64dd0_conv_ncrt' at line 19)
  (in 'model30e463c64dd0_conv_ncrt' at line 63)

It informs me about a constraint violation happening in the first term of the convolution, lognormal_lpdf(x_r[1] - x | mu1, sigma1). I exposed the function to R and tested it there, comparing it with some other R functions (the convolveR and ode functions in the script you posted in the Google Group), and it’s only the Stan function that throws this error. Is it possible that Stan is too sensitive regarding the parameter constraints (since -8.88178e-16 is very, very close to zero)?

Regarding the integrate_1d function: I would only need to replace the integrate_ode_rk45 function with the integrate_1d function, correct?

Thanks again!

Best wishes
Christoph

conv_ncrt.stan (1.8 KB)

Hi Christoph -

can you post or write out the model definition here? You can write in latex with the double dollar sign $$ syntax or you can post a screenshot.

I’m asking because I follow your model until the final step, when you combine using a normal distribution.

Regarding what Ben said, you do need priors for all parameters in latent variable models. Stan usually won’t be happy if you don’t.

I’m certain you can do a compensatory IRT model in Stan, but of course it can get a bit tricky with the exact coding, which is why it would be helpful if you posted the full model definition.


I would just add 1E-6 to keep things from getting too close to 0 here (assuming that 1E-6 is a small enough number in your problem that it can be ignored). Remember that the ODE integrator only controls errors up to the tolerances given to it (which is 1E-10 in your case as you use the defaults).
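One possible reading of this suggestion, purely as a sketch (the offset keeps the argument of the first lognormal_lpdf strictly positive near the upper integration limit; whether 1E-6 is negligible depends on the scale of the response times):

// inside the kernel, shift the difference away from zero
dydx[1] = exp(lognormal_lpdf(x_r[1] - x + 1e-6 | mu1, sigma1)
              + lognormal_lpdf(x | mu2, sigma2));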

Hi Robert,

Thanks for your reply! Regarding the specification in one of the previous posts: ksi, L_Omega_ksi and tau_ksi should have actually been om, L_Omega_om, and tau_om; these were typos on my part. That model specification is quite outdated; the up-to-date one is included in the conv_ncrt.stan file. But you are correct, the critical part was the likelihood, which was specified incorrectly in the post from August.

The basic idea, however, is still the same: the time it takes a student j to answer an item i is the sum of two log-normally distributed response time components rt_1 and rt_2. Each component is generated by a one-dimensional log-normal response time model, rt_k = \alpha_{ki} - \phi_k * \omega_{kj} + \epsilon_{ki}, where \alpha is the time intensity, \phi is the speed sensitivity (constant across items), \omega_{kj} is the component-specific latent trait, and \epsilon_{ki} is a normally distributed residual. Both response time components are unobserved. What can be observed is the response time \mathbf{rt_i} on item i, and \mathbf{rt_i} is the sum of the aforementioned response time components rt_1 and rt_2:

\mathbf{rt_i} = rt_1 + rt_2 = exp(\alpha_{1i} - \phi_1 * \omega_{1j} + \epsilon_{1i}) + exp(\alpha_{2i} - \phi_2 * \omega_{2j} + \epsilon_{2i})

This model is noncompensatory as low values of \omega_1 can not be compensated by high values of \omega_2. Moreover, the item-specific response time \mathbf{rt_i} is not log-normally distributed anymore, because the lognormal distribution is not closed under linear combinations. Thus, the conditional density function f(\mathbf{rt_i} | \omega_1, \omega_2) of the response time of item i is a convolution of two lognormal distributions with scale parameters m_k = \alpha_{ki} - \phi_k * \omega_{kj} and shape parameters \sigma_{ki}.
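Written out, with m_k = \alpha_{ki} - \phi_k * \omega_{kj} as above, the convolution density is

f(\mathbf{rt_i} \mid \omega_1, \omega_2) = \int_0^{\mathbf{rt_i}} \mathrm{LogNormal}(s \mid m_1, \sigma_{1i}) \, \mathrm{LogNormal}(\mathbf{rt_i} - s \mid m_2, \sigma_{2i}) \, ds,

which is the integral the kernel function above approximates numerically.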

So, that’s essentially what I am trying to implement here (see the current model specification attached to this post).

conv_ncrt.stan (1.9 KB)

Unfortunately, increasing the tolerance to 1E-6 did not solve the problem. I am still getting the previous error message:

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: Exception: Exception: lognormal_lpdf: Random variable is -8.88178e-16, but must be >= 0!  
  (in 'model30e463c64dd0_conv_ncrt' at line 13)
  (in 'model30e463c64dd0_conv_ncrt' at line 19)
  (in 'model30e463c64dd0_conv_ncrt' at line 63)

I don’t think there is a problem with the result of the integration, but rather with the input. I think that in lognormal_lpdf(x_r[1] - x | mu1, sigma1) the term x_r[1] - x causes the error, but I could not find the reason why. It is entirely possible that I have an error somewhere in the specification of the parameters, or that I misunderstood some aspect of the convolution, so any input/hints/suggestions are appreciated!

Best wishes
Christoph

Some parts of the model do look a bit scary, i.e.

exp( lognormal_lpdf( x_r[1] - x | mu1, sigma1) + lognormal_lpdf( x | mu2, sigma2) );

You’re exponentiating the joint density of two log-normals (multiplied on the log scale). That’s not going to be very easy for Stan to sample if there is a lot of correlation between the parameters, as there clearly is.

My advice would be to start simple and work up towards a full model. From what I understand you have a log-normally distributed latent variable and want to do a non-compensatory model. It seems like the most basic way to do this is with a mixture, i.e. for d \in \{1,2\} dimensions. Say we estimate the mixing proportions as \theta = Pr(d=1), then we have:

\theta LogNormal(\alpha_i - \beta_{jd=1}) + (1 - \theta)LogNormal(\alpha_i - \beta_{jd=2})

This is a standard density in Stan and can be coded using the manual’s mixture model code. I might start with this and then try to add in the more complicated features. Stan can code anything in theory but the target geometry can also be difficult to assess when starting with a more complicated model, particularly if that model was estimated using something else. HMC struggles with very correlated parameter spaces.
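A minimal sketch of such a mixture, following the manual’s mixture-model approach and using log_mix; the shared scale sigma and the parameter names are assumptions of the sketch, not something specified above:

parameters {
  real<lower=0, upper=1> theta;   // mixing proportion Pr(d = 1)
  vector[I] alpha;                // item parameters
  matrix[J, 2] beta;              // person parameters, one column per dimension
  real<lower=0> sigma;            // common scale (an assumption for this sketch)
}
model {
  for (n in 1:N)
    target += log_mix(theta,
                      lognormal_lpdf(rt[n] | alpha[ii[n]] - beta[jj[n], 1], sigma),
                      lognormal_lpdf(rt[n] | alpha[ii[n]] - beta[jj[n], 2], sigma));
}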

On the other hand, if you want to do a convolution, maybe start with the standard convolution model and slowly add in latent variable features. That also might pinpoint where the target geometry is failing.

Sorry that I can’t be more helpful, but in general I’ve found this to be the only way to develop a new model in Stan that explores a difficult target geometry. Otherwise there are too many variables to identify which problems stem from the constraints.

Hi Robert, Hi all:

Thank you for your input, highly appreciated. We briefly considered a mixture model, but concluded that this kind of model did not adequately reflect our goals. I’d definitely want to pinpoint where the target geometry is failing; what do you mean by ‘standard convolution model’? I am aware that this model is not the easiest sampling-wise. The Gibbs sampler also suffers heavily from high autocorrelation, but I want to implement the general structure of the model to see if there are any possibilities to improve it, or make it more feasible. I am also aware that I will very likely have to deal with a number of identification issues.

As the most basic step, I need to get the convolution to work properly. There are a few curious things, however, that indicate that this is not the case at the moment. Both the integrate_ode_rk45 and integrate_1d convolutions work without any problems when I expose them to R via expose_stan_functions(). When I compare them to built-in R functions and the convolution procedure implemented in the distr package they yield the same results (I also checked the functions with the whole range of response times, the results are fine):

### data

theta = c(1.2, 0.25, 1.5, 0.3)
rt = c(9.12, 10.25, 7.65, 6.30, 12.55)

### functions

# built-in R

# a)

conv_R <- function(rt) { integrate(function(x) dlnorm(rt - x, theta[1], theta[2]) * dlnorm(x, theta[3], theta[4]), 0, rt)$value }
conv_R <- Vectorize(conv_R)

# b)

f.X <- function(x) dlnorm(x,theta[1],theta[2])
f.Y <- function(y) dlnorm(y,theta[3],theta[4])    
f.Z <- function(rt) integrate(function(x,z) f.Y(z-x)*f.X(x),0,rt,rt)$value
f.Z <- Vectorize(f.Z)

# c)

library(distr)
N <- Lnorm(mean=theta[1],sd=theta[2])         
L <- Lnorm(mean=theta[3],sd=theta[4]) 
conv <- convpow(N+L, 1)             
f.U  <- d(conv)                    


# Stan (integrate_1d)

conv1_stan <- '
  functions {
    real kernel(real x, real xc, real[] theta, data real[] x_r, data int[] x_i) {
      real out;
      real mu1    = theta[1];     // mean of lognormal1
      real sigma1 = theta[2];     // sd of lognormal1
      real mu2    = theta[3];     // mean of lognormal2
      real sigma2 = theta[4];     // sd of lognormal2
      
      // convolution of lognormal distributions
      out = exp(lognormal_lpdf( x | mu1, sigma1) + lognormal_lpdf( x_r[1] - x | mu2, sigma2));
      return out;
    }
    
    real convolve1_kernel(real a, real b, real[] theta, real x) {
     int x_i[0];
     return(integrate_1d(kernel, a, b, theta, rep_array(x,1), x_i, 1E-6));
    }
}
' 

expose_stan_functions(stanc(model_code = conv1_stan))

# Stan (integrate_ode_rk45)

conv2_stan <- '
  functions {
    real[] kernel(real x, real[] y, real[] theta, data real[] x_r, data int[] x_i) {
      real dydx[1];
      real mu1    = theta[1];     // mean of (log)normal1
      real sigma1 = theta[2];     // sd of lognormal1
      real mu2    = theta[3];     // mean of lognormal2
      real sigma2 = theta[4];     // sd of lognormal2
      
      // Convolution of lognormal distributions
      dydx[1] = exp(lognormal_lpdf( x | mu1, sigma1) + lognormal_lpdf( x_r[1] - x | mu2, sigma2));
      return dydx;
    }
    
    real convolve2_kernel(real a, real b, real[] theta, real x) {
     int x_i[0];
     return(integrate_ode_rk45(kernel, rep_array(0.,1), a, rep_array(b,1), theta, rep_array(x,1), x_i)[1,1]);
    }
}
'

expose_stan_functions(stanc(model_code = conv2_stan))



### comparison

check1 = sapply(rt, function(x) conv_R(x))
check2 = sapply(rt, function(x) f.Z(x))
check3 = sapply(rt, function(x) f.U(x))
check4 = sapply(rt, function(x) convolve1_kernel(0, x, theta, x))
check5 = sapply(rt, function(x) convolve2_kernel(0, x, theta, x))
> all.equal(check1, check2, check4, check5)
[1] TRUE

Once I want to compile the model, however, I receive different warning and/or error messages. The first one relates to a syntax error that I don’t understand, because everything should be declared correctly:

Ill-typed arguments supplied to function 'integrate_1d'. Available signatures: 
((real, real, real[], data real[], data int[]) => real, real, real, real[], data real[], data int[]) => real
((real, real, real[], data real[], data int[]) => real, real, real, real[], data real[], data int[], data real) => real
Instead supplied arguments of incompatible type: (real, real, real[], data real[], data int[]) => real, real, real, real[], real[], int[], real.

and

Ill-typed arguments supplied to function 'integrate_ode_rk45'. Available signatures: 
((real, real[], real[], data real[], data int[]) => real[], real[], real, real[], real[], data real[], data int[]) => real[,]
((real, real[], real[], data real[], data int[]) => real[], real[], real, real[], real[], data real[], data int[], data real, data real, data real) => real[,]
Instead supplied arguments of incompatible type: (real, real[], real[], data real[], data int[]) => real[], real[], real, real[], real[], real[], int[].

Both models nevertheless compile in Rstan. In cmdstan, however, they do not compile because of this syntax error.

Once I start sampling, however, the next errors appear. In the case of the model with the integrate_1d function, the error says

SAMPLING FOR MODEL 'conv2_ncrt' NOW (CHAIN 1).
Chain 1: Exception: Exception: integrate: error estimate of integral 2.95401e-06 exceeds the given relative tolerance times norm of integral  
(in 'model45c35534abc_conv2_ncrt' at line 23)
(in 'model45c35534abc_conv2_ncrt' at line 67) 

In case of the model with the integrate_ode_rk45 function, the error says

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: Exception: Exception: lognormal_lpdf: Random variable is -8.88178e-16, but must be >= 0!  
  (in 'model30e463c64dd0_conv_ncrt' at line 13)
  (in 'model30e463c64dd0_conv_ncrt' at line 19)
  (in 'model30e463c64dd0_conv_ncrt' at line 63) 

If I remember correctly, @bbbales2 and @nhuurre mentioned in other threads back in November 2020 that there is some kind of bug. Is this still the case? I cannot, for the life of me, understand why these errors occur.

The ‘funny’ thing is that both models compile and sample once I set init = 0. Sampling time, however, explodes, and I’d like to set up the convolution model without fiddling with the initial values. I also used print within the functions block to check the results of the integration, and as far as I can tell they are largely okay.

Currently, I am using R4.0.3 and Rstan 2.21.2. Please find attached both models in their current specification:
conv_int1d.stan (1.9 KB) conv_intrk45.stan (2.0 KB)

As I said, I am aware that this is not the easiest model to work with. Nevertheless, I want to make sure that the convolution part is set up properly. I will worry about the other parts of the model later on (especially with regard to identifiability). So, any help is appreciated! If there is anything unclear, or more information is required, please let me know.

Best wishes
Christoph

A quick update: the syntax error disappears once I declare data real x in the integrate_ode_rk45 and integrate_1d functions. The other issues remain.

The problem is a missing data qualifier in the function signature. It should be

real convolve1_kernel(real a, real b, real[] theta, data real x) {

because x is used to construct an argument to a function that requires data.

Another problem I see is that convolve1_kernel returns a probability density but target += expects a log-probability density. So that target increment should be

for(n in 1:N)
  target += log(convolve1_kernel(0., rt[n], {mu1[n], sigma1[ii[n]], mu2[n], sigma2[ii[n]]}, rt[n]));

However, I don’t think you even need a convolution for this model. The latent reaction time can be modeled explicitly.

data{
  int<lower = 1> I;                      // number of items
  int<lower = 1> J;                      // number of examinees
  int<lower = 1> N;                      // number of observed data points
  int<lower = 1> ii[N];                  // item id
  int<lower = 1> jj[N];                  // person id
  vector[N] rt;                            // response times
}

parameters{
  vector<lower=0>[I] alpha1;
  vector<lower=0>[I] alpha2;
  positive_ordered[2] phi;
  matrix[J,2] om;
  real<lower=0> tau2;
  cholesky_factor_corr[2] L_Omega_om;
  vector<lower=0>[I] sigma1;
  vector<lower=0>[I] sigma2;
  vector<lower=0,upper=1>[N] frac_rt1;
}

transformed parameters{
  vector[N] rt1 = rt .* frac_rt1;
  vector[N] rt2 = rt  - rt1;
  vector<lower=0>[2] tau_om;
  vector[N] mu1;
  vector[N] mu2;
   tau_om[1] = 1; tau_om[2] = tau2;
   mu1 = alpha1[ii] - phi[2] * om[jj,1];
   mu2 = alpha2[ii] - phi[1] * om[jj,2];
}

model{

  for (j in 1:J) om[j] ~ multi_normal_cholesky(rep_vector(0,2), diag_pre_multiply(tau_om, L_Omega_om));
 
  alpha1 ~ normal(2, 1);
  alpha2 ~ normal(2, 1);
 
  phi ~ normal(0, 1);
  
  tau2 ~ normal(0, 1);

  sigma1 ~ normal(0, 1);
  sigma2 ~ normal(0, 1);

  for (n in 1:N) {
    rt1[n] ~ lognormal(mu1[n], sigma1[ii[n]]);
    rt2[n] ~ lognormal(mu2[n], sigma2[ii[n]]);
  }

} 

(The transform from frac_rt1 to (rt1, rt2) is linear, so no Jacobian adjustment.)
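To spell that out as a quick check: rt1_n = rt_n \cdot frac\_rt1_n and rt2_n = rt_n - rt1_n, so \partial rt1_n / \partial frac\_rt1_n = rt_n, which is data. The log absolute Jacobian, \sum_n \log rt_n, is therefore constant with respect to the parameters and can be dropped from target.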


Hi Niko:

Thank you very much! Indeed, the syntax error disappears once I declare x to be data (data real x) in the function signature.

Being able to model the reaction times without the convolution would be perfect. I will give your suggestion a try right away!

Thanks again, best wishes
Christoph

Dear Niko, Dear All:

Thanks again for the hint regarding the explicit modeling of the latent reaction times. I was not aware that it is possible to assign prior distributions to transformed parameters! The model compiles and starts without problems now (and delivers some results in a reasonable amount of time).

Sampling and effective sample size are still atrocious, though. I also have sporadic divergent transitions and transitions exceeding max_treedepth; they disappear when I increase both adapt_delta and max_treedepth. Even with chain lengths of 3,000 the effective sample size of almost all parameters is seldom larger than 100. Most of the time n_eff < 10, so I guess the sampler is stuck right at the beginning. This confirms @bbbales2, who already mentioned that ‘there’ll be lots of correlations in the underlying parameters in the posterior and this will give the sampler trouble’. Moreover, I always get the There were x chains where the estimated Bayesian Fraction of Missing Information was low. warning, indicating that there was something wrong during the adaptation phase, so the chains could not explore the posterior distribution efficiently (which is obviously true, given the very small n_eff). In this regard, non-centering does not help (n_eff increases somewhat, but only from < 10 to < 100).

The current model looks as follows (thanks again to @nhuurre):

data{
  int<lower = 1> I;                      // number of items
  int<lower = 1> J;                      // number of examinees
  int<lower = 1> N;                      // number of observed data points
  int<lower = 1> ii[N];                  // item id
  int<lower = 1> jj[N];                  // person id
  vector[N] rt;                              // response times
}

parameters{
  vector[I] alpha1;
  vector[I] alpha2;
  positive_ordered[2] phi;
  matrix[J,2] om;
  cholesky_factor_corr[2] L_Omega_om;
  vector<lower=0>[I] sigma1;
  vector<lower=0>[I] sigma2;
  vector<lower=0,upper=1>[N] frac_rt1;
}

transformed parameters{
  vector<lower=0>[N] rt1 = rt .* frac_rt1;
  vector<lower=0>[N] rt2 = rt  - rt1;
  vector[N] mu1; 
  vector[N] mu2; 
   mu1 = alpha1[ii] - phi[2] * om[jj,1];
   mu2 = alpha2[ii] - phi[1] * om[jj,2]; 
}

model{

  for (j in 1:J) om[j] ~ multi_normal_cholesky(rep_vector(0,2), diag_pre_multiply(rep_vector(1,2), L_Omega_om));
  L_Omega_om ~ lkj_corr_cholesky(2);
 
  alpha1 ~ normal(2, 1);
  alpha2 ~ normal(2, 1);
  
  phi ~ normal(0, 1);

  sigma1 ~ normal(0, 1);
  sigma2 ~ normal(0, 1);
  
  rt1 ~ lognormal(mu1, sigma1[ii]);
  rt2 ~ lognormal(mu2, sigma2[ii]);
}

A comment regarding the line for (j in 1:J) om[j] ~ multi_normal_cholesky(rep_vector(0,2), diag_pre_multiply(rep_vector(1,2), L_Omega_om)): the basic response time model rt = \alpha_{i} - \phi * \omega_{j} + \epsilon_{i} is identified by setting the mean and variance of the speed parameters \omega_j to 0 and 1, respectively. The only other necessary constraints are \phi > 0 and \epsilon_i > 0. Since in this model there are two response time components, I set both means and variances to 0 and 1, respectively. Moreover, I set positive_ordered[2] phi; to identify both components; it is entirely possible that this is not enough.
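For reference, one way to non-center the om prior in the model above (a sketch; z_om is a hypothetical standard-normal matrix, and the unit scales are kept as in the current specification):

parameters {
  matrix[J, 2] z_om;                      // standard-normal raw parameters
  cholesky_factor_corr[2] L_Omega_om;
  // ... remaining parameters as before
}
transformed parameters {
  matrix[J, 2] om = z_om * L_Omega_om';   // rows of om get correlation L * L'
}
model {
  to_vector(z_om) ~ std_normal();
  L_Omega_om ~ lkj_corr_cholesky(2);
  // ... remaining priors and likelihood as before
}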

As already mentioned, the core problem is the highly correlated parameter space. The resulting parameter estimates, nevertheless, are at least in the correct range. Unfortunately, my mathematical understanding of potential remedies (if there are any!) for this kind of problem is quite limited. What are the general strategies in this case, and are there any specifically related to this model? Even if there are none, and the model is just not feasible, I need closure, because I have been banging my head against the wall because of this model for quite some time now…

If anyone needs further information, please let me know!

Thanks, best wishes
Christoph

P.S. I also tried to simplify the model a bit by making both \omega_1 and \omega_2 independent, but that did not change anything. Moreover, I tried to tighten the priors for the alpha and sigma parameters, but to no avail. If anybody is interested, here is the way we generated the data: data_generation.R (1.6 KB)

Switch to simulated data and try to work from there. If you know the solution then maybe it’ll be easier to see the problem or the need to add an assumption or something.

Another thing: can you write out the intention of the priors here (in words or math)?

mu1 = alpha1[ii] - phi[2] * om[jj,1];
mu2 = alpha2[ii] - phi[1] * om[jj,2];

...

for (j in 1:J)
  om[j] ~ multi_normal_cholesky(rep_vector(0,2), diag_pre_multiply(rep_vector(1,2), L_Omega_om));

It looks like there’s some sort of per-examinee correlation term. I am wondering if there is a problem here but maybe there are other reasons for the way things are.

Another thing, is there an expected non-identifiability here in the design? Is that what non-compensatory means? Like, if I repeatedly have one person take the same two tests over and over and only measure the total time, I don’t get to know how long either test takes. But if there are two versions of each test, I think I can. Is this the first or the second situation, and do you have reason to believe your data informs everything? Or could there be a real non-identifiability?

Dear Ben:

Thank you very much for your response! A short disclaimer: the basic idea is not mine, so I do not take any credit!

Let’s say you have a test consisting of i = 20 items measuring mathematical competence. The response times rt_i to these items depend first of all on the mathematical ability \omega_{math} of individual j. So, you have a first (latent) response time component rt_{math_i} = \alpha_{i} - \phi_i * \omega_{math_j} + \epsilon_{i}, which is usually modelled with a lognormal distribution with mean \alpha_{i} - \phi_i * \omega_{math_j} and scale \epsilon_{i}. The basic assumption is that with a decreasing ability in mathematics \omega_{math}, the item-specific response time rt_{math_i} increases.

But, because the items of the test consist of word problems, the individual at least has to be able to read the problem in order to solve the item successfully. So you have a second (latent) response time component rt_{read_i} that is added to the first. Non-compensatory in this context refers to the assumption that a high ability in math does not compensate for a lack of reading ability: if an individual j has a hard time reading the problem, the response time increases irrespective of his/her ability in math. The overarching response time rt_i is the sum of both (latent) response time components, rt_{math_i} + rt_{read_i}, and is smallest when an individual j has considerable mathematics and reading skills. So you have two ability parameters (in the context of response time models called speed parameters), \omega_1 and \omega_2, which may be correlated. That’s the reasoning behind the multi_normal_cholesky prior distribution in the model. The uni-dimensional RT model is usually identified by setting the mean and variance of the speed parameters \omega1_j and \omega2_j to 0 and 1, respectively. As I mentioned in my previous post, I also modeled both parameters independently, but that did not change anything regarding the sampling efficiency.

Regarding the identifiability issue: I’m not sure I understood correctly, so please bear with me! I think it would be easier if it was possible to say that one latent response time component is always larger than the other. That is, one ability primarily determines the response time to an item i, and this latent response time component is always larger than the second one. In the aforementioned example, this would mean that it should always take longer to solve the mathematical problem than to read and understand the problem (because mathematical ability is the primary focus, and reading ability is a secondary, but nevertheless important, competence), i.e. rt_{i} = rt_{math_i} + rt_{read_i}, with rt_{math_i} > rt_{read_i}. But I do not know how to model this correctly. Currently, the model does not make any assumption about the ordering of the response time components, so there may be a real non-identifiability.
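Purely as a sketch (not something from the thread): in the frac_rt1 parameterization above, one simple way to impose rt_{math_i} > rt_{read_i} would be to bound the fraction of the total time attributed to the first component:

parameters {
  // ...
  vector<lower=0.5, upper=1>[N] frac_rt1;   // component 1 always gets the larger share
}

Since rt1 = rt .* frac_rt1 and rt2 = rt - rt1, this hard constraint makes rt1[n] > rt2[n] for every observation.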

I hope this helps understanding the problem. If anything remains unclear, please let me know!

Best wishes
Christoph