Hierarchical graded IRT

saschagobel · July 22, 2021, 4:02pm

Hi all,

Is there a way to fit a hierarchical graded IRT model with brms where both the mean and the variance of the ability parameter may depend on covariates? The goal here is to get subgroup-specific (defined by covariates) fitted values of the ability parameter, not predicted probabilites at specific item categories.

This was implemented using the EM algorithm here https://github.com/xiangzhou09/hIRT. However, for various reasons I’d rather rely on a Bayesian approach, preferably via brms but directly in Stan would also be fine.

I tried 1pl and 2pl brms models of the following form.

1pl_formula <- bf(response ~ 1 + X + (1 | item) + (1 | id))
fit_1pl <- brm(
  formula = 1pl_formula,
  data = data,
  family = brmsfamily("cumulative", "logit"),
  iter = 1000,
  cores = 4
)

2pl_formula <- bf(response ~ 1 + X + (1 | item) + (1 | id),
              disc ~ 1 + (1 | item))
fit_2pl <- brm(
  formula = 2pl_formula,
  data = data,
  family = brmsfamily("cumulative", "logit"),
  iter = 1000,
  cores = 4
)

Here the model is still not heteroscedastic but I’m anyway more interested in the mean regression of the ability parameter. My initial thought was that predictions could be obtained by building the linear predictor using the estimated ability parameters and population-level covariates (without the difficulty and discrimination parameters or category-specific intercepts, since the focus is on the mean structure of the ability parameter). This works fine and when replicating a model with year spline terms from the paper corresponding to the EM approach, the group-specific trends in the ability parameters are pretty close.

However the baseline category of the covariates, in this case the earliest year, will naturally always be at zero of the ability scale. In the EM implementation above, the output is the mean regression, i.e., the linear predictor for the ability parameter including one global intercept, which represents the baseline category, so this shifts accordingly but thats not the case with the standard IRT.

What am I missing, is this an identification issue or is this a special case that must be implemented directly in Stan?

martinmodrak · August 22, 2021, 9:14pm

Hi,
sorry for not getting to you earlier - did you manage to resolve the issue in the meantime?

Martin

saschagobel · September 11, 2021, 9:55am

Yes, I wrote the model in Stan and managed to replicate the results from the paper mentioned above. I don’t think that it is currently possible to build this type of model with brms.

However, it takes quite long to run so I tried to implement the reduce_sum parallelization, but I can’t get the model to compile.

Here is the model without reduce_sum, data is attached. I used the model posted by @danielcfurr here as basis and the sum-to-zero constraint mentioned here. Running this on Windows 10 (x64, i7-10510, 16GB RAM) took roughly 40 minutes. Even though there are plenty of parameters to be estimated here, this seems somewhat long. I guess there is some potential for making this run faster by rewriting the model. I’d appreciate any suggestions here!

stan_data.Rdata (6.3 KB)

library(cmdstanr)

load("./stan_data.Rdata")

gIRT <- "// GRADED RESPONSE MODEL

data {
  int<lower=1> I;                // Number of items
  int<lower=1> J;                // Number of respondents
  int<lower=1> N;                // Total responses
  int<lower=1,upper=I> ii[N];    // Item ID
  int<lower=1,upper=J> jj[N];    // Person ID
  int<lower=0> y[N];             // Responses
  int<lower=1> K;                // Number of covariates for location (mu) of ability (theta)
  matrix[J,K] X;                 // Covariate matrix (including intercept) for location (mu) of ability (theta)
}

transformed data {
  int r[N];                      // modified response; r = 1, 2, ... m_i + 1
  int m[I];                      // number of difficulty parameters per item
  int pos_alpha[I];              // first position in difficulty for item
  int pos_delta[I];              // first position in between-step change for item 

  // Allowing for differing numbers of response categories
  m = rep_array(0, I);															
  for(n in 1:N) {
    r[n] = y[n] + 1;
    if(y[n] > m[ii[n]]) m[ii[n]] = y[n];
  }
  pos_alpha[1] = 1;
  pos_delta[1] = 1;
  for(i in 2:(I)) {
    pos_alpha[i] = pos_alpha[i-1] + m[i-1];
    pos_delta[i] = pos_delta[i-1] + m[i-1] - 1;
  }
}

parameters {
  vector[I] eta;                      // First diffulty parameter for each item
  vector<lower=0>[sum(m)-I] delta;    // Between-step changes in difficulty
  real<lower = 0> beta[I];            // Constrain all discrimination parameters to be positive
  vector[J] theta;                    // Ability parameter
  vector[J-1] mu_free;                // Sum-to-zero constraint for location (mu) of ability (theta)
  vector[K] gamma;                    // Covariate coefficients
}

transformed parameters {
  vector[J] mu;                    // location of ability (theta)
  vector[sum(m)] alpha;            // Composite item difficulties
  
  for(j in 1:(J-1)) mu[j] = mu_free[j]; // Sum-to-zero constraint for location (mu) of ability (theta)
  mu[J] = -sum(mu_free);              
  
  for(i in 1:I) {
    if (m[i] > 0)
      alpha[pos_alpha[i]] = eta[i];
    for (k in 2:m[i])
      alpha[pos_alpha[i]+k-1] = alpha[pos_alpha[i]+k-2] + delta[pos_delta[i]+k-2];
  }
}

model{
  eta ~ normal(0,1);
  delta ~ normal(0,1);
  beta ~ lognormal(1,1);
  gamma ~ student_t(3, 0, 1);
  target += normal_lpdf(alpha | 0, 1);
  target += normal_lpdf(mu | X*gamma, 1); // Mean regression
  theta ~ normal(mu, 1);

  for (n in 1:N) {
    r[n] ~ ordered_logistic(theta[jj[n]] * beta[ii[n]],
                            segment(alpha, pos_alpha[ii[n]], m[ii[n]]));
  }
}
"
write(gIRT, "gIRT.stan")

compiled_gIRT <- cmdstan_model(stan_file = "gIRT.stan")

fit_gIRT <- compiled_gIRT$sample(
  data = stan_data,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 500,
  iter_sampling = 100,
  refresh = 50
)

And here is my attempt at implementing reduce_sum:

gIRTrs <- "// GRADED RESPONSE MODEL

functions {
  real partial_sum_lpmf(int[] slice_r,
                        int start, int end,
                        int[] ii,
                        int[] jj,
                        vector theta,
                        real beta,
                        vector alpha,
                        int[] m,
                        int[] pos_alpha) {
    return ordered_logistic_lupmf(slice_r | 
                                  theta[jj[start:end]] * beta[ii[start:end]], 
                                  segment(alpha, pos_alpha[ii[start:end]], 
                                                  m[ii[start:end]]));
  }
}

data {
  int<lower=1> I;                // Number of items
  int<lower=1> J;                // Number of respondents
  int<lower=1> N;                // Total responses
  int<lower=1,upper=I> ii[N];    // Item ID
  int<lower=1,upper=J> jj[N];    // Person ID
  int<lower=0> y[N];             // Responses
  int<lower=1> K;                // Number of covariates for location (mu) of ability (theta)
  matrix[J,K] X;                 // Covariate matrix (including intercept) for location (mu) of ability (theta)
  int<lower=1> grainsize;
}

transformed data {
  int r[N];                      // modified response; r = 1, 2, ... m_i + 1
  int m[I];                      // number of difficulty parameters per item
  int pos_alpha[I];              // first position in difficulty for item
  int pos_delta[I];              // first position in between-step change for item 

  // Allowing for differing numbers of response categories
  m = rep_array(0, I);															
  for(n in 1:N) {
    r[n] = y[n] + 1;
    if(y[n] > m[ii[n]]) m[ii[n]] = y[n];
  }
  pos_alpha[1] = 1;
  pos_delta[1] = 1;
  for(i in 2:(I)) {
    pos_alpha[i] = pos_alpha[i-1] + m[i-1];
    pos_delta[i] = pos_delta[i-1] + m[i-1] - 1;
  }
}

parameters {
  vector[I] eta;                      // First diffulty parameter for each item
  vector<lower=0>[sum(m)-I] delta;    // Between-step changes in difficulty
  real<lower = 0> beta[I];            // Constrain all discrimination parameters to be positive
  vector[J] theta;                    // Ability parameter
  vector[J-1] mu_free;                // Sum-to-zero constraint for location (mu) of ability (theta)
  vector[K] gamma;                    // Covariate coefficients
}

transformed parameters {
  vector[J] mu;                    // location of ability (theta)
  vector[sum(m)] alpha;            // Composite item difficulties
  
  for(j in 1:(J-1)) mu[j] = mu_free[j]; // Sum-to-zero constraint for location (mu) of ability (theta)
  mu[J] = -sum(mu_free);              
  
  for(i in 1:I) {
    if (m[i] > 0)
      alpha[pos_alpha[i]] = eta[i];
    for (k in 2:m[i])
      alpha[pos_alpha[i]+k-1] = alpha[pos_alpha[i]+k-2] + delta[pos_delta[i]+k-2];
  }
}

model{
  eta ~ normal(0,1);
  delta ~ normal(0,1);
  beta ~ lognormal(1,1);
  gamma ~ student_t(3, 0, 1);
  target += normal_lpdf(alpha | 0, 1);
  target += normal_lpdf(mu | X*gamma, 1); // Mean regression
  theta ~ normal(mu, 1);
  
  target += reduce_sum(partial_sum_lupmf, r, grainsize, 
                       ii, jj, theta, beta, alpha, m, pos_alpha);
}
"

write(gIRTrs, "gIRTrs.stan")

compiled_gIRTrs <- cmdstan_model("gIRTrs.stan", cpp_options = list(stan_threads = TRUE))

This throws the following error, which I can not resolve:

Semantic error
line 14, column 57 to column 76:
   -------------------------------------------------
    12:                          int[] pos_alpha) {
    13:      return ordered_logistic_lupmf(slice_r | 
    14:                                    theta[jj[start:end]] * beta[ii[start:end]], 
                                                                  ^
    15:                                    segment(alpha, pos_alpha[ii[start:end]], 
    16:                                                    m[ii[start:end]]));
   -------------------------------------------------

Too many indexes, expression dimensions=0, indexes found=1

Thanks!

martinmodrak · September 11, 2021, 4:10pm

I didn’t try running the model, but some general ideas:

This looks very suspicious - ignoring the sum-to zero constraint on mu it should be completely equivalent to something like

target += normal_lpdf(theta | X*gamma, sqrt(2));

i.e. mu probably can be integrated out and having it in the model could lead to problematic posterior geometry (I would expect to see strong correlations between corresponding elements of mu and theta). Couldn’t you transform the sum-to-zero constraint on mu into a constraint on theta and don’t treat mu as a parameter at all?

This form of sum-to-zero is potentially problematic. There has been quite an extensive discussion on this - probably more than you need to know: Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain

TLDR: try using sum(x) ~ normal(0, <small_number> * <num_lements>) or the code in this answer in the aformentioned thread + the one below (which is in my experience slightly superior). For more details read the whole topic.

It appears that the problem is that beta (as a parameter to partial_sum_lpmf) is defined as real, but you are indexing it as if it was array/vector. To define it as array, you would have real[] beta in the function signature.

Best of luck with your model!

saschagobel · September 11, 2021, 6:51pm

Thanks for the comments!

I tried running it with:

theta ~ normal(X*gamma, sqrt(2));
sum(theta) ~ normal(0, 0.0001);

Or

target += normal_lpdf(theta | X*gamma, sqrt(2));
sum(theta) ~ normal(0, 0.0001);

But it takes more than twice as long to run (I stopped it at some point).

I also tried real[] beta in the function signature. Now I receive the following error message:

Semantic error
line 14, column 34 to column 76:
   -------------------------------------------------
    12:                          int[] pos_alpha) {
    13:      return ordered_logistic_lupmf(slice_r | 
    14:                                    theta[jj[start:end]] * beta[ii[start:end]], 
                                           ^
    15:                                    segment(alpha, pos_alpha[ii[start:end]], 
    16:                                                    m[ii[start:end]]));
   -------------------------------------------------

Ill-typed arguments supplied to infix operator *. Available signatures: 
(int, int) => int
(real, real) => real
(row_vector, vector) => real
(real, vector) => vector
(vector, real) => vector
(matrix, vector) => vector
(real, row_vector) => row_vector
(row_vector, real) => row_vector
(row_vector, matrix) => row_vector
(real, matrix) => matrix
(vector, row_vector) => matrix
(matrix, real) => matrix
(matrix, matrix) => matrix
Instead supplied arguments of incompatible type: vector, array[] real

Are the available signatures for ordered_logistic_lupmf() different from ordered_logistic()? That beta is an array does not seem to be a problem with ordered_logistic().

Should I open a new topic for the reduce_sum issue?

saschagobel · September 11, 2021, 7:43pm

Just tried:

theta ~ normal(X*gamma,1);
mean(theta) ~ normal(0, 0.001);

Yields the same results as the initial sum-to-zero constraint but faster (24 minutes rather than 40) and fit_gIRT$cmdstan_diagnose() indicates no problems, the initial version had large R-hat values.

Or is this form of constraint also potentially problematic?

Still stuck on the reduce_sum issue, though.

martinmodrak · September 14, 2021, 1:12pm

The problem is not ordered_logistic_xxx but rather that you are trying to multiple a real array with a vector. This is not allowed, because this can correspond to several different operations, so you need to be explicit. If you want element-wise multiplication, you need to use the .* operator. If you want the inner/outer product, you need to have a vector and row_vector as operators (you can either change the type in the signature or use to_vector / to_row_vector).

saschagobel:

theta ~ normal(X*gamma,1);
mean(theta) ~ normal(0, 0.001);
Yields the same results as the initial sum-to-zero constraint but faster (24 minutes rather than 40) and fit_gIRT$cmdstan_diagnose() indicates no problems, the initial version had large R-hat values.

Or is this form of constraint also potentially problematic?

There are a couple things to learn from the fact that sum(theta) ~ normal(0, 0.0001); poses problems while mean(theta) ~ normal(0, 0.001); does not. Generally this form of soft sum-to-zero constraint can induce problematic posterior geometry as you have one more parameter than strictly needed and your posterior concentrates around a hyperplane in the full space. Such problems get more pronoucned the tighter the constraint is. The second constraint is much more permissive and hence probably avoids some of the pathologies, but it is also more likely the results actually violate the constraint substantially (easy to check, just look at the value of sum(theta) for each posterior sample). If you can live with this imprecision, then definitely no need to change your model.

If however the constraint is too permissible, then the best way would likely be to parametrize the model in a way that sum(theta) == 0, or at least sum(X * gamma) == 0 holds by cosntruction.

An outline of how one could construct such gamma and theta. Notably, sum(X * gamma) == 0 defines a K - 1 dimensional space (the kernel of sum(X * gamma), treated as a map from R^K to R), so for any given X you should be able to find a matrix B representing a suitable basis such that you can have:

transformed data {
   matrix[K, K - 1] B = //some linear algebra I am too lazy to figure out
}

parameters {
  vector[K - 1] gamma_raw;
  ...
}

transformed parameters {
  vector[K] gamma = B * gamma_raw;
}

and have sum(X * gamma) == 0 for any value of gamma_raw. Now we can have theta = X * gamma + theta_z where sum(theta_z) == 0 and you could likely gain some efficiency by constraining theta_z with the usually more efficient sum-to-zero constraint described at Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain - #31 by andre.pfeuffer .

Hope that clarifies more then confuses

saschagobel · September 24, 2021, 2:13pm

Hi,

I have a follow-up question on this model. At the moment the scale of theta is pinned to 1 to deal with non-identifiability. This works well, even with the soft sum-to-zero constraint, and reduce_sum helps a lot to speed this up, although I’m struggling a bit with optimizing this further via vectorization (see Using `reduce_sum` with `ordered_logistic` - #13 by saschagobel).

As a next step, I’d like to model the variance as well, so that theta is modeled as:

theta-model

The following won’t work, because scale invariance is not addressed and the model is not identified (the covariate matrix is the same for the location and scale):

theta ~ normal(X*gamma, exp(X*lambda));

In the paper mentioned above, the author suggests an additional sum-to-zero constraint on sigma, so that:

and “the geometric mean of the prior variances of the latent preferences equals one”.

I’m not sure how to implement this in Stan, though.

I tried the following but this yields a few divergent transitions, split R-hat greater than 1.05 on many parameters, and the results are clearly off.

// Just the additions

parameters {
  ...
  vector[K] lambda;
  vector<lower=0>[J] sigma;
}

model{
  ...
  log(sigma) ~ normal(X*lambda, 1); 
  mean(sigma) ~ normal(0, 0.001 * J);
  theta ~ normal(X*gamma, sigma);
  mean(theta) ~ normal(0, 0.001 * J);
  ...
}

Thanks for your help!

martinmodrak · September 24, 2021, 4:17pm

Unfortunately, this could be many things. As I mentioned above, the soft sum-to-zero constraint can cause problems in larger models and the constraint implementation by andre is usually superior. But maybe the problem is elsewher - the model is getting very large and it is hard to reason about without really digging deep (and having the data etc.).

What I would do would be to try various ways to simplify the model or prototype the individual parts of the model separately (using simulated data). Debuggin large models is just very very hard. An example how this might look for a much simpler model is at Small model implementation workflow • SBC (this is using simulation-based calibration via the SBC package, but a similar workflow is IMHO good practice even when one just runs a couple fits to simulated data by hand).

You may also try treating some groups of parameters as known (because you simulate the data and know true values) and pass those in the data block. If just one element of the model is causing trouble, you’ll get a well-behaved model when you fix the relevant parameter/group of parameters.

Best of luck with your model!

Topic		Replies	Views
Decomposing Regression Coefficient in Non-Linear Model Modeling techniques , specification , irt , brms	5	559	June 20, 2023
Hierarchical Multinomial Model taking days to run brms performance , multinomial-response , hierarchical-model	14	1602	April 9, 2020
2PL IRT Efficiency Questions brms fitting-issues , specification , performance	11	1012	May 26, 2020
Proper use of sum_to_zero_vector in nested multilevel models Modeling fitting-issues , specification , matrix , cmdstanr , hierarchical-model	17	346	June 2, 2025
Non convergence issue on polytomous IRT model Modeling	15	1941	June 21, 2023

Hierarchical graded IRT

Related topics