Hi all,

Is there a way to fit a hierarchical graded IRT model with brms where both the mean and the variance of the ability parameter may depend on covariates? The goal here is to get subgroup-specific (defined by covariates) fitted values of the ability parameter, not predicted probabilites at specific item categories.

This was implemented using the EM algorithm here https://github.com/xiangzhou09/hIRT. However, for various reasons I’d rather rely on a Bayesian approach, preferably via brms but directly in Stan would also be fine.

I tried 1pl and 2pl brms models of the following form.

``````1pl_formula <- bf(response ~ 1 + X + (1 | item) + (1 | id))
fit_1pl <- brm(
formula = 1pl_formula,
data = data,
family = brmsfamily("cumulative", "logit"),
iter = 1000,
cores = 4
)

2pl_formula <- bf(response ~ 1 + X + (1 | item) + (1 | id),
disc ~ 1 + (1 | item))
fit_2pl <- brm(
formula = 2pl_formula,
data = data,
family = brmsfamily("cumulative", "logit"),
iter = 1000,
cores = 4
)
``````

Here the model is still not heteroscedastic but I’m anyway more interested in the mean regression of the ability parameter. My initial thought was that predictions could be obtained by building the linear predictor using the estimated ability parameters and population-level covariates (without the difficulty and discrimination parameters or category-specific intercepts, since the focus is on the mean structure of the ability parameter). This works fine and when replicating a model with year spline terms from the paper corresponding to the EM approach, the group-specific trends in the ability parameters are pretty close.

However the baseline category of the covariates, in this case the earliest year, will naturally always be at zero of the ability scale. In the EM implementation above, the output is the mean regression, i.e., the linear predictor for the ability parameter including one global intercept, which represents the baseline category, so this shifts accordingly but thats not the case with the standard IRT.

What am I missing, is this an identification issue or is this a special case that must be implemented directly in Stan?

1 Like

Hi,
sorry for not getting to you earlier - did you manage to resolve the issue in the meantime?

Martin

Yes, I wrote the model in Stan and managed to replicate the results from the paper mentioned above. I don’t think that it is currently possible to build this type of model with brms.

However, it takes quite long to run so I tried to implement the `reduce_sum` parallelization, but I can’t get the model to compile.

Here is the model without `reduce_sum`, data is attached. I used the model posted by @danielcfurr here as basis and the sum-to-zero constraint mentioned here. Running this on Windows 10 (x64, i7-10510, 16GB RAM) took roughly 40 minutes. Even though there are plenty of parameters to be estimated here, this seems somewhat long. I guess there is some potential for making this run faster by rewriting the model. I’d appreciate any suggestions here!

stan_data.Rdata (6.3 KB)

``````library(cmdstanr)

gIRT <- "// GRADED RESPONSE MODEL

data {
int<lower=1> I;                // Number of items
int<lower=1> J;                // Number of respondents
int<lower=1> N;                // Total responses
int<lower=1,upper=I> ii[N];    // Item ID
int<lower=1,upper=J> jj[N];    // Person ID
int<lower=0> y[N];             // Responses
int<lower=1> K;                // Number of covariates for location (mu) of ability (theta)
matrix[J,K] X;                 // Covariate matrix (including intercept) for location (mu) of ability (theta)
}

transformed data {
int r[N];                      // modified response; r = 1, 2, ... m_i + 1
int m[I];                      // number of difficulty parameters per item
int pos_alpha[I];              // first position in difficulty for item
int pos_delta[I];              // first position in between-step change for item

// Allowing for differing numbers of response categories
m = rep_array(0, I);
for(n in 1:N) {
r[n] = y[n] + 1;
if(y[n] > m[ii[n]]) m[ii[n]] = y[n];
}
pos_alpha[1] = 1;
pos_delta[1] = 1;
for(i in 2:(I)) {
pos_alpha[i] = pos_alpha[i-1] + m[i-1];
pos_delta[i] = pos_delta[i-1] + m[i-1] - 1;
}
}

parameters {
vector[I] eta;                      // First diffulty parameter for each item
vector<lower=0>[sum(m)-I] delta;    // Between-step changes in difficulty
real<lower = 0> beta[I];            // Constrain all discrimination parameters to be positive
vector[J] theta;                    // Ability parameter
vector[J-1] mu_free;                // Sum-to-zero constraint for location (mu) of ability (theta)
vector[K] gamma;                    // Covariate coefficients
}

transformed parameters {
vector[J] mu;                    // location of ability (theta)
vector[sum(m)] alpha;            // Composite item difficulties

for(j in 1:(J-1)) mu[j] = mu_free[j]; // Sum-to-zero constraint for location (mu) of ability (theta)
mu[J] = -sum(mu_free);

for(i in 1:I) {
if (m[i] > 0)
alpha[pos_alpha[i]] = eta[i];
for (k in 2:m[i])
alpha[pos_alpha[i]+k-1] = alpha[pos_alpha[i]+k-2] + delta[pos_delta[i]+k-2];
}
}

model{
eta ~ normal(0,1);
delta ~ normal(0,1);
beta ~ lognormal(1,1);
gamma ~ student_t(3, 0, 1);
target += normal_lpdf(alpha | 0, 1);
target += normal_lpdf(mu | X*gamma, 1); // Mean regression
theta ~ normal(mu, 1);

for (n in 1:N) {
r[n] ~ ordered_logistic(theta[jj[n]] * beta[ii[n]],
segment(alpha, pos_alpha[ii[n]], m[ii[n]]));
}
}
"
write(gIRT, "gIRT.stan")

compiled_gIRT <- cmdstan_model(stan_file = "gIRT.stan")

fit_gIRT <- compiled_gIRT\$sample(
data = stan_data,
chains = 4,
parallel_chains = 4,
iter_warmup = 500,
iter_sampling = 100,
refresh = 50
)
``````

And here is my attempt at implementing `reduce_sum`:

``````gIRTrs <- "// GRADED RESPONSE MODEL

functions {
real partial_sum_lpmf(int[] slice_r,
int start, int end,
int[] ii,
int[] jj,
vector theta,
real beta,
vector alpha,
int[] m,
int[] pos_alpha) {
return ordered_logistic_lupmf(slice_r |
theta[jj[start:end]] * beta[ii[start:end]],
segment(alpha, pos_alpha[ii[start:end]],
m[ii[start:end]]));
}
}

data {
int<lower=1> I;                // Number of items
int<lower=1> J;                // Number of respondents
int<lower=1> N;                // Total responses
int<lower=1,upper=I> ii[N];    // Item ID
int<lower=1,upper=J> jj[N];    // Person ID
int<lower=0> y[N];             // Responses
int<lower=1> K;                // Number of covariates for location (mu) of ability (theta)
matrix[J,K] X;                 // Covariate matrix (including intercept) for location (mu) of ability (theta)
int<lower=1> grainsize;
}

transformed data {
int r[N];                      // modified response; r = 1, 2, ... m_i + 1
int m[I];                      // number of difficulty parameters per item
int pos_alpha[I];              // first position in difficulty for item
int pos_delta[I];              // first position in between-step change for item

// Allowing for differing numbers of response categories
m = rep_array(0, I);
for(n in 1:N) {
r[n] = y[n] + 1;
if(y[n] > m[ii[n]]) m[ii[n]] = y[n];
}
pos_alpha[1] = 1;
pos_delta[1] = 1;
for(i in 2:(I)) {
pos_alpha[i] = pos_alpha[i-1] + m[i-1];
pos_delta[i] = pos_delta[i-1] + m[i-1] - 1;
}
}

parameters {
vector[I] eta;                      // First diffulty parameter for each item
vector<lower=0>[sum(m)-I] delta;    // Between-step changes in difficulty
real<lower = 0> beta[I];            // Constrain all discrimination parameters to be positive
vector[J] theta;                    // Ability parameter
vector[J-1] mu_free;                // Sum-to-zero constraint for location (mu) of ability (theta)
vector[K] gamma;                    // Covariate coefficients
}

transformed parameters {
vector[J] mu;                    // location of ability (theta)
vector[sum(m)] alpha;            // Composite item difficulties

for(j in 1:(J-1)) mu[j] = mu_free[j]; // Sum-to-zero constraint for location (mu) of ability (theta)
mu[J] = -sum(mu_free);

for(i in 1:I) {
if (m[i] > 0)
alpha[pos_alpha[i]] = eta[i];
for (k in 2:m[i])
alpha[pos_alpha[i]+k-1] = alpha[pos_alpha[i]+k-2] + delta[pos_delta[i]+k-2];
}
}

model{
eta ~ normal(0,1);
delta ~ normal(0,1);
beta ~ lognormal(1,1);
gamma ~ student_t(3, 0, 1);
target += normal_lpdf(alpha | 0, 1);
target += normal_lpdf(mu | X*gamma, 1); // Mean regression
theta ~ normal(mu, 1);

target += reduce_sum(partial_sum_lupmf, r, grainsize,
ii, jj, theta, beta, alpha, m, pos_alpha);
}
"

write(gIRTrs, "gIRTrs.stan")

compiled_gIRTrs <- cmdstan_model("gIRTrs.stan", cpp_options = list(stan_threads = TRUE))
``````

This throws the following error, which I can not resolve:

``````Semantic error
line 14, column 57 to column 76:
-------------------------------------------------
12:                          int[] pos_alpha) {
13:      return ordered_logistic_lupmf(slice_r |
14:                                    theta[jj[start:end]] * beta[ii[start:end]],
^
15:                                    segment(alpha, pos_alpha[ii[start:end]],
16:                                                    m[ii[start:end]]));
-------------------------------------------------

Too many indexes, expression dimensions=0, indexes found=1
``````

Thanks!

1 Like

I didn’t try running the model, but some general ideas:

This looks very suspicious - ignoring the sum-to zero constraint on `mu` it should be completely equivalent to something like

``````target += normal_lpdf(theta | X*gamma, sqrt(2));
``````

i.e. `mu` probably can be integrated out and having it in the model could lead to problematic posterior geometry (I would expect to see strong correlations between corresponding elements of `mu` and `theta`). Couldn’t you transform the sum-to-zero constraint on `mu` into a constraint on `theta` and don’t treat `mu` as a parameter at all?

This form of sum-to-zero is potentially problematic. There has been quite an extensive discussion on this - probably more than you need to know: Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain

TLDR: try using `sum(x) ~ normal(0, <small_number> * <num_lements>)` or the code in this answer in the aformentioned thread + the one below (which is in my experience slightly superior). For more details read the whole topic.

It appears that the problem is that `beta` (as a parameter to `partial_sum_lpmf`) is defined as `real`, but you are indexing it as if it was array/vector. To define it as array, you would have ` real[] beta` in the function signature.

Best of luck with your model!

I tried running it with:

``````theta ~ normal(X*gamma, sqrt(2));
sum(theta) ~ normal(0, 0.0001);
``````

Or

``````target += normal_lpdf(theta | X*gamma, sqrt(2));
sum(theta) ~ normal(0, 0.0001);
``````

But it takes more than twice as long to run (I stopped it at some point).

I also tried real[] beta in the function signature. Now I receive the following error message:

``````Semantic error
line 14, column 34 to column 76:
-------------------------------------------------
12:                          int[] pos_alpha) {
13:      return ordered_logistic_lupmf(slice_r |
14:                                    theta[jj[start:end]] * beta[ii[start:end]],
^
15:                                    segment(alpha, pos_alpha[ii[start:end]],
16:                                                    m[ii[start:end]]));
-------------------------------------------------

Ill-typed arguments supplied to infix operator *. Available signatures:
(int, int) => int
(real, real) => real
(row_vector, vector) => real
(real, vector) => vector
(vector, real) => vector
(matrix, vector) => vector
(real, row_vector) => row_vector
(row_vector, real) => row_vector
(row_vector, matrix) => row_vector
(real, matrix) => matrix
(vector, row_vector) => matrix
(matrix, real) => matrix
(matrix, matrix) => matrix
Instead supplied arguments of incompatible type: vector, array[] real
``````

Are the available signatures for `ordered_logistic_lupmf()` different from `ordered_logistic()`? That beta is an array does not seem to be a problem with `ordered_logistic()`.

Should I open a new topic for the `reduce_sum` issue?

Just tried:

``````theta ~ normal(X*gamma,1);
mean(theta) ~ normal(0, 0.001);
``````

Yields the same results as the initial sum-to-zero constraint but faster (24 minutes rather than 40) and `fit_gIRT\$cmdstan_diagnose()` indicates no problems, the initial version had large R-hat values.

Or is this form of constraint also potentially problematic?

Still stuck on the `reduce_sum` issue, though.

1 Like

The problem is not `ordered_logistic_xxx` but rather that you are trying to multiple a real array with a vector. This is not allowed, because this can correspond to several different operations, so you need to be explicit. If you want element-wise multiplication, you need to use the `.*` operator. If you want the inner/outer product, you need to have a `vector` and `row_vector` as operators (you can either change the type in the signature or use `to_vector` / `to_row_vector`).

There are a couple things to learn from the fact that `sum(theta) ~ normal(0, 0.0001);` poses problems while `mean(theta) ~ normal(0, 0.001);` does not. Generally this form of soft sum-to-zero constraint can induce problematic posterior geometry as you have one more parameter than strictly needed and your posterior concentrates around a hyperplane in the full space. Such problems get more pronoucned the tighter the constraint is. The second constraint is much more permissive and hence probably avoids some of the pathologies, but it is also more likely the results actually violate the constraint substantially (easy to check, just look at the value of `sum(theta)` for each posterior sample). If you can live with this imprecision, then definitely no need to change your model.

If however the constraint is too permissible, then the best way would likely be to parametrize the model in a way that `sum(theta) == 0`, or at least `sum(X * gamma) == 0` holds by cosntruction.

An outline of how one could construct such `gamma` and `theta`. Notably, `sum(X * gamma) == 0` defines a `K - 1` dimensional space (the kernel of `sum(X * gamma)`, treated as a map from `R^K` to `R`), so for any given `X` you should be able to find a matrix `B` representing a suitable basis such that you can have:

``````transformed data {
matrix[K, K - 1] B = //some linear algebra I am too lazy to figure out
}

parameters {
vector[K - 1] gamma_raw;
...
}

transformed parameters {
vector[K] gamma = B * gamma_raw;
}
``````

and have `sum(X * gamma) == 0` for any value of `gamma_raw`. Now we can have `theta = X * gamma + theta_z` where `sum(theta_z) == 0` and you could likely gain some efficiency by constraining `theta_z` with the usually more efficient sum-to-zero constraint described at Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain - #31 by andre.pfeuffer .

Hope that clarifies more then confuses

1 Like

Hi,

I have a follow-up question on this model. At the moment the scale of `theta` is pinned to 1 to deal with non-identifiability. This works well, even with the soft sum-to-zero constraint, and `reduce_sum` helps a lot to speed this up, although I’m struggling a bit with optimizing this further via vectorization (see Using `reduce_sum` with `ordered_logistic` - #13 by saschagobel).

As a next step, I’d like to model the variance as well, so that `theta` is modeled as:

The following won’t work, because scale invariance is not addressed and the model is not identified (the covariate matrix is the same for the location and scale):

``````theta ~ normal(X*gamma, exp(X*lambda));
``````

In the paper mentioned above, the author suggests an additional sum-to-zero constraint on sigma, so that:

and “the geometric mean of the prior variances of the latent preferences equals one”.

I’m not sure how to implement this in Stan, though.

I tried the following but this yields a few divergent transitions, split R-hat greater than 1.05 on many parameters, and the results are clearly off.

``````// Just the additions

parameters {
...
vector[K] lambda;
vector<lower=0>[J] sigma;
}

model{
...
log(sigma) ~ normal(X*lambda, 1);
mean(sigma) ~ normal(0, 0.001 * J);
theta ~ normal(X*gamma, sigma);
mean(theta) ~ normal(0, 0.001 * J);
...
}
``````

You may also try treating some groups of parameters as known (because you simulate the data and know true values) and pass those in the `data` block. If just one element of the model is causing trouble, you’ll get a well-behaved model when you fix the relevant parameter/group of parameters.