IRT identification | Constraining difficulty and discrimination parameters

To identify a unidimensional 2PL IRT model with unconstrained mean and variance parameters of the latent trait, we can fix one difficulty parameter to 0 (for the unconstrained mean) and one discrimination parameter to 1 (for the unconstrained variance).
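
For concreteness, here’s roughly what I mean by the “typical” approach — just a minimal 2PL sketch, with variable names and priors of my own choosing:

data {
  int<lower=1> K;                        // number of binary items
  int<lower=1> J;                        // number of persons
  array[J, K] int<lower=0, upper=1> y;   // item responses
}
parameters {
  vector[J] theta;                     // latent trait with free mean and sd
  real mu_theta;
  real<lower=0> sigma_theta;
  vector[K - 1] beta_free;             // difficulties 2..K; beta[1] fixed at 0
  vector<lower=0>[K - 1] alpha_free;   // discriminations 2..K; alpha[1] fixed at 1
}
transformed parameters {
  vector[K] beta = append_row(0, beta_free);
  vector[K] alpha = append_row(1, alpha_free);
}
model {
  theta ~ normal(mu_theta, sigma_theta);
  mu_theta ~ normal(0, 5);
  sigma_theta ~ student_t(4, 0, 2);
  beta_free ~ normal(0, 3);
  alpha_free ~ lognormal(0, 1);
  for (j in 1:J)
    y[j] ~ bernoulli_logit(alpha .* (theta[j] - beta));
}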

But there’s also a different approach, advocated among others by Fox (2010, pp. 88-89): constrain the sum of the difficulty parameters to 0:

\mu_{\beta} = \sum_{k} \beta_{k} / K, where K is the number of (binary) items

\widetilde{\beta}_{k} = \beta_{k} - \mu_{\beta}

and the product of discrimination parameters to 1:

\sigma_{\alpha} = \prod_k \alpha_{k}

\widetilde{\alpha}_{k} = \alpha_{k} (1/\sigma_{\alpha})^{1/K}
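
(To make this concrete with some toy numbers of my own: for K = 3, difficulties \beta = (-1, 0, 4) give \mu_{\beta} = 1 and \widetilde{\beta} = (-2, -1, 3), which sums to 0; discriminations \alpha = (0.5, 1, 4) give \sigma_{\alpha} = 2, so each \alpha_{k} is multiplied by (1/2)^{1/3} \approx 0.794, and the rescaled \widetilde{\alpha} \approx (0.397, 0.794, 3.175) has product 1.)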

I have two questions.

First, following this topic on sum-to-zero constraints, and this bit of code specifically:

transformed data {
  // Q_sum_to_zero_QR() and sum_to_zero_QR() are the helper functions
  // defined in the linked topic (also reproduced further down this thread)
  vector[2*N] Q_r = Q_sum_to_zero_QR(N);
  // scale chosen so that each element of x has marginal prior sd 1
  real x_raw_sigma = inv_sqrt(1 - inv(N));
}

parameters {
  vector[N - 1] x_raw;
}

transformed parameters {
  vector[N] x = sum_to_zero_QR(x_raw, Q_r);  // x sums to zero
}

model {
  x_raw ~ normal(0, x_raw_sigma);
}

is there a way of specifying priors on the elements of x? I was just wondering how to translate the priors on the difficulty parameters from the “typical” approach, say:

beta ~ normal(0, 3);

into the approach with a mean-centered (sum-to-zero) vector? I’d be grateful for some code examples.

Second, following the discussion on constraining a vector of parameters by their product, is there a way of imposing that constraint on the discrimination parameters so that their product equals 1? Again, I’d be grateful for some code examples.

Thank you!

You might find it easier to avoid all such constraints: use priors informative enough that the parameters are at least vaguely identified, then compute a standardised set of parameters in the generated quantities block or in post-processing. Just an idea.
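
For example, a sketch of that idea (variable names are mine, and the likelihood is omitted): put weakly informative priors on the difficulties and log-discriminations, then standardise after the fact:

parameters {
  vector[K] beta;        // difficulties, only vaguely identified by the prior
  vector[K] log_alpha;   // log discriminations
}
model {
  beta ~ normal(0, 3);
  log_alpha ~ normal(0, 1);
  // ... likelihood ...
}
generated quantities {
  // Fox-style identification imposed after sampling:
  // centred difficulties sum to 0, rescaled discriminations multiply to 1
  vector[K] beta_std = beta - mean(beta);
  vector[K] alpha_std = exp(log_alpha - mean(log_alpha));
}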

First, sum-to-zero constraints are not recommended anymore. There is a discussion in the forum with Michael Betancourt and others explaining why (apologies for not providing a full list of participants).
Details about centered vs. non-centered parameterization can be found here (or elsewhere):
https://mc-stan.org/docs/stan-users-guide/reparameterization.html

Non-centered approach

parameters {
  vector[N - 1] x_raw;
  real<lower = 0> sigma;   // scale of x, now a parameter
}
transformed parameters {
  // Q_r as in the transformed data block above
  vector[N] x = sum_to_zero_QR(x_raw, Q_r) * sigma;
}

model {
  x_raw ~ normal(0, 1);
  sigma ~ student_t(4, 0, 10); // or any other prior
}

One may also hard-code a scale factor into x, which is questionable because it buries part of the prior in the transformation:
vector[N] x = sum_to_zero_QR(x_raw, Q_r) * sigma * 3.0;

Centered approach

functions {
  vector Q_sum_to_zero_QR(int N) {
    vector [2*N] Q_r;

    for(i in 1:N) {
      Q_r[i] = -sqrt((N-i)/(N-i+1.0));
      Q_r[i+N] = inv_sqrt((N-i) * (N-i+1));
    }
    //Q_r = Q_r * inv_sqrt(1 - inv(N));
    return Q_r;
  }
  vector sum_to_zero_QR(vector x_raw, vector Q_r) {
    int N = num_elements(x_raw) + 1;
    vector [N] x;
    real x_aux = 0;

    for(i in 1:N-1){
      x[i] = x_aux + x_raw[i] * Q_r[i];
      x_aux = x_aux + x_raw[i] * Q_r[i+N];
    }
    x[N] = x_aux;
    return x;
  }
}
data {
  int<lower=0> K;        // length of the sum-to-zero vector
  int<lower=0> N;        // not used by the model (see the R script below)
  real mu;               // target marginal mean of alpha
  real<lower=0> sigma;   // target marginal sd of alpha
}
transformed data {
  vector[2 * K] Q_alpha = Q_sum_to_zero_QR(K);
}
parameters {
  vector[K - 1] alpha_pre;
}
transformed parameters {
  vector[K] alpha = mu + sum_to_zero_QR(alpha_pre, Q_alpha);
}
model {
  // inv_sqrt(1 - inv(K)) compensates for the variance absorbed by the
  // sum-to-zero constraint, so each alpha[k] has marginal prior sd sigma
  alpha ~ normal(mu, sigma * inv_sqrt(1 - inv(K)));
  //mu ~ normal(1, 10);  // only relevant if mu were a parameter
}


library(cmdstanr)
library(rstan)

N <- 100000   # also used as the number of warmup/sampling iterations
K <- 2
data <- list(
  K = K
, N = N
, mu = 2
, sigma = 1)

mod <- cmdstan_model('sumtozero_centered.stan')
fit <- mod$sample(
  data = data
, chains = 4
, seed = 12345
, parallel_chains = 4
, iter_warmup = N
, iter_sampling = N
)

# read the CmdStan CSV output into an rstan object to extract the draws
fit_rstan <- rstan::read_stan_csv(fit$output_files())
extr <- extract(fit_rstan)
sd(as.vector(extr$alpha))    # should be close to sigma = 1
mean(as.vector(extr$alpha))  # should be close to mu = 2

Again, one may multiply the sd by a factor, e.g. 3.0, which (with mu = 0 and sigma = 1) reproduces the normal(0, 3) marginal prior from the “typical” approach:

alpha ~ normal(mu, sigma * inv_sqrt(1 - inv(K)) * 3.0);

Thank you! That’s really helpful!

Do you, by any chance, know how to implement the constraint with the product for the discrimination parameters (which we constrain to be positive)?

Maybe I should tag @jsocolar because I’m referring to his post from this discussion: Constrain the parameters by their sum/product - #5 by jsocolar

transformed data {
  vector[2 * K] Q_alpha = Q_sum_to_zero_QR(K);
}
parameters {
  vector[K - 1] alpha_pre;  // unconstrained; positivity is handled by exp() below
}
transformed parameters {
  vector[K] alpha = sum_to_zero_QR(alpha_pre, Q_alpha);  // sums to zero
}
model {
  alpha_pre ~ std_normal();
}

How about taking the product of exp(alpha)? Since alpha sums to zero, the product of exp(alpha) should be exactly 1 in exact arithmetic (rounding will introduce small deviations), and each exp(alpha[k]) is automatically positive.

I want to attach a big warning to this: although the maths may be OK, the sampler may not be.

Thanks! I’ll give it a try.

An alternative would be to constrain the vector of discrimination parameters to average 1.0 (here the reference). But I’m not sure whether that would make the estimation any easier.
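
If it helps, one way to sketch that (my own construction, not necessarily what the reference does) is a scaled simplex, which gives positive discriminations averaging exactly 1:

parameters {
  simplex[K] alpha_raw;               // positive, sums to 1
}
transformed parameters {
  vector[K] alpha = K * alpha_raw;    // positive, mean exactly 1
}
model {
  alpha_raw ~ dirichlet(rep_vector(2.0, K));  // mildly regularising prior
}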