N_eff drops by >10x when modeling mu

Hey all,

I’m trying to fit an IRT model for a set of psychosis screeners, with psychosis severity as the latent “ability”. My Stan code looks like this:

data {
  int<lower=1> I;                  // # of questions
  int<lower=1> J;                  // # of persons
  int<lower=1> N;                  // # of observations
  int<lower=1> K;                  // # of diagnostic groups
  int<lower=1, upper=I> ii[N];     // question for observation n
  int<lower=1, upper=J> jj[N];     // person for observation n
  int<lower=0, upper=1> y[N];      // response for observation n
  int<lower=1, upper=K> status[J]; // diagnostic status for person j
}
parameters {
  vector[I] difficulty;            // 'difficulty' for item i
  vector[I] discriminativeness;    // discrimination for item i
  real mu_disc;                    // average item discrimination
  real<lower=0> sigma_disc;        // sd for item discrimination
  real<lower=0> sigma_difficulty;  // sd for item difficulty
  real a_severity;                 // severity intercept (like 'ability' in standard IRT models)
  vector[K] status_effect;         // effect of diagnosis on severity
  vector[J] epsilon;               // person-level deviation in severity for person j
}

transformed parameters {
  vector[J] severity;

  for (j in 1:J) {
    severity[j] = a_severity + status_effect[status[j]] + epsilon[j];
  }
}

model {
  sigma_difficulty ~ normal(0, 1);
  difficulty ~ normal(0, sigma_difficulty);

  mu_disc ~ normal(0, 10);
  sigma_disc ~ normal(0, 1);
  discriminativeness ~ normal(mu_disc, sigma_disc);

  status_effect ~ normal(0, 2);
  a_severity ~ normal(0, 5);
  epsilon ~ normal(0, 4);

  y ~ bernoulli_logit((discriminativeness[ii] .* severity[jj]) - difficulty[ii]);
}

This code runs reasonably fast and doesn’t give me any warnings, although there’s a lot of uncertainty in the estimates.

The distribution of n_eff values for all the parameters looks like this:

However, when I try to model average item difficulty as a hyperparameter (replacing difficulty ~ normal(0, sigma_difficulty); with difficulty ~ normal(mu_diff, sigma_difficulty);), the n_eff for all of the item difficulty and person severity parameters suddenly drops by a factor of more than 10 (e.g., from 765 to 37 out of 2000 samples).

The chains for the difficulty and severity parameters suddenly look terrible (they look fine in the previous model).


I’m wondering if anyone has an idea of what’s going on, or a possible solution?

The Stan code with mu_diff looks like this:

data {
  int<lower=1> I;                  // # of questions
  int<lower=1> J;                  // # of persons
  int<lower=1> N;                  // # of observations
  int<lower=1> K;                  // # of diagnostic groups
  int<lower=1, upper=I> ii[N];     // question for observation n
  int<lower=1, upper=J> jj[N];     // person for observation n
  int<lower=0, upper=1> y[N];      // response for observation n
  int<lower=1, upper=K> status[J]; // diagnostic status for person j
}
parameters {
  vector[I] difficulty;            // 'difficulty' for item i
  vector[I] discriminativeness;    // discrimination for item i
  real mu_disc;                    // average item discrimination
  real<lower=0> sigma_disc;        // sd for item discrimination
  real mu_diff;                    // average item difficulty
  real<lower=0> sigma_difficulty;  // sd for item difficulty
  real a_severity;                 // severity intercept (like 'ability' in standard IRT models)
  vector[K] status_effect;         // effect of diagnosis on severity
  vector[J] epsilon;               // person-level deviation in severity for person j
}

transformed parameters {
  vector[J] severity;

  for (j in 1:J) {
    severity[j] = a_severity + status_effect[status[j]] + epsilon[j];
  }
}

model {
  mu_diff ~ normal(0, 10);
  sigma_difficulty ~ normal(0, 1);
  difficulty ~ normal(mu_diff, sigma_difficulty);

  mu_disc ~ normal(0, 10);
  sigma_disc ~ normal(0, 1);
  discriminativeness ~ normal(mu_disc, sigma_disc);

  status_effect ~ normal(0, 2);
  a_severity ~ normal(0, 5);
  epsilon ~ normal(0, 4);

  y ~ bernoulli_logit((discriminativeness[ii] .* severity[jj]) - difficulty[ii]);
}

Thank you very much!

I suspect attempting inference on the mean difficulty has induced a non-identifiability in your model. These tend to crop up in IRT models given the multiple additive latent parameters. In your case, deviations from zero in the mean difficulty are cancelled out by equal and opposite deviations in the discriminativeness[ii] .* severity[jj] term. Tracing back that term’s components, I think the issue is that a_severity and your mean difficulty parameter are doing the same job. Check the pairs plot for these two and I bet they’re strongly correlated.
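To see the ridge concretely, here’s a rough sketch that treats the discriminations as a single common positive value a (not exactly your model, but it isolates the location problem). The linear predictor for observation n is

$$
\eta_n = a\,\bigl(\texttt{a\_severity} + \texttt{status\_effect}_{\texttt{status}[j]} + \epsilon_j\bigr) - \texttt{difficulty}_i,
\qquad
\texttt{difficulty}_i \sim \text{normal}(\texttt{mu\_diff},\ \texttt{sigma\_difficulty}),
$$

and the joint shift

$$
\texttt{a\_severity} \to \texttt{a\_severity} + c,
\qquad
\texttt{difficulty}_i \to \texttt{difficulty}_i + a\,c \ \ (\text{so } \texttt{mu\_diff} \to \texttt{mu\_diff} + a\,c)
$$

leaves every $\eta_n$ unchanged. The likelihood is flat along that direction and only the priors pull the chain back, which is exactly the kind of ridge that tanks n_eff.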


That makes sense and could be what’s going on! I just checked and the a_severity and mu_diff samples are correlated at r=.8.

When I remove the a_severity parameter and leave in mu_diff, I still get far fewer effective samples (less than 10% of the total number of draws) for the difficulty and severity parameters than when mu_diff is left out. But the chains look much better and there are no other warnings, so maybe this is good enough.
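For anyone who hits the same thing, the model I’m running now is essentially the mu_diff version above with a_severity (and its prior) deleted — a sketch:

data {
  int<lower=1> I;
  int<lower=1> J;
  int<lower=1> N;
  int<lower=1> K;
  int<lower=1, upper=I> ii[N];
  int<lower=1, upper=J> jj[N];
  int<lower=0, upper=1> y[N];
  int<lower=1, upper=K> status[J];
}
parameters {
  vector[I] difficulty;
  vector[I] discriminativeness;
  real mu_disc;
  real<lower=0> sigma_disc;
  real mu_diff;                    // average item difficulty
  real<lower=0> sigma_difficulty;
  vector[K] status_effect;
  vector[J] epsilon;
}

transformed parameters {
  vector[J] severity;

  for (j in 1:J) {
    // a_severity removed: it was redundant with mu_diff
    severity[j] = status_effect[status[j]] + epsilon[j];
  }
}

model {
  mu_diff ~ normal(0, 10);
  sigma_difficulty ~ normal(0, 1);
  difficulty ~ normal(mu_diff, sigma_difficulty);

  mu_disc ~ normal(0, 10);
  sigma_disc ~ normal(0, 1);
  discriminativeness ~ normal(mu_disc, sigma_disc);

  status_effect ~ normal(0, 2);
  epsilon ~ normal(0, 4);

  y ~ bernoulli_logit((discriminativeness[ii] .* severity[jj]) - difficulty[ii]);
}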

Thank you!

For complex problems, an effective sample size per iteration of around 0.1 is perfectly reasonable, provided that all of the other diagnostics (Rhat, the ESS / iteration < 0.001 check, E-BFMI, divergences) are clean.

Got it. Thank you! That seems to be the case for the model that drops the a_severity parameter but includes mu_diff. I really appreciate your help.