I am trying to run the ordered logit model pasted below and am running into the following issue.
All transitions after warmup exceed a maximum treedepth of 12. This is the only warning I am getting after I fit the model, yet the Rhats are incredibly high and n_eff is 2, so I don’t believe the model converged.
When launching shinystan, I also see that the traceplots do not look like the chains mixed and the stepsize parameter is zero for all chains.
What might be wrong with the model such that it is not converging even though there were no divergent transitions? I thought that exceeding max treedepth was only a computational efficiency issue, not an issue of convergence.
Thanks so much for any tips you may have.
data{
int<lower=0> n; // number of data points
int<lower=0> j; // number of item parameters
int<lower=2> k; // number of cutpoints
int y1[n];
int y2[n];
int y3[n];
int y4[n];
}
parameters{
vector[j] alpha; // item difficulty parameters
real<lower=0> beta[j]; // item discrimination parameters
vector[n] theta; // latent variables
real<lower=0> sigma[j]; // standard deviations
ordered[k-1] c1; // cutpoints
ordered[k-1] c2; // cutpoints
ordered[k-1] c3; // cutpoints
ordered[k-1] c4; // cutpoints
}
model{
theta ~ normal(0, 1); // fixing latent variable scale
alpha ~ normal(0, 10); // prior on item difficulty
beta ~ gamma(4, 3); // prior on item discrimination
y1 ~ ordered_logistic(alpha[1] + beta[1] * theta, c1);
y2 ~ ordered_logistic(alpha[2] + beta[2] * theta, c2);
y3 ~ ordered_logistic(alpha[3] + beta[3] * theta, c3);
y4 ~ ordered_logistic(alpha[4] + beta[4] * theta, c4);
}
That was mistakenly in there from a previous parameterization of the model and I forgot to take it out - would that be enough to throw off the results?
Also some of the alpha and beta parameters are not involved when j != 4. But more importantly, beta and theta are likely not well identified - you can increase theta and decrease beta by a corresponding amount to arrive at (almost) exactly the same likelihood. This should be visible when plotting some betas against thetas (with pairs plot or shinystan).
Fixing beta[1] to 1 or something similar might help with this (not thinking it through, short on time, please double check).
Thanks for the suggestion - I thought that the standard normal prior on theta properly identified the model? Is there something else I should do in order to further correct for potential unidentifiabilities?
Which version of Stan are you using? There’s a bug in the ordered_logistic distribution that’s present in version 2.19. You’ll likely be effected if you’re using RStan, since that’s the current version available.
I’ve run into this recently and my issue was that my priors were putting variables into ‘bad spaces’ in the geometry which forced the sampler to have a much deeper tree depth. The default priors are uniform (?-inf, inf or some huge numbers?). If those priors put your model’s parameters in weird spaces then I could see that being an issue for the c{N} variables
I’ve never worked with cutpoint models so I can’t say what priors would be good for them, but for diagnosing stuff I find bayesplots vignettes to be helpful
Okay, normal priors on the cutpoints. Updated Stan and looping over the ordered logits to bypass the bug.
I’ve just compiled the model, and it says that 1000 transitions will take 377.44, 626.25, 637.21, and 597.89 seconds respectively per chain. This seems really high - did the introduction of the loop make the computation more time-consuming?
No – because the alpha are correlated with the other parameters in the model you need to be able to quantify all of them accurately in order to get accurate quantifications of the marginal posterior for the alpha.
Another way of thinking about it is that the degeneracy between the alpha and the cut points means that an experiment with ordinal outputs will not answer the questions in which you are interested, at least not with out substantial prior information to complement the information in that ordinal data and distinguish between the many degenerate possibilities.
I see, how can I correct this unidentifiability? What is the relationship between alpha and c that needs to be specified? Thanks so much for your help.
I discuss some possibilities in the ordinal regression case study linked above. In general the answer will depend on the bespoke structure of your application.
I see, I had read your study when you first posted it and it wasn’t clear to me why adding the generated quantities block solved the problem. I still am unsure how I would mend my model to fix the issue. But if I were to remove the alpha parameter entirely and only use \beta * \theta as item discrimination times the latent quantity would we have the same specification issue? Or would this be solved?