Max Treedepth Exceeded = Lack of Convergence?

I am trying to run the ordered logit model pasted below and am running into the following issue.

All post-warmup transitions exceed the maximum treedepth of 12. This is the only warning I get after fitting the model, yet the Rhat values are extremely high and n_eff is 2, so I don’t believe the model converged.

When I launch shinystan, I also see that the traceplots suggest the chains did not mix, and that the stepsize is zero for all chains.

What might be wrong with the model such that it is not converging even though there were no divergent transitions? I thought that exceeding max treedepth was only a computational efficiency issue, not an issue of convergence.

Thanks so much for any tips you may have.

data {
  int<lower=0> n; // number of data points
  int<lower=0> j; // number of item parameters
  int<lower=2> k; // number of cutpoints

  int y1[n];
  int y2[n];
  int y3[n];
  int y4[n];
}

parameters {
  vector[j] alpha;        // item difficulty parameters
  real<lower=0> beta[j];  // item discrimination parameters
  vector[n] theta;        // latent variables
  real<lower=0> sigma[j]; // standard deviations
  ordered[k-1] c1; // cutpoints
  ordered[k-1] c2; // cutpoints
  ordered[k-1] c3; // cutpoints
  ordered[k-1] c4; // cutpoints
}

model {
  theta ~ normal(0, 1);  // fixing latent variable scale
  alpha ~ normal(0, 10); // prior on item difficulty
  beta ~ gamma(4, 3);    // prior on item discrimination

  y1 ~ ordered_logistic(alpha[1] + beta[1] * theta, c1);
  y2 ~ ordered_logistic(alpha[2] + beta[2] * theta, c2);
  y3 ~ ordered_logistic(alpha[3] + beta[3] * theta, c3);
  y4 ~ ordered_logistic(alpha[4] + beta[4] * theta, c4);
}


sigma is declared but never used in the model block, so it implicitly gets a flat prior on (0, ∞). Stan cannot sample sensibly from that improper, unbounded distribution, and it messes up the whole inference.


That was mistakenly left in from a previous parameterization of the model and I forgot to take it out - would that alone be enough to throw off the results?

Also, some of the alpha and beta parameters are unused (or indexed out of range) when j != 4. But more importantly, beta and theta are likely not well identified - you can scale theta up and scale beta down by the same factor and arrive at (almost) exactly the same likelihood. This should be visible when plotting some of the betas against thetas (with a pairs plot or in shinystan).
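You can check this scaling degeneracy numerically. Here is a small sketch in plain NumPy, with a hand-rolled ordered-logistic pmf meant to mirror Stan's parameterization and entirely made-up toy numbers:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordered_logistic_logpmf(y, eta, c):
    # log P(y = k | eta, cutpoints c), mirroring Stan's ordered_logistic:
    # category probabilities are differences of cumulative logistic terms
    cum = np.concatenate(([1.0], inv_logit(eta - c), [0.0]))
    return np.log(cum[y - 1] - cum[y])

# made-up toy data: 5 respondents, 4 response categories (3 cutpoints)
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
alpha, beta = 0.5, 1.3
c = np.array([-1.0, 0.0, 1.0])
y = np.array([1, 2, 3, 4, 2])

ll_orig = sum(ordered_logistic_logpmf(yi, alpha + beta * ti, c)
              for yi, ti in zip(y, theta))

# rescale: beta up by s, theta down by s -- the product beta * theta is unchanged
s = 2.7
ll_scaled = sum(ordered_logistic_logpmf(yi, alpha + (s * beta) * (ti / s), c)
                for yi, ti in zip(y, theta))
```

ll_orig and ll_scaled agree up to floating-point noise, so the data alone cannot distinguish (beta, theta) from (s * beta, theta / s); only the priors push back against the rescaling.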

Fixing beta[1] to 1 or something similar might help with this (not thinking it through, short on time, please double check).

Best of luck with your model!

Thanks for the suggestion - I thought that the standard normal prior on theta was what identified the model? Is there something else I should do to correct for potential non-identifiabilities?

Which version of Stan are you using? There’s a bug in the ordered_logistic distribution that’s present in version 2.19. You’ll likely be affected if you’re using RStan, since that’s the version currently available there.

If you’re using version 2.19, there’s a fix listed here


The bug affects only the vectorized function, right? Probably an easier fix is to just write the loop explicitly:

for (i in 1:n)
  y1[i] ~ ordered_logistic(alpha[1] + beta[1] * theta[i], c1);

etc.

Good call - yep, that would be the simpler way to avoid the bug, rather than dealing with the C++.

I’m on 2.18.2 - is there a bug in that version as well? Would the model below clear things up?

data {
  int<lower=0> n; // number of data points
  int<lower=0> j; // number of item parameters
  int<lower=2> k; // number of cutpoints

  int y1[n];
  int y2[n];
  int y3[n];
  int y4[n];
}

parameters {
  vector[j] alpha;       // item difficulty parameters
  real<lower=0> beta[j]; // item discrimination parameters
  vector<lower=0, upper=1>[n] theta; // latent variables
  ordered[k-1] c1; // cutpoints
  ordered[k-1] c2; // cutpoints
  ordered[k-1] c3; // cutpoints
  ordered[k-1] c4; // cutpoints
}

model {
  theta ~ normal(0, 1);  // fixing latent variable scale
  alpha ~ normal(0, 10); // prior on item difficulty
  beta ~ gamma(4, 3);    // prior on item discrimination

  for (i in 1:n) {
    y1[i] ~ ordered_logistic(alpha[1] + beta[1] * theta[i], c1);
    y2[i] ~ ordered_logistic(alpha[2] + beta[2] * theta[i], c2);
    y3[i] ~ ordered_logistic(alpha[3] + beta[3] * theta[i], c3);
    y4[i] ~ ordered_logistic(alpha[4] + beta[4] * theta[i], c4);
  }
}


I’ve run into this recently, and my issue was that my priors were putting parameters into ‘bad’ regions of the geometry, which forced the sampler to much deeper treedepths. Stan’s default prior is an improper uniform over each parameter’s constrained support. If that puts your model’s parameters in weird places, I could see it being an issue for the cutpoint variables c1 through c4.

So you are saying I should put priors on the cutpoint variables? How would I know that my variables are in “bad spaces”? Thanks for the tip.

I’ve never worked with cutpoint models, so I can’t say what priors would be good for them, but for diagnosing this sort of thing I find the bayesplot vignettes helpful:

https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html

The bug is specific to 2.19, so you should be alright.

As for priors on the cutpoints, I’ve had success with normal(0, 5) priors in the past.

Okay - normal priors on the cutpoints, Stan updated, and looping over the ordered logits to bypass the bug.
I’ve just compiled the model, and it says that 1000 transitions will take 377.44, 626.25, 637.21, and 597.89 seconds per chain, respectively. This seems really slow - did introducing the loop make the computation more expensive?

An intercept on the latent affinities, your alpha parameters here, is not identified relative to the cutpoints. If you want to keep both, then both need informative priors. For more, see https://betanalpha.github.io/assets/case_studies/ordinal_regression.html.
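To make the intercept-versus-cutpoints degeneracy concrete, here is a small NumPy sketch (same hand-rolled ordered-logistic pmf mirroring Stan's parameterization, with made-up toy numbers): shifting the intercept and every cutpoint by the same constant leaves the likelihood untouched.

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordered_logistic_logpmf(y, eta, c):
    # log P(y = k | eta, cutpoints c), mirroring Stan's ordered_logistic
    cum = np.concatenate(([1.0], inv_logit(eta - c), [0.0]))
    return np.log(cum[y - 1] - cum[y])

# made-up toy data, for illustration only
rng = np.random.default_rng(1)
theta = rng.normal(size=5)
alpha, beta = 0.5, 1.3
c = np.array([-1.0, 0.0, 1.0])
y = np.array([2, 1, 4, 3, 2])

ll_orig = sum(ordered_logistic_logpmf(yi, alpha + beta * ti, c)
              for yi, ti in zip(y, theta))

# shift the intercept AND every cutpoint by the same constant d:
# (alpha + d) + beta * theta - (c + d) == alpha + beta * theta - c
d = 10.0
ll_shifted = sum(ordered_logistic_logpmf(yi, (alpha + d) + beta * ti, c + d)
                 for yi, ti in zip(y, theta))
```

Since ll_orig and ll_shifted agree, the ordinal data alone can never pin down both alpha and the cutpoints; only informative priors break the tie.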


If I am only interested in the alpha parameters, is the model okay as is?

No - because the alpha are correlated with the other parameters in the model, you need to be able to quantify all of them accurately in order to get an accurate marginal posterior for the alpha.

Another way of thinking about it is that the degeneracy between the alpha and the cutpoints means that an experiment with ordinal outputs will not answer the questions you are interested in, at least not without substantial prior information to complement the information in the ordinal data and distinguish between the many degenerate possibilities.

I see - how can I correct this non-identifiability? What is the relationship between alpha and c that needs to be pinned down? Thanks so much for your help.

I discuss some possibilities in the ordinal regression case study linked above. In general the answer will depend on the bespoke structure of your application.

I see. I had read your case study when you first posted it, but it wasn’t clear to me why adding the generated quantities block solved the problem, and I’m still unsure how I would mend my model. If I were to remove the alpha parameters entirely and use only \beta * \theta (item discrimination times the latent quantity), would we have the same specification issue, or would that solve it?