Max Treedepth Exceeded = Lack of Convergence?

jmelnick · January 10, 2020, 6:34am

I am trying to run the ordered logit model pasted below and am running into the following issue.

All transitions after warmup exceed a maximum treedepth of 12. This is the only warning I am getting after I fit the model, yet the Rhats are incredibly high and n_eff is 2, so I don’t believe the model converged.

When launching shinystan, I also see that the traceplots do not look like the chains mixed and the stepsize parameter is zero for all chains.

What might be wrong with the model such that it is not converging even though there were no divergent transitions? I thought that exceeding max treedepth was only a computational efficiency issue, not an issue of convergence.

Thanks so much for any tips you may have.

data{
  int<lower=0> n; // number of data points
  int<lower=0> j; // number of item parameters
  int<lower=2> k; // number of cutpoints
  
  int y1[n]; 
  int y2[n]; 
  int y3[n]; 
  int y4[n]; 

}

parameters{
  vector[j] alpha; // item difficulty parameters
  real<lower=0> beta[j]; // item discrimination parameters
  vector[n] theta; // latent variables
  real<lower=0> sigma[j]; // standard deviations
  ordered[k-1] c1; // cutpoints 
  ordered[k-1] c2; // cutpoints 
  ordered[k-1] c3; // cutpoints 
  ordered[k-1] c4; // cutpoints 
}

model{
  theta ~ normal(0, 1); // fixing latent variable scale
  alpha ~ normal(0, 10); // prior on item difficulty
  beta ~ gamma(4, 3); // prior on item discrimination
  
  y1 ~ ordered_logistic(alpha[1] + beta[1] * theta, c1); 
  y2 ~ ordered_logistic(alpha[2] + beta[2] * theta, c2); 
  y3 ~ ordered_logistic(alpha[3] + beta[3] * theta, c3);
  y4 ~ ordered_logistic(alpha[4] + beta[4] * theta, c4); 
}

nhuurre · January 10, 2020, 6:43am

sigma is not used in the model block at all. Stan cannot sample from the implicit flat prior and that messes up the whole inference.

jmelnick · January 10, 2020, 6:48am

That was mistakenly in there from a previous parameterization of the model and I forgot to take it out - would that be enough to throw off the results?

martinmodrak · January 10, 2020, 6:49am

Also some of the alpha and beta parameters are not involved when j != 4. But more importantly, beta and theta are likely not well identified - you can increase theta and decrease beta by a corresponding amount to arrive at (almost) exactly the same likelihood. This should be visible when plotting some betas against thetas (with pairs plot or shinystan).

Fixing beta[1] to 1 or something similar might help with this (not thinking it through, short on time, please double check).

Best of luck with your model!

jmelnick · January 10, 2020, 7:05am

Thanks for the suggestion - I thought that the standard normal prior on theta properly identified the model? Is there something else I should do in order to further correct for potential unidentifiabilities?

andrjohns · January 10, 2020, 8:01am

Which version of Stan are you using? There’s a bug in the ordered_logistic distribution that’s present in version 2.19. You’ll likely be effected if you’re using RStan, since that’s the current version available.

If you’re using version 2.19, there’s a fix listed here

nhuurre · January 10, 2020, 8:08am

The bug affects only the vectorized function, right? Probably easier fix is to just write the loop explicitly

for (n in 1:N)
  y1[n] ~ ordered_logistic(alpha[1] + beta[1] * theta[n], c1);

etc.

andrjohns · January 10, 2020, 8:12am

Good call, yep that would be the simpler way to avoid the bug rather than dealing with the c++

jmelnick · January 10, 2020, 5:14pm

I’m in 2.18.2 - is there a bug in this as well? Would the model below correct things up?

data{
  int<lower=0> n; // number of data points
  int<lower=0> j; // number of item parameters
  int<lower=2> k; // number of cutpoints
  
  int y1[n]; 
  int y2[n]; 
  int y3[n]; 
  int y4[n]; 

}

parameters{
  vector[j] alpha; // item difficulty parameters
  real<lower=0> beta[j]; // item discrimination parameters
  vector<lower=0, upper=1>[n] theta; // latent variables
  ordered[k-1] c1; // cutpoints 
  ordered[k-1] c2; // cutpoints
  ordered[k-1] c3; // cutpoints 
  ordered[k-1] c4; // cutpoints 
}

model{
  theta ~ normal(0, 1); // fixing latent variable scale
  alpha ~ normal(0, 10); // prior on item difficulty
  beta ~ gamma(4, 3); // prior on item discrimination
  
  for(i in 1:n){
  y1[i] ~ ordered_logistic(alpha[1] + beta[1] * theta[i], c1); 
  y2[i] ~ ordered_logistic(alpha[2] + beta[2] * theta[i], c2);
  y3[i] ~ ordered_logistic(alpha[3] + beta[3] * theta[i], c3); 
  y4[i] ~ ordered_logistic(alpha[4] + beta[4] * theta[i], c4); 
  }
  
}

stevebronder · January 10, 2020, 5:54pm

I’ve run into this recently and my issue was that my priors were putting variables into ‘bad spaces’ in the geometry which forced the sampler to have a much deeper tree depth. The default priors are uniform (?-inf, inf or some huge numbers?). If those priors put your model’s parameters in weird spaces then I could see that being an issue for the c{N} variables

jmelnick · January 10, 2020, 6:29pm

So you are saying I should put priors on the cutpoint variables? How would I know that my variables are in “bad spaces”? Thanks for the tip.

stevebronder · January 10, 2020, 8:09pm

I’ve never worked with cutpoint models so I can’t say what priors would be good for them, but for diagnosing stuff I find bayesplots vignettes to be helpful

https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html

andrjohns · January 11, 2020, 2:33am

The bug is specific to 2.19, so you should be alright.

As for priors on the cutpoints, I’ve success with normal(0,5) priors in the past

jmelnick · January 11, 2020, 10:14pm

Okay, normal priors on the cutpoints. Updated Stan and looping over the ordered logits to bypass the bug.
I’ve just compiled the model, and it says that 1000 transitions will take 377.44, 626.25, 637.21, and 597.89 seconds respectively per chain. This seems really high - did the introduction of the loop make the computation more time-consuming?

betanalpha · January 12, 2020, 5:36pm

An intercept on the latent affinities, your alpha parameters here, are not identified relative to the cut points. If you want to have both then both need informative priors. For more see https://betanalpha.github.io/assets/case_studies/ordinal_regression.html.

jmelnick · January 12, 2020, 6:03pm

If I am only interested in the alpha parameters is the model okay as is?

betanalpha · January 12, 2020, 8:56pm

No – because the alpha are correlated with the other parameters in the model you need to be able to quantify all of them accurately in order to get accurate quantifications of the marginal posterior for the alpha.

Another way of thinking about it is that the degeneracy between the alpha and the cut points means that an experiment with ordinal outputs will not answer the questions in which you are interested, at least not with out substantial prior information to complement the information in that ordinal data and distinguish between the many degenerate possibilities.

jmelnick · January 12, 2020, 9:29pm

I see, how can I correct this unidentifiability? What is the relationship between alpha and c that needs to be specified? Thanks so much for your help.

betanalpha · January 13, 2020, 2:30am

I discuss some possibilities in the ordinal regression case study linked above. In general the answer will depend on the bespoke structure of your application.

jmelnick · January 13, 2020, 3:05am

I see, I had read your study when you first posted it and it wasn’t clear to me why adding the generated quantities block solved the problem. I still am unsure how I would mend my model to fix the issue. But if I were to remove the alpha parameter entirely and only use \beta * \theta as item discrimination times the latent quantity would we have the same specification issue? Or would this be solved?

Topic		Replies	Views
Ordinal IRT model Modeling fitting-issues	9	2283	December 15, 2017
Trouble converting JAGS IRT model to Stan Modeling rstan , irt , hierarchical-model	3	84	July 28, 2025
High magnitude cutpoint proposals Modeling fitting-issues , performance	14	1198	March 16, 2018
Help with hierarchically modeling parameters of a nonlinear model Modeling fitting-issues , specification	10	164	September 8, 2024
Max treedepth warnings in hierarchical multivariate model Modeling	10	1923	November 17, 2018

Max Treedepth Exceeded = Lack of Convergence?

Related topics