Trying to improve stability of ordinal logistic regression

I am trying to fit an ordinal model with twist. At this point, I am using only simulated data, so things should be working well. The context is a randomized control trial, with randomization occuring in multiple sites. The twist is that there are three different interventions which are quite similar but not exact. Each site has chooses one intervention. So, I have simulated 9 sites, 3 with intervention A, 3 with intervention B, and 3 with intervention C. I am actually interested in a pooled treatment effect (across all sites and interventions). I model individual site-level effects and intervention-type effects. And since this is ordinal, I need to estimate cut-points - I actually have specified site-specific cut-points rather than using a site-specific random effect, because we don’t want to force a proportional odds assumption across sites, only within a site. I think that should be enough information to understand the model.

My question: I am getting “divergence” messages and I want to know if there is a better way for me to parameterize this model to smooth things out a bit. The posterior plots look pretty reasonable, but the trace plots for the treatment effects have some clear spikes. Any suggestions would be much appreciated.

Here is the stan code:

data {
  int<lower=0> N;                // number of observations
  int<lower=2> L;                // number of ordinal categories
  int<lower=1> K;                // number of sites
  int<lower=1,upper=L> y[N];     // vector of categorical outcomes
  int<lower=1,upper=K> kk[N];    // site for individual
  int<lower=0,upper=1> ctrl[N];  // treatment or control
  int<lower=1,upper=3> cc[K];    // specific control for site
}

parameters {
  ordered[L-1] tau[K];      // cut-points for cumulative odds model
  
  vector[K] delta_k;        // site specific treatment effect
  real<lower=0>  eta_0;     // sd of delta_k (around delta)
  
  vector[3] delta;          // control-specific effect
  real<lower=0> eta;        // sd of delta (around Delta)
  
  real Delta;               // overall control effect
}

transformed parameters{ 
  
  vector[N] yhat;
  
  for (i in 1:N)  
      yhat[i] = ctrl[i] * delta_k[kk[i]];
}

model {
  
  // priors
  
  eta_0 ~ cauchy(0, 25);
  eta ~ cauchy(0, 25);
  
  for (k in 1:K)
    delta_k[k] ~ normal(delta[cc[k]], eta_0);
  
  for (c in 1:3)
    delta[c] ~ normal(Delta, eta);
    
  Delta ~ normal(0, 1);
  
// I also used an induced_dirichlet approach here with
// very similar results:

  for (k in 1:K)
    for (l in 1:(L-1))
      tau[k, l] ~ uniform(-5, 5);     
  
  // outcome model
  
  for (i in 1:N)
    y[i] ~ ordered_logistic(yhat[i], tau[kk[i]]);
}

If it would help for me to submit the simulation code, I’d be happy to do that - it is a little bit involved, but not too bad.

Here are the trace plots for the treatment effects:

trace

And the trace plots for the variance parameters:

variance

And finally - the density plots for the cut-points (tau in the model) - one panel for each site, and there are 5 levels in the ordinal outcome, hence 4 cut-points:

cutpoints

1 Like

Sorry, your question fell through a bit (despite being well written and relevant).

First - if you are using R, I would strongly recommend you give brms a try - it should support the model you are trying to develop out of the box and has already implemented a ton of useful stuff (predictions, integration with loo, …).

If on the other hand your goal is to learn Stan (or you use a different host language), then I would suggest you to check out the Visualisation vignette - which shows how to generate some interesting plots to diagnose your divergences (and you should be able to reproduce those plots even in other languages with some work).

My main suspicions would be

  • the centered parametrization for the varying effects (the bayesplot vignette discusses this or you can search “non-centered parametrization” here)
  • very wide priors for eta_0 and eta - do you really believe your effects can be that big? Remember the effects are basically on something like a log-odds scale
  • the uniform prior on tau - since tau is defined to span the whole real numbers, this will lead to values “out of bounds” of the uniform prior. You may safely have something like brms does (tau[k] ~ student_t(3, 0, 2.5);. There is no ordered AND constrained parameter by default in Stan, but if you insist on having it, it can be built from simplex - taking the cumulative sum of a simplex parameter gives you ordered values in the [0,1] interval which can be rescaled. But I would not recommend this.

Best of luck with your model!