# Trying to improve stability of ordinal logistic regression

I am trying to fit an ordinal model with a twist. At this point I am using only simulated data, so things should be working well. The context is a randomized controlled trial, with randomization occurring at multiple sites. The twist is that there are three different interventions which are quite similar but not identical, and each site chooses one intervention. So I have simulated 9 sites: 3 with intervention A, 3 with intervention B, and 3 with intervention C. I am ultimately interested in a pooled treatment effect (across all sites and interventions), but I model individual site-level effects and intervention-type effects. And since the outcome is ordinal, I need to estimate cut-points. I have specified site-specific cut-points rather than using a site-specific random effect, because we don't want to force a proportional odds assumption across sites, only within a site. I think that should be enough information to understand the model.

My question: I am getting warnings about divergent transitions, and I want to know if there is a better way to parameterize this model to smooth things out a bit. The posterior density plots look pretty reasonable, but the trace plots for the treatment effects show some clear spikes. Any suggestions would be much appreciated.

Here is the stan code:

```stan
data {
  int<lower=0> N;                // number of observations
  int<lower=2> L;                // number of ordinal categories
  int<lower=1> K;                // number of sites
  int<lower=1,upper=L> y[N];     // vector of categorical outcomes
  int<lower=1,upper=K> kk[N];    // site for individual
  int<lower=0,upper=1> ctrl[N];  // treatment indicator (1 = treated)
  int<lower=1,upper=3> cc[K];    // intervention type for site
}

parameters {
  ordered[L-1] tau[K];      // site-specific cut-points for cumulative odds model

  vector[K] delta_k;        // site-specific treatment effect
  real<lower=0> eta_0;      // sd of delta_k (around delta)

  vector[3] delta;          // intervention-specific effect
  real<lower=0> eta;        // sd of delta (around Delta)

  real Delta;               // overall treatment effect
}

transformed parameters {
  vector[N] yhat;

  for (i in 1:N)
    yhat[i] = ctrl[i] * delta_k[kk[i]];
}

model {
  // priors
  eta_0 ~ cauchy(0, 25);
  eta ~ cauchy(0, 25);

  for (k in 1:K)
    delta_k[k] ~ normal(delta[cc[k]], eta_0);

  for (c in 1:3)
    delta[c] ~ normal(Delta, eta);

  Delta ~ normal(0, 1);

  // I also used an induced_dirichlet approach here with
  // very similar results:
  for (k in 1:K)
    for (l in 1:(L-1))
      tau[k, l] ~ uniform(-5, 5);

  // outcome model
  for (i in 1:N)
    y[i] ~ ordered_logistic(yhat[i], tau[kk[i]]);
}
```

If it would help for me to submit the simulation code, I’d be happy to do that - it is a little bit involved, but not too bad.

Here are the trace plots for the treatment effects:

[Figure: trace plots for the treatment effects]

And the trace plots for the variance parameters:

[Figure: trace plots for the variance parameters]

And finally, the density plots for the cut-points (tau in the model), one panel for each site; there are 5 levels in the ordinal outcome, hence 4 cut-points:

[Figure: density plots for the cut-points, one panel per site]

Sorry, your question fell through the cracks a bit (despite being well written and relevant).

First - if you are using R, I would strongly recommend giving `brms` a try. It should support the model you are trying to develop out of the box, and it has already implemented a ton of useful functionality (predictions, integration with `loo`, …).

If, on the other hand, your goal is to learn Stan (or you use a different host language), then I would suggest you check out the bayesplot visualisation vignette, which shows how to generate some useful plots for diagnosing divergences (and you should be able to reproduce those plots in other languages with some work).

My main suspicions would be:

• the centered parametrization of the varying effects (the bayesplot vignette discusses this, or you can search for "non-centered parametrization" on these forums)
• the very wide `cauchy(0, 25)` priors for `eta_0` and `eta` - do you really believe your effects can be that big? Remember the effects are basically on something like a log-odds scale
• the `uniform(-5, 5)` prior on `tau` - since `tau` is declared over the whole real line, the sampler can propose values "out of bounds" of the `uniform` prior, which leads to rejections. You can safely use something like brms does (`tau[k] ~ student_t(3, 0, 2.5);`). There is no parameter type that is both `ordered` and interval-constrained by default in Stan, but if you insist on having one, it can be built from a `simplex` - taking the cumulative sum of a simplex parameter gives you ordered values in the [0, 1] interval, which can then be rescaled. But I would not recommend this.
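
To make the first two points concrete, here is a sketch (untested, and assuming the same data block as your model) of how the non-centered parametrization and the tighter priors might look. The effects are expressed as standardized draws (`*_raw`) that are shifted and scaled deterministically, which usually helps HMC when the group-level sds are small:

```stan
parameters {
  ordered[L-1] tau[K];          // site-specific cut-points, as before
  vector[K] delta_k_raw;        // standardized site effects
  vector[3] delta_raw;          // standardized intervention effects
  real<lower=0> eta_0;
  real<lower=0> eta;
  real Delta;
}

transformed parameters {
  // non-centered: shift and scale the standardized draws
  vector[3] delta = Delta + eta * delta_raw;
  vector[K] delta_k;
  for (k in 1:K)
    delta_k[k] = delta[cc[k]] + eta_0 * delta_k_raw[k];
}

model {
  delta_k_raw ~ std_normal();      // implies delta_k ~ normal(delta[cc[k]], eta_0)
  delta_raw ~ std_normal();        // implies delta ~ normal(Delta, eta)
  eta_0 ~ normal(0, 1);            // much tighter than cauchy(0, 25) on the log-odds scale
  eta ~ normal(0, 1);
  Delta ~ normal(0, 1);

  for (k in 1:K)
    tau[k] ~ student_t(3, 0, 2.5); // replaces the uniform(-5, 5) prior

  for (i in 1:N)
    y[i] ~ ordered_logistic(ctrl[i] * delta_k[kk[i]], tau[kk[i]]);
}
```

Note that with the non-centered version, `delta` and `delta_k` are transformed parameters, so they still appear in the output for posterior summaries.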

Best of luck with your model!