Ordinal probit response model (also a translation from JAGS)

I’ve been struggling to get an ordered probit model working in Stan for the past couple of weeks, and I’m not sure where I’m going wrong. I’m trying to reproduce some of the problems in Kruschke’s Doing Bayesian Data Analysis, keeping things as simple as possible: the one-group ordinal model from Chapter 23 of the book, similar to the R/JAGS code here: https://github.com/davidkretch/dbda/blob/master/DBDA2Eprograms/Jags-Yord-Xnom1grp-Mnormal.R. I’ve looked for Stan implementations of these models and can’t find those either.

I’m working in Python and can get a simple model working in JAGS pretty easily, but my Stan translation gives me a mess: it runs, but the samples make no sense. I’ve searched for complete Stan examples of ordinal probit response models or regressions and have only found partial solutions, which haven’t worked when I’ve tried them. I’m including the full Python code for the smallest self-contained problem I could find, and I’ve tried to make the two programs as similar to each other as possible. I’m not sure of the best way to debug problems like this.

Thanks for any advice!

Here’s the pyjags code:

import numpy as np

data = {'y': np.array([3, 3, 3, 3, 1, 3, 2, 2, 4, 2]),
        'N': 10,
        'K': 5,
        }

# the cutoff points: the endpoints are fixed at 1.5 and K - 0.5, and the
# interior ones are masked so that JAGS treats them as missing (stochastic)
K = data['K']
c = np.arange(1.5, K)
c = np.ma.asarray(c)
c[1:-1] = np.ma.masked
data['c'] = c

model_code= """
model {
    for ( k in 2:(K-2) ) {  # 1 and K-1 are fixed, not stochastic
        c[k] ~ dnorm( k+0.5 , 0.25)
    }
    sigma ~ dunif( K/1000 , K*10 )
    mu ~ dnorm( (1+K)/2 , 1/(K)^2 )

    for ( i in 1:N ) {
        pr[i,1] <- pnorm( c[1] , mu , 1/sigma^2 )
        for ( k in 2:(K-1) ) {
            pr[i,k] <- max( 0 ,  pnorm( c[ k ] , mu , 1/sigma^2 )
                               - pnorm( c[k-1] , mu , 1/sigma^2 ) )
        }
        pr[i,K] <- 1 - pnorm( c[K-1] , mu , 1/sigma^2 )

        y[i] ~ dcat( pr[i,1:K] )

    }
    
}
"""

import pyjags
model = pyjags.Model(model_code, data=data, 
                     adapt=500,
                     chains=4)

samples = model.sample(500)
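
The JAGS run behaves sensibly for me. A quick way to eyeball the posterior means (assuming pyjags’ usual array layout of (dims..., iterations, chains), and that mu, sigma and c end up in the monitored variables - pass vars=['mu', 'sigma', 'c'] to sample() if not):

# average over the iteration and chain axes, leaving each variable's own dimensions
for name in ('mu', 'sigma', 'c'):
    print(name, samples[name].mean(axis=(-2, -1)))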

Here’s the pystan program:

import numpy as np

data={'y': np.array([3, 3, 3, 3, 1, 3, 2, 2, 4, 2]),
 'N': 10,
 'K': 5,
     }

# the cutoff points
K=data['K']

# not needed as data in Stan - handled through the init function instead
c = np.arange(1.5, K)
c = np.ma.asarray(c)
c[1:-1] = np.ma.masked  # these are the missing ones

model_code= """
functions {
    real mymax(real x,real y) {
        if (x>y) {
            return x;
        } else {
            return y;
        }
    }
    
}

data {
    int<lower=2> K;
    int<lower=0> N;
    int<lower=1,upper=K> y[N];
}

parameters {
    ordered[K-1] c;
    real<lower=0> sigma;
    real mu;
}


model {

    vector[K] pr[N];
    
    for ( k in 2:(K-2) ) {  // 1 and K-1 are fixed, not stochastic
        c[k] ~ normal( k+0.5 , 0.25);
    }
    sigma ~ uniform( K/1000 , K*10 );
    mu ~ normal( (1+K)/2 , 1/(K)^2 );


    for (i in 1:N) {
        pr[i,1] = Phi( (c[1]-mu)/sigma );
        for ( k in 2:(K-1) ) {
            pr[i,k] = mymax( 0.0 ,  Phi( (c[k]-mu)/sigma ) - Phi( (c[k-1]-mu)/sigma ) );
        }
        pr[i,K] = 1 -  Phi( (c[K-1]-mu)/sigma );

        y[i] ~ categorical(pr[i]);
    } 
}

"""

def initf():
    return {
        'mu': np.mean(data['y']),
        'sigma': 2,
        'c': np.arange(1.5, data['K']),
    }
initf()

import pystan
model = pystan.StanModel(model_code=model_code)
fit=model.sampling(data=data,iter=500,init=initf)
samples=fit.extract()
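
One quick sanity check I’ve been using (assuming PyStan 2.x, as above) is to print the fit summary and look at Rhat and n_eff before trusting any draws, and then to eyeball the posterior means against the JAGS run:

# Rhat should be close to 1.0 and n_eff should not be tiny;
# wild Rhat values usually mean the chains never mixed or the model is misspecified
print(fit)

# rough comparison against the JAGS results
print(samples['mu'].mean(), samples['sigma'].mean())
print(samples['c'].mean(axis=0))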

Pages 25 and 26 of the User Manual have the implementations of ordered logistic and probit models. The user guide is here: https://github.com/stan-dev/stan/releases/download/v2.18.0/users-guide-2.18.0.pdf

You can now also use ordered_probit() instead of calculating the probability of each class manually as in the probit example in the user guide.
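
For example (a quick untested sketch, assuming a Stan version that provides ordered_probit), the model block of that probit example collapses to a single sampling statement:

model_code = """
data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  int<lower=1,upper=K> y[N];
  row_vector[D] x[N];
}
parameters {
  vector[D] beta;
  ordered[K-1] c;
}
model {
  // ordered_probit replaces the manual theta[k] = Phi(...) - Phi(...) loop
  for (n in 1:N)
    y[n] ~ ordered_probit(x[n] * beta, c);
}
"""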

Thanks for the reply! I did see that, although not the ordered_probit() part. I couldn’t get any full example to work, even this one copied and pasted. I feel I’m missing something, because nothing converges properly. For example, I tried this data:

data = {'y': np.array([3, 3, 3, 3, 1, 3, 2, 2, 4, 2]),
        'N': 10,
        'K': 5,
        'D': 1,
        'x': np.array([1.119, 0.719, 0.667, 0.646, 0.232,
                       0.291, 0.428, 0.634, 0.819, 0.932]).reshape(10, 1),
        }

with the model copied from page 26

data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  int<lower=1,upper=K> y[N];
  row_vector[D] x[N];
}
parameters {
  vector[D] beta;
  ordered[K-1] c;
}
model {
  vector[K] theta;
  for (n in 1:N) {
    real eta;
    eta = x[n] * beta;
    theta[1] = 1 - Phi(eta - c[1]);
    for (k in 2:(K-1))
      theta[k] = Phi(eta - c[k-1]) - Phi(eta - c[k]);
    theta[K] = Phi(eta - c[K-1]);
    y[n] ~ categorical(theta);
  }
}

running it like

model = pystan.StanModel(model_code=model_code)
fit=model.sampling(data=data,iter=5000)

but something is wrong, because the samples of c[3] (the fourth element of the cutoff vector, zero-indexed) blow up to absurdly large values:

samples=fit.extract()
samples['c'][:,3]

[Out ] array([8.02592829e+307, 1.04136682e+308, 5.06407958e+307, ...,
       1.70444766e+307, 1.54014028e+308, 1.22586432e+308])

I also couldn’t get an example like this to work when fixing the endpoints of c (at 1.5 and K - 0.5) as Kruschke does and as my JAGS code posted earlier does. Am I missing something here?

Perhaps I’m missing something in understanding your Python code, but your data appear to have only 4 response categories {1, …, 4}, while you’re inputting K = 5? I suspect that should be 4.
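
One way to avoid this kind of mismatch in the Python driver is to derive K from y rather than hard-coding it, for example:

y = np.array([3, 3, 3, 3, 1, 3, 2, 2, 4, 2])
# only valid if every level of the scale actually appears in y at least once;
# otherwise K should be set to the known number of scale levels
data = {'y': y, 'N': len(y), 'K': int(y.max())}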

I’ve been jumping between so many examples that I didn’t catch that on the small example. That seems to fix it there, at least for convergence. In Kruschke’s work, and in the JAGS code, two of the cutoffs are fixed but the mean and standard deviation of the underlying metric value are not. Can this be done in Stan? (I have tried, unsuccessfully.) I’ll check whether my other problems are solved at this point. It seems ordered_probit is in version 2.18 and not in 2.17, so I need to spend some time upgrading… thanks for the catch!

Yes, you can do that. In the parameters{} block you’d declare the free cut-points “c_raw” with length K - 3, and then in the transformed parameters{} block you’d declare the complete ordered vector of cut-points “c” with length K - 1, hard-code the values of, say, the first two, and set the remaining ones to the values of c_raw.

Something like:

parameters {
  ordered<lower = 0.5>[K-3] c_raw; // lower = 0.5 so parameters are restricted to be at least as large as the arbitrarily hard-coded cut-points
}

transformed parameters {
  ordered[K-1] c;
  
  c[1] = -0.5; // your hard-coded values for the first, say, 2 cut-points (the values here are chosen arbitrarily)
  c[2] = 0.5;

  for(i in 3:K-1) {
    c[i] = c_raw[i-2];
  }

}

Bingo! That did it, and solved my biggest problem. Thanks! I’m a little surprised I didn’t think of this given the docs but sometimes the solution is just staring you in the face and you can’t see it.

Again, thanks a ton!


Hi,
I’m trying to achieve pretty much the same thing, so I found this post very helpful. The difference is that I have an 11-point Likert scale. I was wondering about 2 things regarding the provided solution:

  1. Did defining a lower bound for c_raw work for you? Stan doesn’t seem to compile the code when I define bounds (either just a lower bound, or both lower and upper bounds) on an ordered vector.
  2. Kruschke defines fixed lower and upper thresholds in his example (1.5 as the first threshold and K - 0.5 as the last). Is this realized in this solution somehow? It seems you fixed the first two thresholds (and specifically the first one at a negative value, which surprised me), but not the last one.

Anyway, I tried to code the ordered vector of thresholds with fixed lower and upper thresholds as follows:

data {
  int<lower=1> nYlevels; // number of responses on the scale (a 0-10 scale, so 11 responses)
}
parameters {
  ordered<lower = 1.5, upper = nYlevels - 0.5>[nYlevels-3] thresh_raw;
}
transformed parameters {
  ordered[nYlevels-1] thresh;
  thresh[1] = 1.5;
  for (i in 2:(nYlevels-2)) {
    thresh[i] = thresh_raw[i-1];
  }
  thresh[nYlevels-1] = nYlevels - 0.5;
}

I keep getting error messages saying that either the second threshold is lower than the first (and fixed) one, or that the penultimate threshold is crossing the last (and fixed) threshold. What am I doing wrong?
Thanks,
Ofir


I’m in the same boat. The implementation of the thresholds is causing sampling issues.

nhuurre offers a solution that works very well (check the link below). The solution is even better than the original DBDA2e parameterization IMHO, as it ensures that the thresholds are ordered.
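
For anyone reading along without following the link: the idea, as I understand it (this sketch is mine, not necessarily nhuurre’s exact code; the priors and the names gaps/mu/sigma are placeholders, and it assumes a Stan version with ordered_probit), is to fix the two end thresholds at Kruschke’s 1.5 and nYlevels - 0.5 and to express the interior thresholds as cumulative fractions of the gap between them, so the ordering constraint holds by construction. Something like (untested):

model_code = """
data {
  int<lower=3> nYlevels;            // number of response levels
  int<lower=0> N;
  int<lower=1,upper=nYlevels> y[N];
}
parameters {
  simplex[nYlevels-2] gaps;         // relative widths of the intervals between the fixed endpoints
  real mu;
  real<lower=0> sigma;
}
transformed parameters {
  ordered[nYlevels-1] thresh;
  thresh[1] = 1.5;
  thresh[nYlevels-1] = nYlevels - 0.5;
  // interior thresholds are cumulative fractions of the fixed span, so they
  // are increasing and stay strictly between the two fixed endpoints
  for (k in 2:(nYlevels-2))
    thresh[k] = 1.5 + (nYlevels - 2) * sum(gaps[1:(k-1)]);
}
model {
  // placeholder priors, not Kruschke's exact choices
  mu ~ normal((1 + nYlevels) / 2.0, nYlevels);
  sigma ~ normal(0, nYlevels);      // half-normal via the lower bound
  for (n in 1:N)
    y[n] ~ ordered_probit(mu / sigma, thresh / sigma);
}
"""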


yep, that seems to work. thanks!

Hey everyone, it’s been a while, but I’m hoping people are still interested in this model.

I’ve taken a stab at coding it up in Python, using the most basic packages for ease of use by the wider community (numpy, scipy, and Jupyter). It’s in the repo at this address:

However, I still can’t quite get the posterior predictive probabilities for the movie data to match those in the paper, and haven’t coded up the hierarchical version. I’m running out of steam a little, and wondered whether anyone would be up for giving me a hand fixing and finalizing it (likely no more than 5 hours). I’ve left extensive comments, and am on hand to discuss.

Thanks in advance, and point me to another forum if you know it’s being discussed there,

Ruairidh