Ordinal probit response model (also translation from jags)


#1

I’ve been struggling with getting an ordered probit model working in stan for the past couple of weeks, and am not sure where I’m going wrong. I am trying to reproduce some of the problems in Kruschke’s Doing Bayesian Data Analysis, trying to keep it as simple as possible - so the 1 group ordinal model from Chapter 23 in the book, similar to the R-Jags code here: https://github.com/davidkretch/dbda/blob/master/DBDA2Eprograms/Jags-Yord-Xnom1grp-Mnormal.R. I’ve looked for the Stan implementations of these models and I can’t find those either.

I’m working in python, and can get a simple model working in Jags pretty easily but my Stan translation gives me a mess - it runs but the samples make no sense. I have searched for complete Stan examples of ordinal probit response models or regression and only find partial solutions which haven’t worked when I’ve tried. I’m including the full python code here on the smallest self-contained problem I could find to help debug these problems, and tried to make the programs as similar to each other. Not sure of the best way to debug problems like this.

Thanks for any advice!

Here’s the pyjags code:

data={'y': np.array([3, 3, 3, 3, 1, 3, 2, 2, 4, 2]),
 'N': 10,
 'K': 5,
     }
# the cutoff points
K=data['K']
c = np.arange(1.5, K)
c = np.ma.asarray(c)
c[1:-1] = np.ma.masked  # these are the missing ones
data['c']=c

model_code= """
model {
    for ( k in 2:(K-2) ) {  # 1 and K-1 are fixed, not stochastic
        c[k] ~ dnorm( k+0.5 , 0.25)
    }
    sigma ~ dunif( K/1000 , K*10 )
    mu ~ dnorm( (1+K)/2 , 1/(K)^2 )

    for ( i in 1:N ) {
        pr[i,1] <- pnorm( c[1] , mu , 1/sigma^2 )
        for ( k in 2:(K-1) ) {
            pr[i,k] <- max( 0 ,  pnorm( c[ k ] , mu , 1/sigma^2 )
                               - pnorm( c[k-1] , mu , 1/sigma^2 ) )
        }
        pr[i,K] <- 1 - pnorm( c[K-1] , mu , 1/sigma^2 )

        y[i] ~ dcat( pr[i,1:K] )

    }
    
}
"""

import pyjags
model = pyjags.Model(model_code, data=data, 
                     adapt=500,
                     chains=4)

samples = model.sample(500)

Here’s the pystan program:

import numpy as np

data={'y': np.array([3, 3, 3, 3, 1, 3, 2, 2, 4, 2]),
 'N': 10,
 'K': 5,
     }

# the cutoff points
K=data['K']

# not needed in stan - done through init function
c = np.arange(1.5, K)
c = np.ma.asarray(c)
c[1:-1] = np.ma.masked  # these are the missing ones

# print('c:\t\t{}'.format(c))

# data['c']=c
data

model_code= """
functions {
    real mymax(real x,real y) {
        if (x>y) {
            return x;
        } else {
            return y;
        }
    }
    
}

data {
    int<lower=2> K;
    int<lower=0> N;
    int<lower=1,upper=K> y[N];
}

parameters {
    ordered[K-1] c;
    real<lower=0> sigma;
    real mu;
}


model {

    vector[K] pr[N];
    
    for ( k in 2:(K-2) ) {  # 1 and K-1 are fixed, not stochastic
        c[k] ~ normal( k+0.5 , 0.25);
    }
    sigma ~ uniform( K/1000 , K*10 );
    mu ~ normal( (1+K)/2 , 1/(K)^2 );


    for (i in 1:N) {
        pr[i,1] = Phi( (c[1]-mu)/sigma );
        for ( k in 2:(K-1) ) {
            pr[i,k] = mymax( 0.0 ,  Phi( (c[k]-mu)/sigma ) - Phi( (c[k-1]-mu)/sigma ) );
        }
        pr[i,K] = 1 -  Phi( (c[K-1]-mu)/sigma );

        y[i] ~ categorical(pr[i]);
    } 
}

"""

def initf():
    return {
        'mu':mean(data['y']),
        'sigma':2,
        'c':arange(1.5,data['K']),
    }
initf()

import pystan
model = pystan.StanModel(model_code=model_code)
fit=model.sampling(data=data,iter=500,init=initf)
samples=fit.extract()

#2

Pages 25 and 26 of the User Manual have the implementations of ordered logistic and probit models. The user guide is here: https://github.com/stan-dev/stan/releases/download/v2.18.0/users-guide-2.18.0.pdf

You can now also use ordered_probit() instead of calculating the probability of each class manually as in the probit example in the user guide.


#3

Thanks for the reply! I did see that, although not the ordered_probit() part. I couldn’t get any full example to work even with this example - copy and pasted. I feel I am missing something because nothing is converging properly. For example, I tried this data:

data={'y': np.array([3, 3, 3, 3, 1, 3, 2, 2, 4, 2]),
 'N': 10,
 'K': 5,
 'D': 1,
 'x':array([1.119, 0.719, 0.667, 0.646, 0.232,  
            0.291, 0.428 , 0.634, 0.819, 0.932]).reshape(10,1),
}

with the model copied from page 26

data {
      int<lower=2> K;
      int<lower=0> N;
      int<lower=1> D;
      int<lower=1,upper=K> y[N];
      row_vector[D] x[N];
}
parameters {
  vector[D] beta;
  ordered[K-1] c;
}
model {
  vector[K] theta;
  for (n in 1:N) {
    real eta;
    eta = x[n] * beta;
    theta[1] = 1 - Phi(eta - c[1]);
    for (k in 2:(K-1))
      theta[k] = Phi(eta - c[k-1]) - Phi(eta - c[k]);
    theta[K] = Phi(eta - c[K-1]);
    y[n] ~ categorical(theta);
    } 
}

running it like

model = pystan.StanModel(model_code=model_code)
fit=model.sampling(data=data,iter=5000)

but something is wrong, because the samples of c[3] (the fourth element of the cutoff vector) have undefined values:

samples=fit.extract()
samples['c'][:,3]

[Out ] array([8.02592829e+307, 1.04136682e+308, 5.06407958e+307, ...,
       1.70444766e+307, 1.54014028e+308, 1.22586432e+308])

I also couldn’t get an example like this to work fixing the endpoints of c (1.5, K-0.5) as Krushcke does and how my jags code posted earlier does. Am I missing something here?


#4

Perhaps I’m missing something in understanding your Python code, but your data appear to only have 4 response categories {1, …, 4}, but you’re inputting K = 5? That should be 4 I suspect.


#5

I’ve been jumping between so many examples I didn’t catch that on the small example. seems to fix it there, at least for convergence. In Kruschke’s work, and in the jags code, two of the cutoff are fixed but the mean and standard deviation of the the underlying metric value is not. Can this be done in stan? (I have tried unsucessfully) I’ll check to see if otherwise my problems are solved at this point. Seems like ordinal_probit is in version 2.18 and not in 2.17, so I need to spend some time upgrading…thanks for the catch!


#6

Yes, you can do that. In the parameters{} block you’d set the number of cut-points “c_raw” to K - 3, and then in the transformed parameters{} block, you’d initialize the (complete) number of cut-points “c” to K - 1, and hard code the values of, say, the first two, and the remaining ones to the values of c_raw.

Something like:

parameters {
  ordered<lower = 0.5>[K-3] c_raw; // lower = 0.5 so parameters are restricted to be at least as large as the arbitrarily hard-coded cut-points
}

transformed parameters {
  ordered[K-1] c;
  
  c[1] = -0.5; // Your hard-coded values for the first, say, 2 cut-points. Arbitrarily choosing a value of 0.5
  c[2] = 0.5;

  for(i in 3:K-1) {
    c[i] = c_raw[i-2];
  }

}

#7

Bingo! That did it, and solved my biggest problem. Thanks! I’m a little surprised I didn’t think of this given the docs but sometimes the solution is just staring you in the face and you can’t see it.

Again, thanks a ton!