How can I convert my model from `ordered_logistic` to `ordered_logistic_glm`?

I noticed that this OpenCL GPU routine for ordered_logistic (ordered_logistic_glm_lpmf) was added to stan-dev somewhat recently (https://github.com/stan-dev/math/pull/1252) and seems to promise some large performance gains.

Can this somehow be used as a drop-in replacement for ordered_logistic? Our current model uses the ordered_logistic function and looks something like this:

  for(i in 1:N){
    X = ...;
    beta = ...;
    y[i] ~ ordered_logistic(X*beta, cuts);
  }

I tried translating this model to use the _glm function like so; basically, I put all the data into vectors and used those as inputs:

transformed parameters {
  ...
 
  matrix[N,1] X;
  vector[N] beta;
  
  for(i in 1:N){
    X[i,1] = ...;
    beta[i] = ...;
  }
}

model {
  target += ordered_logistic_glm_lpmf(y | X, beta, cuts);
}

However, the inputs fail the bounds constraints:

Exception: ordered_logistic_glm_lpmf: Weight vector has dimension = 3087, expecting dimension = 1; a function was called with arguments of different scalar, array, vector, or matrix types, and they were not consistently sized;  all arguments must be scalars or multidimensional values of the same shape.

I guess it expects X to be of size N x k, where k is the length of the weight vector, but I don’t fully understand why. Why do I need N x k weights to fit this model when I only needed k weights to fit ordered_logistic?

Should I maybe use an X where each row is my weight vector, repeated N times?

Thanks!

Jeff

Hi Jeff,

These GLMs are part of cmdstan 2.22. The following signatures are currently supported:

real ordered_logistic_glm_lpmf(int, row_vector, vector, vector)
real ordered_logistic_glm_lpmf(int, matrix, vector, vector)
real ordered_logistic_glm_lpmf(int[ ], row_vector, vector, vector)
real ordered_logistic_glm_lpmf(int[ ], matrix, vector, vector)

These still have to be added to the Stan docs, together with the also recently added categorical_logit_glm_lpmf.

As for the specific error I am going to tag @tadej who implemented this in Stan Math.

Thanks! Yes, I am using cmdstan 2.22, as you say.

@jeff-predictwise Your first code snippet suggests you are trying to predict a vector y of length N. To do that using ordered_logistic you need a vector of weights beta, which stays the same across all iterations, and in each of the N iterations a vector of observations X of the same length k as beta.

If you want to do the same with ordered_logistic_glm_lpmf you need the same vector beta, and you place all the X-es into a matrix with dimensions N x k. Also, if X does not depend on any weights, it would be more efficient to construct it in the transformed data block.
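
For example, a minimal sketch of that layout (the name C for the number of outcome categories and the ... placeholders are my own):

data {
  int<lower=1> N;
  int<lower=1> k;
  int<lower=2> C;                  // number of outcome categories
  int<lower=1, upper=C> y[N];
}

transformed data {
  matrix[N, k] X;                  // row i holds the k predictor values for y[i]
  // fill X from the data here: X[i] = ...;
}

parameters {
  vector[k] beta;                  // one weight vector shared by all rows
  ordered[C - 1] cuts;
}

model {
  target += ordered_logistic_glm_lpmf(y | X, beta, cuts);
}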

Your second snippet shows you want to have one dimension of X equal to 1. If you have a single weight then beta should also have length 1. If you have y of length 1, or you want to reuse the same values of X for every y, you just have to transpose X into matrix[1, N] X;.

EDIT: actually, the last two options probably won’t work with a matrix. You need row_vector[N] X;

Hope that helps. If not, you will need to be more precise about what exactly you want to do.


Thanks @tadej,

To clarify: in the original model, each y was predicted by a single value of X and a single value of beta. Sorry, I think I didn’t make that clear.

Your suggestion to use row_vector instead of a matrix seems to work (it passes the bounds constraint, at least). Let me try running with that and see whether the results of this vs. the non-GLM model are the same.

Thanks,

Jeff

If that is the case, I think you need to use matrix[N, 1] X and vector[1] beta.
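
Something like this minimal sketch (the ... stand for whatever your model currently computes):

transformed parameters {
  matrix[N, 1] X;                  // single predictor column
  vector[1] beta;                  // single shared weight

  for (i in 1:N) {
    X[i, 1] = ...;
  }
  beta[1] = ...;
}

model {
  target += ordered_logistic_glm_lpmf(y | X, beta, cuts);
}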


If I read ordered_logistic_glm_lpmf correctly, its input form

ordered_logistic_glm_lpmf(y | x, beta, cuts)

does not allow for non-linear relationships between x and beta. What I am looking for is a “random effects” model, i.e. in pseudo-code:

for (i in 1:N) {
  eta = coefficient1[x1[i]] + coefficient2[x2[i]];
  y[i] ~ ordered_logistic(eta, cutpoints);
}

Based on reading the documentation, the question is whether this type of random-effects model can be implemented with ordered_logistic_glm_lpmf at all. Thank you!

You could do that, but you’d have to roll eta into mu and set X to be zero-dimensional and the predictor coefficient to be zero-dimensional.

It doesn’t look like we’ve vectorized the ordered logistic, or you’d just be able to write this as:

y ~ ordered_logistic(coefficient1[x1] + coefficient2[x2], cutpoints);

Also, you don’t need to define intermediates. This can just be

for (n in 1:size(y))
  y[n] ~ ordered_logistic(coefficient1[x1[n]] + coefficient2[x2[n]], cutpoints);

The ordered logistic has been vectorised:

Available argument signatures for ordered_logistic:

  int ~ ordered_logistic(real, vector)
  int[ ] ~ ordered_logistic(vector, vector)
  int[ ] ~ ordered_logistic(vector, vector[ ])

You just need to be aware that the current versions of RStan and PyStan (2.19.x) have a bug in the vectorised code that causes very poor performance (more information and a fix for RStan here). The bug has been fixed in newer versions (2.20 and up), so cmdstanR and cmdstanPy can be used instead.

Bob, that was my intuition as well. I rewrote the model such that

target += ordered_logistic_glm_lpmf(%response_name% | eta_%response_name%, weight, c_%response_name%);

where eta_%response_name% is a matrix of size (N, 1) derived from the same random effects over a number of responses, and “weight” is a constant weight vector of 1. The speed-up of the model is remarkable, but we are still investigating whether this is due to the use of ordered_logistic_glm_lpmf per se (according to our logs the GPU, while working, does very little work). Will keep you posted - also in our separate email chain with Andrew and David. Thx
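
In sketch form (filler names are mine, following the description above):

transformed parameters {
  matrix[N, 1] eta;                // random-effects linear predictor packed into one column
  for (i in 1:N) {
    eta[i, 1] = coefficient1[x1[i]] + coefficient2[x2[i]];
  }
}

model {
  // with the constant weight vector [1], X * beta just reproduces eta
  target += ordered_logistic_glm_lpmf(y | eta, rep_vector(1, 1), cuts);
}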

OK, while this works, we lose the major speed-up that ordered_logistic_glm_lpmf offers. The way I understand https://arxiv.org/pdf/1907.01063.pdf, the speed-up comes at the matrix multiplication step X * beta, and as mentioned we don’t have a linear relation in our model. We can “cheat” as above, but in that case the GPU does no work and our speed-up is non-existent.

Any thoughts on whether we can make the GPU-based speed-up work in a scenario where we don’t have a natural matrix multiplication? My mind is going toward making the data matrix binary (but expanding the columns of the data) and likewise expanding the coefficient matrix. But this would make it harder to establish correlations for random effects (e.g. correlations across all education random effects, etc.). Is it worth going down that path?
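
For what it’s worth, a rough sketch of that binary-expansion idea (all names here are my own assumptions about your setup); the matrix product X * beta then reproduces the random-effects sum, giving the GPU an actual matrix multiplication to work on:

data {
  int<lower=1> N;
  int<lower=2> C;                  // number of outcome categories
  int<lower=1> K1;                 // levels of the first random effect
  int<lower=1> K2;                 // levels of the second random effect
  int<lower=1, upper=K1> x1[N];
  int<lower=1, upper=K2> x2[N];
  int<lower=1, upper=C> y[N];
}

transformed data {
  // binary design matrix: each row has a 1 in the column of the level it belongs to
  matrix[N, K1 + K2] X = rep_matrix(0, N, K1 + K2);
  for (i in 1:N) {
    X[i, x1[i]] = 1;
    X[i, K1 + x2[i]] = 1;
  }
}

parameters {
  vector[K1] coefficient1;
  vector[K2] coefficient2;
  ordered[C - 1] cutpoints;
}

model {
  // X * beta reproduces coefficient1[x1[i]] + coefficient2[x2[i]] as a real matrix product
  target += ordered_logistic_glm_lpmf(y | X, append_row(coefficient1, coefficient2), cutpoints);
}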