Prediction with ordered logistic regression

My model is:

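data {
  int<lower=0> N;                // number of observations
  int<lower=0> D;                // number of predictors
  int<lower=2> K;                // number of ordered categories
  row_vector[D] x[N];            // predictors for each observation
  int<lower=1, upper=K> y[N];    // observed category, 1..K
}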

parameters {
  vector[D] beta;
  ordered[K-1] c;
}

model {
  for (n in 1:N)
    y[n] ~ ordered_logistic(x[n] * beta, c);
}

generated quantities {
  int<lower=1, upper=K> y_pred[N];
  for (n in 1:N)
    y_pred[n] = ordered_logistic_rng(x[n] * beta, c);
}

and I’m extracting the predictions with:

y_pred = fit.extract()['y_pred'].mean(axis=0)

The resulting y_pred looks like:

array([1.53   , 1.9155 , 2.065  , 1.466  , 1.40775, 1.40475, 1.41725,...])

But y_pred is declared as int, so I expected integer results (I have three levels: 1, 2, 3), not doubles. What am I missing?

If you “average” over the integer labels for the categories, you get real numbers in the interval [1, 3]. But sample averages of integer labels are not an estimator of anything.
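
To illustrate why the average of labels is hard to interpret, here is a small sketch with made-up draws for a single observation (the arrays `certain` and `split` are hypothetical, not output from the model above). Two very different sets of predictions give the same mean:

```python
import numpy as np

# Two made-up sets of four posterior predictive draws for one observation.
certain = np.array([2, 2, 2, 2])  # every draw predicts class 2
split = np.array([1, 3, 1, 3])    # no draw predicts class 2

# Averaging the labels hides the difference between the two cases.
mean_certain = certain.mean()
mean_split = split.mean()
print(mean_certain, mean_split)

# The share of draws equal to class 2 tells them apart.
share_of_2_certain = (certain == 2).mean()
share_of_2_split = (split == 2).mean()
print(share_of_2_certain, share_of_2_split)
```

Both means are 2.0, yet class 2 is predicted in every draw in the first case and in none in the second.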

If you “average” over the integer labels for the categories
How can I do that? Can you provide an example?

It is your example: fit.extract()['y_pred'].mean(axis=0).

But that is my problem: it doesn’t return integer classes. How can I interpret the floats when the outputs are classes (1, 2, 3)? I need to predict classes.

If you don’t take the mean, then you have your categorical predictions.

So, for each y[n], I have more than one prediction. How should I predict from them? Take a random sample?

Having thousands of realizations of the posterior predictive distribution is a good thing, which allows you to quantify your uncertainty. For example, for each observation being predicted, you could calculate the proportion of the predictions that are 1, 2, …, K. Alternatively, you could calculate the median prediction for each observation. Etc.
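
The summaries described above could be sketched roughly as follows. This assumes the draws are an integer array of shape (n_draws, N), which is how fit.extract()['y_pred'] is laid out when the mean is taken over axis 0. The draws here are simulated with NumPy so the snippet runs without a fitted model, and `summarize_draws` is a hypothetical helper, not part of PyStan.

```python
import numpy as np


def summarize_draws(draws, K):
    """Summarize posterior predictive draws of an ordinal outcome.

    draws: integer array of shape (n_draws, N) with labels in 1..K.
    Returns per-observation class proportions (N, K), modal class (N,),
    and median class (N,).
    """
    draws = np.asarray(draws)
    n_draws = draws.shape[0]
    # counts[n, k-1] = number of draws for observation n equal to label k
    counts = np.stack([(draws == k).sum(axis=0) for k in range(1, K + 1)], axis=1)
    proportions = counts / n_draws
    # Most frequent label (ties go to the lower label).
    modal_class = counts.argmax(axis=1) + 1
    # Lower median: smallest label whose cumulative count reaches half the draws.
    median_class = (2 * counts.cumsum(axis=1) >= n_draws).argmax(axis=1) + 1
    return proportions, modal_class, median_class


# Simulated stand-in for fit.extract()['y_pred']: 4000 draws, 5 observations, K = 3.
rng = np.random.default_rng(0)
draws = rng.integers(1, 4, size=(4000, 5))

proportions, modal_class, median_class = summarize_draws(draws, K=3)
print(proportions.round(3))
print(modal_class, median_class)
```

With a real fit, replacing the simulated array with fit.extract()['y_pred'] should give one row of class proportions per observation. Either the modal or the median class can then serve as a single integer prediction, while the proportions keep the uncertainty visible.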

Thank you.