Interpreting ordered_logistic results

I am exploring the ordered_logistic model with a very simple example (to see how it works). I expected the cutpoint means to convert to probabilities that exactly match the data frequencies, but the model returns frequencies biased toward the center. Here is the code.

library(rstan)
options(mc.cores = parallel::detectCores())

# Round to 2 decimal digits
rd = function(x) {round(x,2)}

# Build data: only 4 categories
N=39  # 39 rubric scores

# Make data frequencies: 3,13,15,8 for scores 1,2,3,4
counts = c(3, 13, 15, 8)
rubric_score = rep(1:4, times = counts)  # expand the counts into individual scores

n = sum(counts)
prs = counts/n # decimal proportions

print(paste("n: ",n))
print(paste(c("Data Frequencies: ",rd(prs)),collapse = " "))

cum_pr_k=cumsum(prs) # cumulative probabilities
print(paste(c("Cumulative Frequencies: ",rd(cum_pr_k)),collapse = " "))

# Set phi to 0; it is a placeholder for later correlations
phi = 0
stancode = "
    data {
      int<lower=0> n; // number of measurements
      int<lower=1,upper=4> rubric_score[n]; // measurements
      real phi; // placeholder
    }
    parameters {
      ordered[3] cutpoints;
    }
    model {
      for (i in 1:n)
        rubric_score[i] ~ ordered_logistic(phi, cutpoints);
    }
"
data_list = list(n=n, rubric_score=rubric_score, phi=phi)
mrubricord = stan(model_code = stancode, data = data_list, chains=4, cores=4, iter = 2000)


# Give the logistic function its conventional name

logistic <- plogis

# Extract means of the cutpoints
sumdat = summary(mrubricord)$summary[,"mean"]
# Drop the mean of lp__
sumdat = head(sumdat,-1)

# Convert the cutpoint means to category probabilities
cum_p = c(0, logistic(sumdat), 1)
p = numeric(4)
for(i in 1:4) p[i]=cum_p[i+1]-cum_p[i]
print(paste(c("Data Frequencies: ",rd(prs)),collapse = " "))
print(paste(c("Stan Frequencies: ",rd(p)),collapse = " "))

There are a few things that might be going on:

  1. The prior on the cutpoints will push them slightly towards zero, which slightly favors the central categories.
  2. The mean is often not very representative of the posterior - especially for parameters like cutpoints. What you usually want to do is compute your quantity of interest (e.g. the probability of each category) separately for each sample and summarise as the last step.

So for your case when I do this:

samps <- as.matrix(mrubricord, pars = "cutpoints")
cum_p = cbind(matrix(0, nrow = nrow(samps), ncol = 1), logistic(samps), matrix(1, nrow = nrow(samps), ncol = 1))
p = numeric(4)
for(i in 1:4) p[i]=mean(cum_p[,i+1]-cum_p[,i])
print(paste(c("Data Frequencies     : ",rd(prs)),collapse = " "))
print(paste(c("Stan Frequencies mean: ",rd(p)),collapse = " "))
for(i in 1:4) p[i]=quantile(cum_p[,i+1]-cum_p[,i],0.025)
print(paste(c("Stan Frequencies q025: ",rd(p)),collapse = " "))
for(i in 1:4) p[i]=quantile(cum_p[,i+1]-cum_p[,i],0.975)
print(paste(c("Stan Frequencies q975: ",rd(p)),collapse = " "))

I see that the means match pretty well, although there is substantial uncertainty:

[1] "Data Frequencies     :  0.08 0.33 0.38 0.21"
[1] "Stan Frequencies mean:  0.08 0.33 0.39 0.2"
[1] "Stan Frequencies q025:  0.02 0.2 0.24 0.09"
[1] "Stan Frequencies q975:  0.18 0.49 0.54 0.35"

Does that make sense?


Hi Martin, I really appreciate your taking the time to comment on my question.

Concerning your first point - I suspected as much. I think this shows that one can’t be so casual about so-called noninformative priors. Once I get my entire experimental system built so it saves results to a csv file and makes a summary graph, I want to try setting the priors in terms of frequencies, transforming them to cutpoints, and seeing if that improves things.
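For the record, here is a minimal sketch of that idea (my own illustration; `prior_probs`, `prior_cutpoints`, and the example probabilities are assumptions, not anything fitted above): pick prior category probabilities and map their cumulative sums to the logit scale with `qlogis()`, the inverse of `plogis()`.

```r
# Hypothetical sketch: express a prior guess as category frequencies, then map
# it to cutpoints on the logit scale with qlogis() (the inverse of plogis()).
prior_probs <- c(0.10, 0.30, 0.40, 0.20)             # assumed prior guess; sums to 1
prior_cutpoints <- qlogis(cumsum(prior_probs)[1:3])  # 3 cutpoints for 4 categories
print(round(prior_cutpoints, 2))                     # roughly -2.20, -0.41, 1.39

# Round trip: converting the cutpoints back recovers the category probabilities
print(round(diff(c(0, plogis(prior_cutpoints), 1)), 2))
```

The round trip recovers c(0.10, 0.30, 0.40, 0.20), so centering the cutpoint priors at those values would encode the intended frequencies.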

On your second suggestion - Neato. Given that the cutpoints are still biased toward the middle, transforming them to frequencies and then averaging them must improve things because the nonlinearities somehow get rounded away.

Good. I have lots of ideas I can experiment with.

If I ever write up anything for my blog about this, can I mention your name in appreciation?

Thanks again. Jim Hatton


I would however repeat that I would expect this to be almost negligible - you’ll notice that the differences in means after properly summarising are muuuuuuuuch smaller than the uncertainty in the posterior.

I think that’s not the right takeaway from this - what I think is important here is that taking the mean (or any other summary) of your samples and then applying transforms will give you different results than transforming your samples and then taking the mean. Summarising as the last step is almost always the correct answer; transforming summaries will usually give you a meaningless quantity. If you want to understand why, Mike Betancourt has a deep dive in Rumble in the Ensemble
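A tiny self-contained demonstration of that point (my own sketch; the normal draws just stand in for posterior samples of a cutpoint, they are not from your fit): averaging after `plogis()` gives a different answer than applying `plogis()` to the average, because the transform is nonlinear.

```r
# Sketch: transform-the-summary vs summarise-as-the-last-step
set.seed(1)
x <- rnorm(10000, mean = 1, sd = 1.5)  # stand-in for posterior draws of a cutpoint

p_of_mean <- plogis(mean(x))   # transform the summary (usually wrong)
mean_of_p <- mean(plogis(x))   # summarise as the last step (usually right)

print(c(p_of_mean = p_of_mean, mean_of_p = mean_of_p))  # roughly 0.73 vs 0.67
```

The wider the posterior, the bigger the gap - which is exactly why converting the cutpoint *means* made the implied frequencies look biased toward the center.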

Yes, I’d be glad :-)
