# Interpreting ordered_logistic results

I am exploring ordered_logistic with a very simple model (to see how it works). I expected the posterior means of the cutpoints to convert to probabilities that would exactly match the data frequencies, but the model returns frequencies biased toward the center. Here is the code.

``````r
library("rstan")
options(mc.cores = parallel::detectCores())

# Round to 2 decimal digits
rd = function(x) {round(x,2)}

# Build data: only 4 categories
N=39  # 39 rubric scores

# Data frequencies: 3,13,15,8 for scores 1,2,3,4
counts = c(3,13,15,8)

rubric_score=rep(1:4,counts)
n = sum(counts)
prs=counts/n # decimal proportions

print(paste("n: ",n))
print(paste(c("Data Frequencies: ",rd(prs)),collapse = " "))

cum_pr_k=cumsum(prs) # cumulative probabilities
print(paste(c("Cumulative Frequencies: ",rd(cum_pr_k)),collapse = " "))

# Set phi to 0; it is a placeholder for later correlations
phi = 0
stancode = "
data {
  int<lower=0> n;                       // number of measurements
  int<lower=1,upper=4> rubric_score[n]; // measurements
  real phi;                             // placeholder
}
parameters {
  ordered[3] cutpoints; // K - 1 = 3 cutpoints for 4 categories
}
model {
  for (i in 1:n)
    rubric_score[i] ~ ordered_logistic(phi, cutpoints);
  cutpoints ~ normal(0, 10);
}
"
writeLines(stancode,"mrubric.stan")
data_list = list(n=n,rubric_score=rubric_score,phi=phi)
mrubricord = stan("mrubric.stan",data = data_list,chains=4,cores=4,iter = 2000)

show(mrubricord)

# plogis is the logistic (inverse-logit) function; give it its conventional name
logistic <- plogis

# Extract the posterior means of the cutpoints
sumdat = summary(mrubricord)$summary[,"mean"]
# Drop the mean of lp__ (the last row), keeping only the cutpoints
sumdat = head(sumdat, -1)

# Convert to probabilities
cum_pt=logistic(sumdat)
cum_p=c(0,cum_pt,1)
p=rep(NA,4)
for(i in 1:4) p[i]=cum_p[i+1]-cum_p[i]
print(paste(c("Data Frequencies: ",rd(prs)),collapse = " "))
print(paste(c("Stan Frequencies: ",rd(p)),collapse = " "))

``````

Hi,
there are a few things that might be going on:

1. The prior on the cutpoints will pull them slightly towards zero, which slightly favours the central categories
2. Mean is often not very representative of the posterior - especially for parameters like cutpoints. What you usually want to do is to compute your quantity of interest (e.g. the probability for each category) separately for each sample and summarise as the last step.
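The second point is essentially Jensen's inequality: for a nonlinear transform like the logistic, the transform of the mean is not the mean of the transforms. A small standalone illustration (plain R with made-up draws, not tied to your model):

```r
# Toy illustration: for draws x, plogis(mean(x)) != mean(plogis(x))
set.seed(1)
x <- rnorm(10000, mean = 2, sd = 1.5)  # stand-in for posterior draws of one cutpoint
plogis(mean(x))   # transform of the mean (what summarising first does)
mean(plogis(x))   # mean of the transforms (summarising as the last step)
```

Because plogis is concave over most of this distribution's mass, the second number comes out smaller than the first; transforming each draw and summarising last respects that.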

So for your case when I do this:

``````samps <- as.matrix(mrubricord, pars = "cutpoints")
cum_pt=logistic(samps)
cum_p=cbind(matrix(0, nrow = nrow(samps), ncol = 1),cum_pt,matrix(1, nrow = nrow(samps), ncol = 1))
for(i in 1:4) p[i]=mean(cum_p[,i+1]-cum_p[,i])
print(paste(c("Data Frequencies     : ",rd(prs)),collapse = " "))
print(paste(c("Stan Frequencies mean: ",rd(p)),collapse = " "))
for(i in 1:4) p[i]=quantile(cum_p[,i+1]-cum_p[,i],0.025)
print(paste(c("Stan Frequencies q025: ",rd(p)),collapse = " "))
for(i in 1:4) p[i]=quantile(cum_p[,i+1]-cum_p[,i],0.975)
print(paste(c("Stan Frequencies q975: ",rd(p)),collapse = " "))
``````

I see that the means match pretty well, although there is substantial uncertainty:

`````` "Data Frequencies     :  0.08 0.33 0.38 0.21"
 "Stan Frequencies mean:  0.08 0.33 0.39 0.2"
 "Stan Frequencies q025:  0.02 0.2 0.24 0.09"
 "Stan Frequencies q975:  0.18 0.49 0.54 0.35"
``````

Does that make sense?


Hi Martin, I really appreciate your taking the time to comment on my question.

Concerning your first point - I suspected as much. I think this shows that one can’t be so casual about so-called noninformative priors. Once I get my entire experimental system built so it saves results to a csv file and makes a summary graph, I want to try setting the priors in terms of frequencies, transforming them to cutpoints, and seeing if that improves things.
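For that frequency-to-cutpoint transformation, the mapping is just the logit of the cumulative proportions. A quick sketch (only a starting point for setting prior locations, not a full prior specification):

```r
prs <- c(3, 13, 15, 8) / 39            # data proportions
exact_cut <- qlogis(cumsum(prs)[1:3])  # logit of cumulative proportions = cutpoints
# Transforming back recovers the proportions exactly
round(diff(c(0, plogis(exact_cut), 1)), 2)  # 0.08 0.33 0.38 0.21
```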

On your second suggestion - neat. Even though the cutpoints themselves are pulled toward the middle, transforming each posterior draw to frequencies and only then averaging works better, because the mean of a nonlinear transform is not the transform of the mean.

Good. I have lots of ideas I can experiment with.