# Posterior_linpred with ordinal models

#1

Operating System: OSX 10.11.6
Interface Version: latest
Compiler/Toolkit: I don’t know, installed Stan with Xcode

Hello,

I am trying to understand how `posterior_linpred` works with ordinal models.

I am mostly familiar with `brms` and understand the output of the method `fitted`, which is the equivalent function of `posterior_linpred` for `brms`.

For example,

``````library(brms)

library(rethinking) # for data
data(Trolley)
d <- Trolley

rstan_options (auto_write=TRUE)
options (mc.cores=parallel::detectCores ()) # Run on multiple cores

# takes a few minutes to fit
mod.male.contact <- brm(response ~ male*contact, data=d,
family=cumulative("logit"), threshold="flexible",
prior=set_prior("normal(0,10)", class='b'),
warmup = 1000, iter = 2000, chains = 2)

newdata_to_predict <- data.frame(
male = c(0,0,1,1),
contact = c(0,1,0,1)
)

# equivalent to posterior_linpred in brms
fitted(mod.male.contact, newdata_to_predict, scale='response')

# Sample output:
# Gives probabilities for each ordinal response.
#
# , , P(Y = 1)
#
#     Estimate   Est.Error    2.5%ile   97.5%ile
# 1 0.14739169 0.004612668 0.13865579 0.15633546
# 2 0.21814262 0.010689809 0.19683522 0.23901573
# 3 0.08718669 0.003023406 0.08156363 0.09314114
# 4 0.16005079 0.008045255 0.14549309 0.17686909
#
# , , P(Y = 2)
#
#     Estimate   Est.Error    2.5%ile   97.5%ile
# 1 0.10261588 0.003504869 0.09565658 0.10934455
# 2 0.13158680 0.005212364 0.12163682 0.14195957
# 3 0.06834966 0.002600132 0.06340807 0.07339157
# 4 0.10860616 0.004753855 0.09939101 0.11784792
# ... continues P(Y=y) for y = 1, 2, ... , 7
``````

Here, I understand the output. For each set of predictor values, I get the mean probability and percentiles for each possible response 1-7.

However in `rstanarm`, I am uncertain how I should interpret the output of `posterior_linpred`:

``````library(rstanarm)

d\$response <- as.factor(d\$response)

# not sure if right syntax or model. Still new to rstanarm
mod.arm <- stan_polr(response ~ male*contact, data=d,
prior = R2(0.25), prior_counts = dirichlet(1))

posterior_linpred(mod.arm, transform=TRUE, newdata = newdata_to_predict)
# Sample output:
#
# iterations   1         2         3         4
#       [1,] 0.5 0.3892393 0.6464041 0.4612909
#       [2,] 0.5 0.3603153 0.6361793 0.4886859
#       [3,] 0.5 0.3733287 0.6429214 0.4761588
#       [4,] 0.5 0.3792128 0.6512925 0.4619251
#       [5,] 0.5 0.3711230 0.6322236 0.4860978
#       [6,] 0.5 0.3931835 0.6462488 0.4654676
# ..... more rows
``````

I see a column for each set of predictors, but I’m not certain how to interpret the probabilities. With an ordinal model I should have a unique set of probabilities for each response. How do I recover that from the output of `posterior_linpred`?

So I see two potential problems.

1. I’m not fitting the model right. The `rstanarm` model is something slightly different than the `brms` model. Although they both have identical summaries.

2. I’m not using the right function. Maybe I need something different than `posterior_linpred`?

Thanks for help! The code should be reproducible. Let me know if there are any troubles.

#2

In `posterior_linpred` with `transform = TRUE` in rstanarm, you are getting `x[i] * beta` transformed by the standard logistic CDF, which is probably not what you want. You should be using `posterior_predict` which will produce a simulations x observations matrix, filled with realizations among 1, 2, …, 7 (as characters). For any observation or for all the observations, you can then calculate the proportion of realizations that are a particular value.

#3

Thanks for the response.

So using `posterior_predict` I get simulated responses for each set of predictors:

``````posterior_predict(mod.arm, newdata = newdata_to_predict)

#      1   2   3   4
# [1,] "1" "6" "4" "6"
# [2,] "3" "1" "2" "4"
# [3,] "3" "2" "7" "3"
# [4,] "3" "1" "7" "7"
# [5,] "4" "1" "7" "1"
# [6,] "3" "3" "5" "1"
``````

I understand I could calculate the proportions to estimate the probability of a particular response. But how do I get the uncertainty of those probabilities? Do I need to call `posterior_predict` again to get a new predicted outcomes to generate that uncertainty?

#4

Do the calculations row-wise. For example, `summary(rowMeans(PPD == "1"))`

#5

Thanks for the idea.

So testing this in `R` we have:

``````PPD <- posterior_predict(mod.arm, newdata = newdata_to_predict)

mean(rowMeans(PPD=="1"))
# [1] 0.15575

quantile(rowMeans(PPD=="1"), probs = c(0.025, 0.975))
#  2.5% 97.5%
#   0.0   0.5
``````

The mean isn’t far off from the the `brms` `fitted` estimate, but the percentiles are no good. I bet this could be solved by taking more samples from `posterior_predict`. Thoughts?

A second idea is to use `posterior_linpred` with `transform = FALSE`. If the output is the cumulative log odds (the link function from the ordinal model) I could calculate those probabilities myself. However, I think I am probably expecting something different from the actual output like I was when `transform = TRUE`.

Based on the output `posterior_linpred` again seems to be providing something different than I expected. I don’t know I would recover the cumulative log odds for each intercept/response value.

``````posterior_linpred(mod.arm, transform=FALSE, newdata = newdata_to_predict)

# iterations 1          2         3           4
#       [1,] 0 -0.4505110 0.6032702 -0.15514698
#       [2,] 0 -0.5739961 0.5588192 -0.04526401
#       [3,] 0 -0.5179631 0.5880665 -0.09543704
#       [4,] 0 -0.4928907 0.6247252 -0.15259503
#       [5,] 0 -0.5274020 0.5417678 -0.05562322
#       [6,] 0 -0.4339500 0.6025905 -0.13834990``````

#6

As explained in the documentation, `posterior_linpred` is usually not what you want to use
http://mc-stan.org/rstanarm/reference/posterior_linpred.stanreg.html
although arguably it does what you should expect it to do given its name. It is true that `posterior_predict` yields noisy estimates of proportions, particularly if you are estimating 95% credible intervals (which are generally too noisy to be useful).

Perhaps the best use case for `posterior_linpred` with `transform = TRUE` is to obtain less noisy success probabilities when the outcome is binary. When the outcome is ordinal, you could do something similar but more involved to get outcome probabilities using realizations of the cutpoints. To do all that, you should start with `as.data.frame(mod.arm)` or `as.matrix(mod.arm)` to access all the draws.

The main reasons why rstanarm does not automate this are: (1) Those probabilities are not all that useful (2) Too many rstanarm users will mess up their inferences by evaluating some function of them, rather than evaluating their function over all posterior draws.

#7

Thanks for the clarification.

Just to understand where I am coming from for future readers:

I am currently working through Statistical Rethinking by McElreath and in the book he interprets the ordinal model by response probabilities (section 11.1.3, page 342 specifically).

However, skimming through chapter 6 of BDA3, Gelman et al. recommend posterior predictive checks for model checking, even for outcomes of logistic regression (page 154) which I typically think of in terms of probability. Page 532 also has a nice graphical posterior check for a mixture model.

On the other hand, page 417 plots probabilities found from a logistic regression against a predictor. Overall, the closest model I could find was was in 16.6, where the check is a table of log-odds on page 425.

I have to admit it is surprising to me that probabilities aren’t useful. It seems like a natural question one would like answered by the model. I’ll definitely research and think on it.

Thanks again,

Tim