 # Ordinal model cumulative logit parameter estimates

I noticed a difference in the parameter estimates for a centred versus non-centred version of an ordered logistic model. This became apparent when I implemented a model per the stan users guide (1.8 Ordered logistic and probit regression | Stan User’s Guide and 12.6 Ordered Logistic Distribution | Stan Functions Reference) and then compared it to an equivalent brms version.

Start with `polr` and a simple dataset as a reference point.

``````library(data.table)
library(MASS)
library(foreign)

dat <- data.table(
)

# mean centre per brms approach to prop odds
dat[, pared2 := pared - mean(pared)]
dat[, gpa2 := gpa - mean(gpa)]
dat[, apply2 := as.numeric(apply)]

m1 <- polr(apply ~ pared + gpa, data = dat, Hess=TRUE)
m2 <- polr(apply ~ pared2 + gpa2, data = dat, Hess=TRUE)
``````

The two models have the same parameter estimates but different intercepts

``````# same ors, different intercepts
summary(m1)
Coefficients:
Value Std. Error t value
pared 1.0457     0.2656   3.937
gpa   0.6042     0.2539   2.379

Intercepts:
Value  Std. Error t value
unlikely|somewhat likely    2.1763 0.7671     2.8370
somewhat likely|very likely 4.2716 0.7922     5.3924

summary(m2)
Coefficients:
Value Std. Error t value
pared2 1.0457     0.2656   3.936
gpa2   0.6043     0.2539   2.379

Intercepts:
Value   Std. Error t value
unlikely|somewhat likely     0.1995  0.1030     1.9372
somewhat likely|very likely  2.2948  0.1719    13.3494

``````

but both models produce the same probabilities when you run `predict`.

``````cbind(
predict(
m1,
newdata = data.table(pared = 0, gpa = mean(dat\$gpa)),
type = "p"),
predict(
m2,
newdata = data.table(pared2 = min(dat\$pared2), gpa2 = 0),
type = "p")
)
[,1]       [,2]
unlikely        0.59005138 0.59005227
somewhat likely 0.33120156 0.33120059
very likely     0.07874706 0.07874714
``````

Initially, I implemented the model using stan. I have aligned the variable names and formatting to be as similar as possible to the `brms` model.

``````data {
int<lower=1> N;
int Y[N];
int<lower=2> nthres;
int<lower=1> K;
matrix[N, K] X;
int prior_only;
}
parameters {
vector[K] b;
ordered[nthres] Intercept;
}
model {
if (!prior_only) {
vector[N] mu = X * b;
for (n in 1:N) {
target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
}
}
target += normal_lpdf(b | 0, 10);
target += student_t_lpdf(Intercept | 3, 0, 2.5);
}
generated quantities {}
``````

next, using `brms::make_stancode(apply2 ~ pared + gpa, data = dat, family=brms::cumulative("logit"))` I produced the following (I removed the custom functions as they are unnecessary here).

``````data {
int<lower=1> N;  // total number of observations
int Y[N];  // response variable
int<lower=2> nthres;  // number of thresholds
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc = K;
matrix[N, Kc] Xc;  // centered version of X
vector[Kc] means_X;  // column means of X before centering
for (i in 1:K) {
means_X[i] = mean(X[, i]);
Xc[, i] = X[, i] - means_X[i];
}
}
parameters {
vector[Kc] b;  // population-level effects
ordered[nthres] Intercept;  // temporary thresholds for centered predictors
}
transformed parameters {
real<lower=0> disc = 1;  // discrimination parameters
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = Xc * b;
for (n in 1:N) {
target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
}
}
// priors including constants
target += student_t_lpdf(Intercept | 3, 0, 2.5);
}
generated quantities {
// compute actual thresholds
vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
}
``````

The main difference between the two is that the latter centres the covariates.

``````ld <- brms::make_standata(apply2 ~ pared + gpa,
data = dat,
family=brms::cumulative("logit"))
f1 <- rstan::sampling(object  = mod1,
data    = ld,
chains  = 1,
iter    = 5000,
warmup  = 1000,
refresh = 1000)
f2 <- rstan::sampling(object  = mod2,
data    = ld,
chains  = 1,
iter    = 5000,
warmup  = 1000,
refresh = 1000)
``````

After sampling from the two models I get:

standard `rstan` version:

``````                mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
b            1.06    0.01 0.28    0.53    0.87    1.06    1.24    1.62  2724    1
b            0.47    0.01 0.24   -0.01    0.30    0.46    0.63    0.95  1315    1
Intercept    1.76    0.02 0.73    0.36    1.26    1.75    2.24    3.20  1353    1
Intercept    3.85    0.02 0.75    2.40    3.34    3.85    4.35    5.32  1339    1
``````

`brms` (centred covariate):

``````                  mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
b              1.05    0.00 0.27    0.53    0.87    1.05    1.23    1.58  3474    1
b              0.60    0.00 0.26    0.11    0.43    0.60    0.77    1.10  3914    1
b_Intercept    2.17    0.01 0.77    0.66    1.66    2.17    2.69    3.65  4010    1
b_Intercept    4.27    0.01 0.80    2.73    3.74    4.27    4.81    5.84  4031    1
``````

the `brms` (centred covariate) version appears to align with the `m1` output from `polr`, whereas the other version does not. However, as with `polr` when I predict the probability of being in each group for a set of covariates I get the same values using both models and the results are similar to the frequentist version.

``````post1b <- as.matrix(f1, pars = "b")
post2b <- as.matrix(f2, pars = "b")

post1c <- as.matrix(f1, pars = "Intercept")
post2c <- as.matrix(f2, pars = "b_Intercept")

q1 <- exp(
post1c - mean(dat\$gpa)*post1b[, 2])/
(1 + exp(post1c - mean(dat\$gpa)*post1b[, 2]))

q2 <- exp(
post2c - mean(dat\$gpa)*post2b[, 2])/
(1 + exp(post2c - mean(dat\$gpa)*post2b[, 2]))

p1 <- cbind(q1[,1],q1[,2]-q1[,1])
p1 <- cbind(p1,1-rowSums(p1))

p2 <- cbind(q2[,1],q2[,2]-q2[,1])
p2 <- cbind(p2,1-rowSums(p2))

cbind(
colMeans(p1),
colMeans(p2)
)
[,1]       [,2]
[1,] 0.58882504 0.59000145
[2,] 0.33111789 0.33097647
[3,] 0.08005707 0.07902209
``````

It seems like I would be unable to compare the inference on the basis of the parameter estimates across these two parmaterisations, even though the predictions from each model are the same. For example, in the first model, the parameter associated with the gpa is smaller than what is reported in the second model. Is this true or have I simply made a mistake in the implementation/interpretation?