# Intercept + Dot Product: What Does it Mean?

I am curious about what the intercept means when you include it with an indexing parameter in a linear model. Here are some data to illustrate my example. There are three groups (variable `group`, with three levels `standard`, `new1`, and `new2`) with average `score`s of 6, 12, and 24. I want to estimate the group average scores using a sum-to-zero parameterization, in which the coefficients represent deflections from a grand mean.

Note that in the `transformed parameters` block I recalculate the group means from the original parameters `a0` (intercept) and `a[1:3]` used in the `a0 + a[index]` model, and then convert them into deflection parameters `b[1:3]`, representing deflections from the grand mean `b0`.

```r
library(dplyr)
library(rstan)
set.seed(12345)

# create data: three groups with true means 6, 12, 24
df <- data.frame(group = factor(rep(c("standard", "new1", "new2"), times = 40),
                                levels = c("standard", "new1", "new2")),
                 score = rnorm(120, c(6, 12, 24), 0.5))

# put into a list for Stan
dList <- list(N = nrow(df),
              nLevels = nlevels(df$group),
              gIndex = as.integer(df$group),
              score = df$score,
              gMean = mean(df$score),
              gSD = sd(df$score))

# model
write("
data{
  int<lower=1> N;
  int<lower=1> nLevels;
  real gMean;
  real gSD;
  int<lower=1,upper=nLevels> gIndex[N];
  real score[N];
}

parameters{
  real a0;
  vector[nLevels] a;
  real<lower=0> sigma;
}

transformed parameters{
  real b0;
  vector[nLevels] m;
  vector[nLevels] b;

  for (j in 1:nLevels) {
    m[j] = a0 + a[j];
  }
  b0 = mean(m[1:nLevels]);

  for (j in 1:nLevels) {
    b[j] = m[j] - b0;
  }
}

model{
  a0 ~ normal(gMean, 5*gSD);
  a ~ normal(0, 5*gSD);
  sigma ~ cauchy(0, 2);
  score ~ normal(a0 + a[gIndex], sigma);
}
", file = "temp.stan")

# generate chains
fit <- stan(file = "temp.stan",
            data = dList,
            warmup = 1e3,
            iter = 2e3,
            cores = 1,
            chains = 1)

# put the draws into a data frame
chains <- data.frame(extract(fit, pars = c("a0", "a", "b0", "b")))
```

Now we calculate the level means twice: once from `a0` and the `a`'s, and once from the reparameterized `b0` and `b`'s.

```r
chains <- mutate(chains,
                 standard_out = a0 + a.1,
                 new1_out = a0 + a.2,
                 new2_out = a0 + a.3,
                 standard_in = b0 + b.1,
                 new1_in = b0 + b.2,
                 new2_in = b0 + b.3)
```

…and put all the coefficients into a table

```r
bayesCoefs <- apply(chains, 2, mean)
output <- data.frame(aCoefs = bayesCoefs[1:4],
                     bCoefs = bayesCoefs[5:8],
                     levelMeans_a = c(NA, bayesCoefs[9:11]),
                     levelMeans_b = c(NA, bayesCoefs[12:14]))
row.names(output) <- c("intercept", "standard", "new1", "new2")
output

# output

#              aCoefs    bCoefs levelMeans_a levelMeans_b
# intercept 13.269875 14.109226           NA           NA
# standard  -7.171809 -8.011159     6.098066     6.098066
# new1      -1.101934 -1.941284    12.167942    12.167942
# new2      10.791795  9.952444    24.061670    24.061670
```

The output shows several things.

• First, the group means calculated outside the model straight from the `a0 + a[gIndex]` parameters, and the group means calculated inside Stan (in the `transformed parameters` block) from the sum-to-zero parameters `b0` and `b[1:3]`, are exactly the same.
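That agreement is not a numerical accident — it holds draw by draw, by construction: `b0 + b[j] = mean(m) + (m[j] - mean(m)) = m[j] = a0 + a[j]`. Here is a minimal sketch of the identity (Python here just for the arithmetic; the draw values are made up for illustration):

```python
# One hypothetical posterior draw (values are made up for illustration).
a0 = 13.3
a = [-7.2, -1.1, 10.8]

# The transformed-parameters block, translated line by line.
m = [a0 + aj for aj in a]     # group means m[j] = a0 + a[j]
b0 = sum(m) / len(m)          # grand mean of the group means
b = [mj - b0 for mj in m]     # deflections b[j] = m[j] - b0

# The two parameterizations give identical group means for this draw.
for aj, bj in zip(a, b):
    assert abs((a0 + aj) - (b0 + bj)) < 1e-12
```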

• Second, the sum-to-zero procedure worked, with `b0` being exactly the same as the grand mean of the scores (or very, very close)

```r
mean(df$score)
# 14.1064

output$bCoefs[1]
# 14.10923
```

…and with the `b` deflection coefficients summing to zero (or very, very close)

```r
sum(output$bCoefs[2:4])
# 6.661338e-16
```

…but not the `a` deflection coefficients

```r
sum(output$aCoefs[2:4])
# 2.518052
```
• Third, the `a` coefficients and the `b` coefficients all differ by the same absolute amount (the intercept shifted in one direction, the deflections in the other)

```r
output$aCoefs - output$bCoefs
# -0.8393508  0.8393508  0.8393508  0.8393508
```
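That shared offset is just the average of the `a` deflections. Since `b0 = mean(a0 + a) = a0 + mean(a)`, we get `a0 - b0 = -mean(a)` and `a[j] - b[j] = mean(a)` for every `j` — the whole `a` solution is the `b` solution shifted by one constant. We can check this against the table above (Python for the arithmetic):

```python
# Posterior means of the a deflections, from the output table above.
a_defl = [-7.171809, -1.101934, 10.791795]
offset = sum(a_defl) / len(a_defl)   # mean of the a deflections

# This reproduces the mysterious constant (up to rounding in the table)...
assert abs(offset - 0.8393508) < 1e-6

# ...and equals b0 - a0 from the intercept row of the table.
assert abs(offset - (14.109226 - 13.269875)) < 1e-6
```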

What is going on here?

What is `a0` and why is `0.8393508` important?

The `a` coefficients are not useless, since they allow us to retrieve the group means accurately (which we can then use to calculate contrasts, etc.), but I am very curious how the `a0` intercept is calculated. In this example there is not a huge difference, but surely you need to know what the `a0` intercept represents when it is accompanied by a vectorised `a[gIndex]` term in order to set a prior on that intercept?

Obviously you could parameterize the likelihood with level-means coding (i.e. `score ~ normal(a[gIndex], sigma)`) or with reference-level coding (i.e. `score ~ normal(a0 + bNew1*isNew1 + bNew2*isNew2, sigma)`, with dummy indicators for the non-reference levels), but there are some advantages to the grand-mean-intercept-plus-deflections strategy, especially when models have several factors with many levels and interactions. So I'm keen to work this out.

That was an interesting puzzle to think about. I think it works as follows.

The `a0 + a[gIndex]` formulation is not identified on its own: there are only three group means, but four location parameters (`a0` plus the three `a`'s) to estimate them. If you add any number (say `0.8393508`) to `a0` and subtract that same number from every `a[j]`, you get exactly the same likelihood. The reason the model still works is that the priors restrict how much you can add or subtract — e.g. under its `normal(gMean, 5*gSD)` prior, `a0` is very likely to stay within about `10*gSD` of `gMean`. So the interpretation of `a0` is that it is an estimate of the grand mean, regularized by its prior (and probably also indirectly by the priors on the deviations `a[gIndex]`).
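You can see the non-identifiability directly: shift `a0` up by any constant `c` and every `a[j]` down by the same `c`, and both the fitted group means and the sum-to-zero parameterization are untouched. A toy check (Python; the draw values are made up for illustration):

```python
def to_sum_to_zero(a0, a):
    """Recompute (b0, b) from (a0, a) as the transformed parameters block does."""
    m = [a0 + aj for aj in a]
    b0 = sum(m) / len(m)
    b = [mj - b0 for mj in m]
    return b0, b

a0, a = 13.3, [-7.2, -1.1, 10.8]   # hypothetical draw
c = 0.8393508                       # any constant works
a0_shifted = a0 + c
a_shifted = [aj - c for aj in a]

# The fitted group means a0 + a[j] are identical under the shift...
assert all(abs((a0 + aj) - (a0_shifted + asj)) < 1e-12
           for aj, asj in zip(a, a_shifted))

# ...and so is the sum-to-zero parameterization (b0, b).
b0, b = to_sum_to_zero(a0, a)
b0s, bs = to_sum_to_zero(a0_shifted, a_shifted)
assert abs(b0 - b0s) < 1e-12
assert all(abs(x - y) < 1e-12 for x, y in zip(b, bs))
```

Only the priors break the tie between the infinitely many `(a0, a)` pairs that map to the same `(b0, b)`.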


Thank you @stijn, you make a lot of sense. I didn't consider that `a0` is really a superfluous parameter. It is interesting that the model finds the grand mean but doesn't quite get there. Kruschke uses this parameterization a lot in his book, and sometimes it is fiendishly difficult to calculate the sum-to-zero parameters, e.g. with split-plot designs where multiple subjects are exposed to both within- and between-subject factors. But despite the complexity I can see the value of it. It is quite neat, even with this extra parameter the model doesn't really need.