I am getting some different results using Bayesian and Frequentist versions of the same analysis: a simple two-group repeated measures design, testing for between-group difference in change in scores from measurement 1 to measurement 2, which we could consider a measurement before and after an intervention of some kind. I am doing an ANCOVA-type analysis - a sort of de-facto repeated measures analysis - testing for group differences in post-intervention score controlling for pre-intervention score by including it as a covariate. It’s a pretty common model in Clinical Trials analysis and in Psychology.
In the toy data below there are two groups of forty participants. One group receives the standard intervention (standard
group) and another receives the new intervention (new
group). Each participant is measured twice, once before and once after an intervention. The average pre-intervention scores for both groups is 40. The post-intervention scores are 34 for the standard
group (i.e. participants’ scores drop by 6 on average), and are 28 for the new
group (i.e. participants’ scores drop by 12 on average), which means the new
group’s scores dropped by 6 points more than the standard
group’s on average.
set.seed(54321)
df <- data.frame(group = factor(rep(c("standard", "new"), each = 40), levels = c("standard", "new")),
pre = rnorm(80, 40, 4),
post = c(rnorm(40, 34, 4), rnorm(40, 28, 4)))
Here’s the simple bar graph of group x time means (code not included)
I ran a model testing for the group difference in change in score from pre-intervention to post-intervention. post
is the outcome, group
the primary predictor and standardised pre-intervention score (preS
see below) the covariate. My model has an intercept term a
, a term for the grouping variable bGroup
, and a term for the standardised pre-intervention term bPreS
.
# put data in a list
dList <- list(N = nrow(df),
group = as.integer(df$group),
preS = (df$pre - mean(df$pre))/ sd(df$pre), # standardise pre-intervention score
post = df$post,
gMean = mean(c(df$pre, df$post)),
gSD = sd(c(df$pre, df$post)))
# model
write("
data{
int<lower=1> N;
real gMean;
real gSD;
int<lower=1,upper=2> group[N];
real preS[N];
real<lower=0> post[N];
}
parameters{
real a;
real bGroup;
real bPreS;
real<lower=0> sigma;
}
model{
vector [N] mu;
sigma ~ normal(0,10);
a ~ normal(gMean, 5*gSD);
bGroup ~ normal(0, 5*gSD);
bPreS ~ normal(0, 5*gSD);
for (i in 1:N) {
mu[i] = a + bGroup*group[i] + bPreS*preS[i];
}
post ~ normal(mu, sigma);
}
", file = "temp.stan")
# generate mcmc chains
library(rstan)
modS <- stan(file = "temp.stan",
data = dList,
warmup = 1e3,
iter = 2e3,
chains = 1)
print(modS, probs = c(0.025, 0.975))
###output
# mean se_mean sd 2.5% 97.5% n_eff Rhat
# a 39.93 0.06 1.30 37.42 42.46 417 1.01
# bGroup -6.12 0.04 0.84 -7.79 -4.52 419 1.01
# bPreS 0.06 0.02 0.42 -0.74 0.95 777 1.00
# sigma 3.89 0.01 0.32 3.33 4.55 753 1.00
# lp__ -146.83 0.07 1.37 -150.12 -145.19 438 1.00
The MLE estimate of the intercept term a
is 39.93 and the estimate of the group difference bGroup
is -6.12.
Now, for comparison, run the same analysis using a frequentist approach,
lm(post ~ group + preS, df)
### output
# Coefficients:
# (Intercept) groupnew preS
# 33.79590 -6.10455 0.06948
The estimates of the group coefficient is the same in both methods: it is the between-group difference in pre-post difference, exactly what I programmed into the data.
However I must confess, being trained using frequentist methods, that the intercept coefficient in the lm()
output makes more sense to me. It is the average post-intervention score in the standard
group (for people with average pre-intervention score).
But what then is the intercept term in the Bayesian version?. What does a
represent?