Different Intercept Terms in Frequentist and Bayesian Regression

I am getting some different results using Bayesian and Frequentist versions of the same analysis: a simple two-group repeated measures design, testing for between-group difference in change in scores from measurement 1 to measurement 2, which we could consider a measurement before and after an intervention of some kind. I am doing an ANCOVA-type analysis - a sort of de-facto repeated measures analysis - testing for group differences in post-intervention score controlling for pre-intervention score by including it as a covariate. It’s a pretty common model in Clinical Trials analysis and in Psychology.

In the toy data below there are two groups of forty participants: one group receives the standard intervention (standard group) and the other receives the new intervention (new group). Each participant is measured twice, once before and once after the intervention. The average pre-intervention score for both groups is 40. The average post-intervention score is 34 for the standard group (i.e. participants' scores drop by 6 on average) and 28 for the new group (i.e. participants' scores drop by 12 on average), which means the new group's scores dropped by 6 points more than the standard group's on average.

set.seed(54321)
df <- data.frame(group = factor(rep(c("standard", "new"), each = 40), levels = c("standard", "new")),
                 pre = rnorm(80, 40, 4),
                 post = c(rnorm(40, 34, 4), rnorm(40, 28, 4))) 

Here's a simple bar graph of the group × time means (figure omitted; code not included).

I ran a model testing for the group difference in change in score from pre-intervention to post-intervention: post is the outcome, group the primary predictor, and the standardised pre-intervention score (preS, see below) the covariate. The model has an intercept term a, a term for the grouping variable bGroup, and a term for the standardised pre-intervention score bPreS.

# put data in a list
dList <- list(N = nrow(df),
              group = as.integer(df$group),
              preS = (df$pre - mean(df$pre))/ sd(df$pre), # standardise pre-intervention score
              post = df$post,
              gMean = mean(c(df$pre, df$post)),
              gSD = sd(c(df$pre, df$post)))

# model
write("
      data{
      int<lower=1> N;
      real gMean;
      real gSD;
      int<lower=1,upper=2> group[N];
      real preS[N];
      real<lower=0> post[N];
      }
      
      parameters{
      real a;
      real bGroup;
      real bPreS;
      real<lower=0> sigma;
      }
      
      model{
      vector [N] mu;
      sigma ~ normal(0,10);
      a ~ normal(gMean, 5*gSD);
      bGroup ~ normal(0, 5*gSD);
      bPreS ~ normal(0, 5*gSD);
      for (i in 1:N) {
      mu[i] = a + bGroup*group[i] + bPreS*preS[i];
      }
      post ~ normal(mu, sigma);
      }
      ", file = "temp.stan")

# generate mcmc chains
library(rstan)
modS <- stan(file = "temp.stan",
            data = dList,
            warmup = 1e3,
            iter = 2e3,
            chains = 1)

print(modS, probs = c(0.025, 0.975))

###output
#           mean se_mean   sd    2.5%   97.5% n_eff Rhat
# a        39.93    0.06 1.30   37.42   42.46   417 1.01
# bGroup   -6.12    0.04 0.84   -7.79   -4.52   419 1.01
# bPreS     0.06    0.02 0.42   -0.74    0.95   777 1.00
# sigma     3.89    0.01 0.32    3.33    4.55   753 1.00
# lp__   -146.83    0.07 1.37 -150.12 -145.19   438 1.00

The posterior mean of the intercept term a is 39.93 and the posterior mean of the group difference bGroup is -6.12.

Now, for comparison, run the same analysis using a frequentist approach,

lm(post ~ group + preS, df)

### output
# Coefficients:
# (Intercept)     groupnew         preS  
#    33.79590     -6.10455      0.06948  

The estimate of the group coefficient is essentially the same under both methods: it is the between-group difference in pre-post change, exactly what I programmed into the data.

However, I must confess, having been trained in frequentist methods, that the intercept coefficient in the lm() output makes more sense to me. It is the average post-intervention score in the standard group (for participants with an average pre-intervention score), close to the value of 34 I simulated.

But what, then, is the intercept term in the Bayesian version? What does a represent?

Given that the variable group is declared in the data block as

int<lower=1,upper=2> group[N];

this variable takes the values 1 or 2 in your Stan model. In the lm() model, by contrast, R's default reference-level (dummy) coding means group takes the value 0 for the standard group and 1 for the new group. With the 1/2 coding, the Stan intercept a is the predicted score at group = 0, i.e. the standard-group mean minus bGroup (33.8 + 6.1 ≈ 39.9), not the standard-group mean itself.

This should explain the differences you see. Try setting the values for "group" in the Stan model to 0 and 1 as well.
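For example (a minimal sketch based on the data list above), the recoding can be done in R before passing the data to Stan, with the declared bounds in the data block adjusted to match:

# recode group as 0 (standard) / 1 (new) before passing it to Stan
dList$group <- as.integer(df$group) - 1

# and adjust the bounds in the Stan data block accordingly:
# int<lower=0,upper=1> group[N];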


Thank you @Guido_Biele, that worked! When I recoded group so that the standard group is 0 and the new group is 1, the model retrieved the post-intervention average score in the standard group correctly:

          mean se_mean   sd    2.5%   97.5% n_eff Rhat
a        33.70    0.03 0.71   32.32   34.99   621    1
bGroup   -4.17    0.04 0.96   -6.06   -2.25   596    1
bPreS    -0.04    0.02 0.49   -1.02    0.91   822    1
sigma     4.29    0.01 0.34    3.70    5.02   898    1
lp__   -154.39    0.08 1.58 -158.31 -152.52   397    1 

I realise now that I was taking the level-means coding technique, which uses indexing (something like bGroup[group[i]] for the group term), and unthinkingly applying it to a reference-level coding model. Thank you so much for opening my eyes. A huge help!
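For reference, here is a sketch of what the level-means (index-coded) version of the model would have looked like with the original 1/2 coding: a becomes a vector with one entry per group, and the group difference is a[2] - a[1] rather than a separate bGroup parameter.

data{
int<lower=1> N;
real gMean;
real gSD;
int<lower=1,upper=2> group[N];
real preS[N];
real<lower=0> post[N];
}

parameters{
vector[2] a;             // one intercept per group: a[1] standard, a[2] new
real bPreS;
real<lower=0> sigma;
}

model{
vector [N] mu;
sigma ~ normal(0, 10);
a ~ normal(gMean, 5*gSD);
bPreS ~ normal(0, 5*gSD);
for (i in 1:N) {
mu[i] = a[group[i]] + bPreS*preS[i];
}
post ~ normal(mu, sigma);
}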

Hi, if the data are based on two repeated measurements, you should include random intercepts for individual ID in the model. If you don't, the estimated standard errors will be incorrect (just as when you apply an unpaired t-test where a paired one is needed). The frequentist code for this would be:

library(lme4)
# note: the toy data above would first need a subject identifier, e.g. df$ID <- factor(1:nrow(df))
lmer(post ~ group + preS + (1 | ID), df)

For the Stan version, see the examples on random effects in the manual.

Hi @LucC. I have been taught that in these models the random intercepts for each individual are implicitly included through the inclusion of each participant's first score as a covariate. You get very similar estimates via a mixed-effects model:

library(dplyr)
library(tidyr)     # for gather()
library(lmerTest)  # loads lmer() with the p-values shown in the output below
df$id <- factor(1:nrow(df))  # the toy data needs a subject identifier

dfLong <- df %>% dplyr::select(id, group, pre, post) %>%
                 gather(key = time, value = score, pre, post) %>%
                 transform(time = factor(time, levels = c("pre", "post")))

summary(lmer(score ~ group*time + (1|id), data = dfLong))

### output
#                    Estimate Std. Error        df t value Pr(>|t|)    
# (Intercept)        39.91709    0.63939 155.95000  62.430  < 2e-16 ***
# groupnew            0.09049    0.90423 155.95000   0.100     0.92    
# timepost           -6.12193    0.89597  78.00000  -6.833 1.64e-09 ***
# groupnew:timepost  -6.19356    1.26709  78.00000  -4.888 5.34e-06 ***

It's true that the standard errors differ in the non-mixed-effects model, but they are estimates of different parameters.

summary(mod <- lm(post ~ group + preS, df))

### output
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 33.79590    0.60779  55.604  < 2e-16 ***
# groupnew    -6.10455    0.85957  -7.102 5.33e-10 ***
# preS         0.06948    0.43250   0.161    0.873   

How would I specify random effects in my Stan model? Via the inclusion of a subject term? Something like mu[i] = a + bGroup*group[i] + bPreS*preS[i] + bSubj[sIndex[i]]?

Interesting, I missed that detail in your original message. However, including the initial score as a predictor implies the assumption that it is measured without error, which is not true (and you don't assume so for the second score). In theory this can induce regression towards the mean, which is why I favour a random-intercept model, or simply a linear regression of the pairwise differences in scores. Then you don't need to assume zero measurement error for the first score.
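For completeness, the difference-score regression is a one-liner on the toy data above; the groupnew coefficient then directly estimates the between-group difference in change:

# regress the pre-post change on group; no error-free-covariate assumption needed
lm(I(post - pre) ~ group, df)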

The approach you propose for adding a random intercept to your Stan model sounds correct. A simpler alternative is to use the package rstanarm (pre-compiled models) or brms (custom models that need to be compiled). Both packages accept standard R formulae, as with lm().
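For instance, a brms version of the mixed model on the long-format data might look like the sketch below (reusing the dfLong and id variables from earlier in the thread; priors are left at the brms defaults):

library(brms)
# same formula syntax as lmer(); brms compiles and samples a Stan model behind the scenes
brm(score ~ group*time + (1|id), data = dfLong)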

Wouldn't the same be true of any covariate? Yet we include those in our models without accounting for measurement error. The reason I chose this model over analysing pre-post difference scores is that it has a parameter for the group difference, which allows you to specify a prior for that difference.

Yes, it is true for any covariate! And most people seem to have forgotten that. However, for some covariates the impact is negligible: age, for example, is typically stored with enough precision (e.g. integer number of years) not to cause problems. Another example is class membership of a student, which will be perfectly known for all students in a dataset (apart from the occasional misclassification error). A test score, however, is probably subject to greater error than, say, reported age or class membership. Bottom line: the amount of regression to the mean increases with the amount of error (be it measurement or classification) in the covariate(s).

Yes. Here is how to do this in Stan: Stan User’s Guide
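As a rough sketch of that pattern applied to the model in this thread (the names J, subj, bSubj, and sigmaSubj are illustrative, not from the posts above; note that the subject intercepts only become well identified when each subject contributes more than one row, e.g. the long-format data):

data{
int<lower=1> N;
int<lower=1> J;                  // number of subjects
int<lower=1,upper=J> subj[N];    // subject index for each observation
int<lower=0,upper=1> group[N];
real preS[N];
real<lower=0> post[N];
real gMean;
real gSD;
}

parameters{
real a;
real bGroup;
real bPreS;
vector[J] bSubj;                 // subject-level intercept deviations
real<lower=0> sigmaSubj;         // between-subject SD
real<lower=0> sigma;
}

model{
vector [N] mu;
sigma ~ normal(0, 10);
sigmaSubj ~ normal(0, 10);
a ~ normal(gMean, 5*gSD);
bGroup ~ normal(0, 5*gSD);
bPreS ~ normal(0, 5*gSD);
bSubj ~ normal(0, sigmaSubj);    // partial pooling of subject intercepts
for (i in 1:N) {
mu[i] = a + bGroup*group[i] + bPreS*preS[i] + bSubj[subj[i]];
}
post ~ normal(mu, sigma);
}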

Here is a brief discussion of this issue, which also covers a recent paper by Westfall and Yarkoni on the topic: In Bayesian regression, it’s easy to account for measurement error | Statistical Modeling, Causal Inference, and Social Science
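The basic pattern there is to treat the error-free covariate values as latent parameters. A minimal sketch, assuming a known measurement-error SD tau (the names preS_obs, preS_true, and tau are illustrative):

data{
int<lower=1> N;
int<lower=0,upper=1> group[N];
real preS_obs[N];           // observed, noisy standardised pre scores
real<lower=0> post[N];
real<lower=0> tau;          // assumed measurement-error SD
real gMean;
real gSD;
}

parameters{
vector[N] preS_true;        // latent error-free pre scores
real a;
real bGroup;
real bPreS;
real<lower=0> sigma;
}

model{
vector [N] mu;
preS_true ~ normal(0, 1);            // prior on the latent standardised scores
preS_obs ~ normal(preS_true, tau);   // measurement model
sigma ~ normal(0, 10);
a ~ normal(gMean, 5*gSD);
bGroup ~ normal(0, 5*gSD);
bPreS ~ normal(0, 5*gSD);
for (i in 1:N) {
mu[i] = a + bGroup*group[i] + bPreS*preS_true[i];
}
post ~ normal(mu, sigma);
}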

Regarding how to use the first measurement, I agree with @LucC that it is best either to fit a random-effects model or to use the difference between the first and second measurement as the outcome for the regression model.