Why does adding a varying intercept change a fixed slope?

I feel this is a classic textbook example, but I can’t find it. Here are some fake data in which a continuous response y is measured along a geographical gradient (a continuous predictor x). Three species were measured for y at each of 5 sites (i.e., species and site are crossed factors). The data plot as follows (a plotting sketch is included after the fake data below):

Let’s say we want to fit a linear model of y against x and let the intercept vary with species. I am trying to figure out how to account for the fact that measurements come from sites. I tried adding a varying intercept for each site, but this alters the fixed slope, so I am a bit confused and suspect my approach is wrong.

Here is what I have implemented so far.

Math notation with one varying intercept (m1):

\begin{aligned}
y_{i} &\sim \text{Normal}(\mu_{i}, \sigma)\\
\mu_{i} &= \alpha_{1, \text{species}[i]} + \beta x_{i}\\
\alpha_{1,j} &\sim \text{Normal}(\overline{\alpha}, \sigma_{\alpha_{1}}) &\text{for $j$ in 1:3}\\
\overline{\alpha} &\sim \text{Normal}(0, 1)\\
\beta &\sim \text{Normal}(0, 1)\\
\sigma, \sigma_{\alpha_{1}} &\sim \text{Exponential}(1)
\end{aligned}

Math notation with two varying intercepts (m2):

\begin{aligned}
y_{i} &\sim \text{Normal}(\mu_{i}, \sigma)\\
\mu_{i} &= \alpha_{1, \text{species}[i]} + \alpha_{2,\text{site}[i]} + \beta x_{i}\\
\alpha_{1,j} &\sim \text{Normal}(\overline{\alpha}, \sigma_{\alpha_{1}}) &\text{for $j$ in 1:3}\\
\alpha_{2,k} &\sim \text{Normal}(0, \sigma_{\alpha_{2}}) &\text{for $k$ in 1:5}\\
\overline{\alpha} &\sim \text{Normal}(0, 1)\\
\beta &\sim \text{Normal}(0, 1)\\
\sigma, \sigma_{\alpha_{1}}, \sigma_{\alpha_{2}} &\sim \text{Exponential}(1)
\end{aligned}

Stan model with one varying intercept (m1):

data {
  int<lower=0> N;
  int<lower=0> N_species;
  array[N] int<lower=1> species;
  vector[N] y;
  vector[N] x;
}

parameters {
  real alpha_bar;
  vector[N_species] alpha1;
  real beta;
  real<lower=0> sigma_alpha1;
  real<lower=0> sigma;
}

model {
  
  // priors and hyper-priors
  target += std_normal_lpdf(alpha_bar); // same as normal_lpdf(alpha_bar | 0, 1)
  target += std_normal_lpdf(beta); // same as normal_lpdf(beta | 0, 1)
  target += exponential_lpdf(sigma_alpha1 | 1);
  target += exponential_lpdf(sigma | 1); 
  
  // adaptive priors
  target += normal_lpdf(alpha1 | alpha_bar, sigma_alpha1);
  
  //likelihood
  target += normal_lpdf(y | alpha1[species] + beta * x, sigma);
  
}

Stan model with two varying intercepts (m2):

data {
  int<lower=0> N;
  int<lower=0> N_sites;
  int<lower=0> N_species;
  array[N] int<lower=1> site;
  array[N] int<lower=1> species;
  vector[N] y;
  vector[N] x;
}

parameters {
  real alpha_bar;
  vector[N_species] alpha1;
  vector[N_sites] alpha2;
  real beta;
  real<lower=0> sigma_alpha1;
  real<lower=0> sigma_alpha2;
  real<lower=0> sigma;
}

model {
  
  // priors and hyper-priors
  target += std_normal_lpdf(alpha_bar); // same as normal_lpdf(alpha_bar | 0, 1)
  target += std_normal_lpdf(beta); // same as normal_lpdf(beta | 0, 1)
  target += exponential_lpdf(sigma_alpha1 | 1);
  target += exponential_lpdf(sigma_alpha2 | 1);
  target += exponential_lpdf(sigma | 1); 
  
  // adaptive priors
  target += normal_lpdf(alpha1 | alpha_bar, sigma_alpha1);
  target += normal_lpdf(alpha2 | 0, sigma_alpha2);
  
  //likelihood
  target += normal_lpdf(y | alpha1[species] + alpha2[site] + beta * x, sigma);
  
}

Fake data:

site <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5)
species <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
x <- c(-1.389, -1.344, -1.332, -1.298, -0.926, -0.870, -0.864, -0.824, -0.232, -0.125, -0.249, -0.187, 0.547, 0.513, 0.462, 0.598, 1.552, 1.405, 1.303, 1.405, -1.287, -1.248, -1.225, -1.248, -0.830, -0.774, -0.757, -0.819, 0.033, -0.096, -0.085, -0.017, 0.592, 0.812, 0.632, 0.705, 1.489, 1.371, 1.484, 1.636, -1.180, -1.169, -1.124, -1.163, -0.757, -0.796, -0.689, -0.712, -0.130, 0.016, 0.101, -0.046, 0.598, 0.643, 0.699, 0.812, 1.523, 1.546, 1.597, 1.715)
y <- c(0.905, 1.096, 0.887, 1.009, 0.470, 0.661, 0.383, 0.592, -0.260, -0.086, -0.052, -0.173, -0.973, -0.695, -0.938, -0.834, -1.999, -1.755, -1.686, -1.946, 1.165, 1.026, 1.200, 1.339, 0.783, 0.696, 0.887, 0.939, 0.053, 0.244, 0.400, 0.383, -0.451, -0.156, -0.434, -0.538, -1.634, -1.477, -1.408, -1.425, 1.322, 1.183, 1.391, 1.443, 0.974, 0.818, 0.835, 1.009, 0.122, 0.348, 0.626, 0.522, -0.625, -0.191, -0.347, -0.399, -1.251, -1.390, -1.286, -1.303)
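
The plot mentioned at the top can be reproduced along these lines (a minimal sketch; it assumes ggplot2 is available, and the colour/shape mapping is just one reasonable choice):

library(ggplot2)

## assemble the fake data into a data frame for plotting
d <- data.frame(site = factor(site), species = factor(species), x = x, y = y)

## y against x, coloured by species, with point shape distinguishing sites
ggplot(d, aes(x = x, y = y, colour = species, shape = site)) +
  geom_point() +
  theme_minimal()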

Fit models with cmdstanr:

library(cmdstanr)

## data list for Stan
dlist <- list(
  N = length(y), 
  N_species = length(unique(species)), 
  N_sites = length(unique(site)), 
  species = species,
  site = site, 
  x = x, 
  y = y
)

## compile models
m1 <- cmdstan_model("mod1.stan")
m2 <- cmdstan_model("mod2.stan")

## fit models
fit1 <- m1$sample(data = dlist, chains = 4, parallel_chains = 4, refresh = 1000, seed = 123)
fit2 <- m2$sample(data = dlist, chains = 4, parallel_chains = 4, refresh = 1000, seed = 123)

Resulting slopes:

> fit1$summary("beta")
# A tibble: 1 × 10
  variable   mean median     sd    mad    q5    q95  rhat ess_bulk ess_tail
  <chr>     <num>  <num>  <num>  <num> <num>  <num> <num>    <num>    <num>
1 beta     -0.967 -0.967 0.0226 0.0216 -1.00 -0.930  1.00    3891.    2356.
> fit2$summary("beta")
# A tibble: 1 × 10
  variable   mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
  <chr>     <num>  <num> <num> <num>  <num> <num> <num>    <num>    <num>
1 beta     0.0916  0.138 0.375 0.330 -0.706 0.608  1.00     625.     289.

This is Simpson’s paradox. Allowing the intercept to vary by site will certainly change the fixed slope. With a sufficiently weak prior, the varying intercepts for site absorb the between-site variation, so the slope captures the relationship between x and the response within sites. The single-intercept model instead forces the slope to capture the relationship between x and the response across sites, but that model is severely misspecified, and the estimate of the across-sites slope will be overconfident because it fails to account for the site-level pseudoreplication.
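
The same contrast can be seen with quick fixed-effects regressions on the fake data above (a sketch using base R's lm(); the dummy-coded factors here play the role of the varying intercepts). Note that in these particular data x varies far more between sites than within them, so the within-site slope is only weakly identified, which is also why the m2 posterior for beta is so wide.

## slope estimated across sites (species intercepts only), analogous to m1
summary(lm(y ~ x + factor(species)))$coefficients["x", ]

## slope estimated within sites (site intercepts added), analogous to m2;
## its standard error should be much larger
summary(lm(y ~ x + factor(species) + factor(site)))$coefficients["x", ]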


Indeed! I wish I could remember the names/technical terms… For those who would like more information on this topic:

  • the Wikipedia page provides a pretty decent overview
  • it is illustrated by the grandparents’ education (Chapter 6) and Berkeley admissions (Chapter 11) examples in McElreath’s Statistical Rethinking book

Cool stuff. It gets exciting when both contradictory correlations can make sense!
