Hi everyone,
I’m just starting out with Stan, so please bear with me.
I have fit a linear regression model with a continuous response and just one binary predictor using Stan in R. I am essentially comparing the mean of two groups. The estimate for the regression coefficient of the binary predictor (i.e. the difference between the two groups) is the same as when using the lm() function in R, but the estimate for the intercept is different.
For info, the (sample) mean of group A is 1.46 and the (sample) mean of group B is 4.26. The difference between the two is 2.8, which is equal to the regression coefficient for X. Using lm(), the intercept is 1.46, which is the mean for group A. Using Stan, the intercept is -1.34. To get the mean of group A we have to add -1.34 and 2.8 (equal to 1.46), to get the mean of group B we have to add -1.34 and 2x2.8 (equal to 4.26). This seems so strange to me. Is this always the case?
I know there are more straightforward ways to compare two group means, but this is not my focus. I just want to understand how to interpret the intercept here.
Thank you!
Stan code for ‘linreg’ model:
data{
int<lower=0> N;
vector[N] y;
vector[N] x;
}
parameters{
real alpha;
real beta;
real<lower=0> sigma;
}
model{
y ~ normal(alpha + beta * x, sigma);
}
R code:
data(iris)
index <- which(iris$Species %in% c('versicolor', 'setosa'))
x <- as.integer(iris$Species[index])
y <- iris$Petal.Length[index]
data <- list(N=length(y), x=x, y=y)
fit <- stan(file='linreg.stan', data=data)
Results:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha -1.34 0.00 0.12 -1.56 -1.42 -1.34 -1.26 -1.11 1575 1
beta 2.80 0.00 0.07 2.66 2.75 2.80 2.85 2.94 1580 1
sigma 0.36 0.00 0.03 0.31 0.34 0.36 0.37 0.41 1901 1
lp__ 52.10 0.03 1.25 48.99 51.51 52.43 53.02 53.52 1382 1