Can somebody explain this to me.
I generate data where an outcome Y is a function of 2 independent variables X and Z.
bX <- 1
bZ <- 1
x <- rnorm(N, 1, 2)
z <- rnorm(N, 10, 2)
y <- rnorm(N, bX*x + bZ*z, 1)
If I regress Y ~ X, I should retrieve bX=1.
This is true w/ lm()
a <- lm(y ~ x)
summary(a)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-7.2715 -1.4927 -0.0054 1.4726 7.0977
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.04753 0.07744 129.75 <2e-16 ***
x 0.97534 0.03587 27.19 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.222 on 998 degrees of freedom
Multiple R-squared: 0.4256, Adjusted R-squared: 0.425
F-statistic: 739.5 on 1 and 998 DF, p-value: < 2.2e-16
but not in stan
data{
int<lower=1> N;
real y[N];
real x[N];
}
parameters{
real bX;
real<lower=0> sigma;
}
model{
vector[N] mu;
sigma ~ exponential( 1 );
bX ~ normal( 0 , 1 );
for ( i in 1:N ) {
mu[i] = bX * x[i];
}
y ~ normal( mu , sigma );
}
generated quantities{
vector[N] mu;
for ( i in 1:N ) {
mu[i] = bX * x[i];
}
}
where my estimate for bX is
mean sd 5.5% 94.5%
bX 2.88 0.14 2.66 3.10