Weird results w/omitted variable?

Can somebody explain this to me.
I generate data where an outcome Y is a function of 2 independent variables X and Z.

bX <- 1
bZ <- 1
x <- rnorm(N, 1, 2)
z <- rnorm(N, 10, 2)
y <- rnorm(N, bX*x + bZ*z, 1)

If I regress Y ~ X, I should retrieve bX=1.
This is true w/ lm()

a <- lm(y ~ x)
summary(a)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.2715 -1.4927 -0.0054  1.4726  7.0977 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.04753    0.07744  129.75   <2e-16 ***
x            0.97534    0.03587   27.19   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.222 on 998 degrees of freedom
Multiple R-squared:  0.4256,	Adjusted R-squared:  0.425 
F-statistic: 739.5 on 1 and 998 DF,  p-value: < 2.2e-16

but not in stan

data{
    int<lower=1> N;
    real y[N];
    real x[N];
}
parameters{
    real bX;
    real<lower=0> sigma;
}
model{
    vector[N] mu;
    sigma ~ exponential( 1 );
    bX ~ normal( 0 , 1 );
    for ( i in 1:N ) {
        mu[i] = bX * x[i];
    }
    y ~ normal( mu , sigma );
}
generated quantities{
    vector[N] mu;
    for ( i in 1:N ) {
        mu[i] = bX * x[i];
    }
}

where my estimate for bX is

      mean   sd 5.5% 94.5%
bX    2.88 0.14 2.66  3.10

Your Stan model is missing an intercept.

1 Like

Ha, thank god

1 Like