Errors-in-variables model consistently underestimates sigma

The problem is two bugs in your R code:

xobs <- runif(n, -3, 3) + rnorm(n, sigma)
yobs <- slope * xobs + intercept + rnorm(n, sigma)

The first problem is that rnorm(n, sigma) passes sigma as the second positional argument, which is the mean (location), not the standard deviation (scale); it is equivalent to rnorm(n, mean = sigma, sd = 1).

The second problem is that yobs should be generated from the true covariate x, not from the noisy measurement xobs. As written there is no latent x at all; the first line folds the true value and its measurement noise into a single variable.
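For reference, a minimal corrected version of those two lines might look like this (the values of n, slope, intercept, and sigma are my guesses; substitute your own):

```r
n <- 1000          # sample size (assumed; use your own value)
slope <- -0.4
intercept <- 1.3
sigma <- 0.5
x <- runif(n, -3, 3)                                 # true covariate
xobs <- x + rnorm(n, 0, sigma)                       # measurement = true x plus noise
yobs <- slope * x + intercept + rnorm(n, 0, sigma)   # outcome depends on the true x, not xobs
```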

Going forward, you probably don’t want to force the same error scale for the measurement error (on x_obs) and the observation noise (on y_obs).

Here’s an R program to simulate:

N <- 1000
sigma <- 0.5
alpha <- 1.3
beta <- -0.4
x <- runif(N, -3, 3)
y_obs <- rnorm(N, alpha + beta * x, sigma)
x_obs <- rnorm(N, x, sigma)

library('cmdstanr')

model <- cmdstan_model('foo.stan')
fit <- model$sample(data = list(N = N, x_obs = x_obs, y_obs = y_obs), parallel_chains=4)
print(fit$summary(variables = c("alpha", "beta", "sigma")))

and here’s the matching Stan model:

data {
  int<lower=0> N;
  array[N] real x_obs;
  array[N] real y_obs;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[N] x;
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 5);
  x_obs ~ normal(x, sigma);
  y_obs ~ normal(alpha + beta * x, sigma);
}
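If you follow the earlier advice and give the measurement error and the observation noise their own scales, a variant of the model might look like this (sigma_x and sigma_y are names I’m introducing, and the simulation would need to generate the data with two different scales to match):

```stan
data {
  int<lower=0> N;
  array[N] real x_obs;
  array[N] real y_obs;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma_x;  // measurement-error scale for x_obs
  real<lower=0> sigma_y;  // observation-noise scale for y_obs
  vector[N] x;            // latent true covariate
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma_x ~ normal(0, 5);
  sigma_y ~ normal(0, 5);
  x_obs ~ normal(x, sigma_x);
  y_obs ~ normal(alpha + beta * x, sigma_y);
}
```

As in the original model, x keeps its implicit flat prior; since each x_obs[i] is directly informative about x[i], that is usually fine here.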

and here’s the fit that the simulation script prints:

All 4 chains finished successfully.
Mean chain execution time: 3.1 seconds.
Total execution time: 3.2 seconds.

# A tibble: 3 × 10
  variable   mean median      sd     mad     q5    q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 alpha     1.32   1.32  0.0173  0.0168   1.29   1.34   1.00    4515.    3011.
2 beta     -0.370 -0.370 0.00912 0.00911 -0.384 -0.355  1.00    4293.    2735.
3 sigma     0.498  0.498 0.0114  0.0114   0.480  0.518  1.00    1812.    2185.