I have come across an interesting dilemma comparing an “intuitive” likelihood formulation for an error-in-variables model versus the standard Bayesian measurement error model recommended in the Stan docs (one that I’ve used to good success in the past). Consider (assuming \sigma_m is known):

x_{true,i} = x_{obs,i} + \delta_i

\delta_i \sim \text{Normal}(0,\sigma_m)

y_i=a+b*x_{true,i}+\epsilon_i

\epsilon_i\sim\text{Normal}(0,\sigma)

So, substituting:

y_i = a+b(x_{obs,i}+\delta_i)+\epsilon_i

y_i = a+b*x_{obs,i}+b*\delta_i+\epsilon_i

b*\delta_i+\epsilon_i\sim\text{Normal}(0,\sqrt{(b\sigma_m)^2+\sigma^2})

Therefore conclude y_i \sim \text{Normal}(a+b*x_{obs,i}),\sqrt{(b\sigma_m)^2+\sigma^2})

Leading to the following Stan code:

```
data {
int<lower=0> N;
vector[N] y;
vector[N] x_obs;
real sigma_m;
}
parameters {
real a;
real b;
real<lower=0> sigma;
}
model {
a ~ normal(0,5);
b ~ normal(0,5);
sigma ~ cauchy(0,5);
y ~ normal(a+b*x_obs, sqrt(sigma^2+(b*sigma_m)^2));
}
```

Compared with the usual formulation

```
data {
int<lower=0> N;
vector[N] y;
vector[N] x_obs;
real sigma_m;
}
parameters {
real a;
real b;
vector[N] x_true;
real<lower=0> sigma;
real mu_x;
real<lower=0> sd_x;
}
model {
a ~ normal(0,5);
b ~ normal(0,5);
sigma ~ cauchy(0,5);
x_true ~ normal(mu_x,sd_x);
x_obs ~ normal(x_true,sigma_m);
y ~ normal(a+b*x_true, sigma);
}
```

Now, when fitting both models to simulated data the second one recovers the known a,b, and \sigma values. However, for the first one the b estimates are biased consistently low – in fact, they are exactly equal to the estimates if one just fit a linear regression of y on the noisy x (i.e. the classic regression dilution). The \sigma estimates are close to the known value. This was surprising to me, particularly because the first formulation avoids having to treat x_{true} as a model parameter and subsequent slower estimation and need for performance tuning.

My rough understanding is that the first model accounts for the extra vertical “spread” in the data but not the extra horizontal “spread” leading to the diluted slope estimates. Can anyone provide a more rigorous description or perhaps identify an error in the derivation of the first model?