Warnings for an errors-in-variables model and the risk of underestimating the slope

Your fundamental problem is that you have no prior on your latent parameter x.

Consider the following two processes:

  1. x \sim \textrm{Normal}(x_{obs}, \sigma).
  2. x_{obs} \sim \textrm{Normal}(x, \sigma).

Edit (and this is important): here I use \sim to mean “is sampled from”, as in traditional math notation, and not “increment the target density by the lpdf” as in Stan syntax.

First, convince yourself that these are not the same (even though in either case the distribution of the residuals should be indistinguishable). The crucial difference is that in (1) x has higher variance than x_{obs}, whereas in (2) the opposite is true. The model you wrote down is an attempt at (2), but it is non-generative because you don’t have any prior on the parameter x. Impose a prior, and you will start to see the slope estimate tighten. The standard thing to do is to put a Normal prior on x. However, this may still give you sampling problems. The closest model that brms estimates involves additionally assuming that sigmax is known a priori (it gets passed in as data).
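As a concrete sketch of what (2) looks like once you impose a Normal prior on x and treat sigmax as known, something like the following (the hyperparameter names mu_x / tau_x and all of the prior scales are just illustrative placeholders, not a recommendation):

```stan
data {
  int<lower=1> N;
  vector[N] xobs;
  vector[N] y;
  real<lower=0> sigmax;  // measurement noise SD, assumed known and passed in as data
}
parameters {
  vector[N] x;           // latent true covariate values
  real mu_x;             // hyperparameters of the prior on x (names are placeholders)
  real<lower=0> tau_x;
  real alpha;
  real beta;
  real<lower=0> sigmay;
}
model {
  // the prior on x that the original model was missing
  mu_x ~ normal(0, 10);
  tau_x ~ normal(0, 5);
  x ~ normal(mu_x, tau_x);

  // process (2): the observed values are noisy measurements of the true x
  xobs ~ normal(x, sigmax);

  // regression of y on the latent x
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigmay ~ normal(0, 5);
  y ~ normal(alpha + beta * x, sigmay);
}
```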

(1) is also a valid model and a valid data generating process (in effect using x_{obs} to place an informative prior on the locations of the true values x). Again, if sigmax is known, this model can be estimated without difficulty. However, contrary to the “error-in-x induces biased slope estimates” dogma, in this model the error in x does not bias the slope. You could just ignore the error in x and still recover the same coefficient estimate; you’d just get a residual term that conflates the contributions of sigmax and sigmay. Absent prior knowledge about the magnitude of the measurement noise or the process noise (i.e. about sigmax or sigmay), no amount of data can de-conflate them. The remainder of this post might be a bit tangential, but it walks through why this is the case, and it does come back around to the model at hand at the end.
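For completeness, a sketch of (1) with sigmax passed in as data (again, the prior scales are placeholders):

```stan
data {
  int<lower=1> N;
  vector[N] xobs;
  vector[N] y;
  real<lower=0> sigmax;  // assumed known here as well
}
parameters {
  vector[N] x;           // true covariate values
  real alpha;
  real beta;
  real<lower=0> sigmay;
}
model {
  // process (1): xobs places an informative prior on the location of each x
  x ~ normal(xobs, sigmax);

  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigmay ~ normal(0, 5);
  y ~ normal(alpha + beta * x, sigmay);
}
```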

We have (suppressing the indexing on i)
x = x_{obs} + \epsilon_1
y = \alpha + \beta x + \epsilon_2
where \epsilon_1 \sim \textrm{Normal}(0, sigmax) and \epsilon_2 \sim \textrm{Normal}(0, sigmay).

Substituting, we have
y = \alpha + \beta (x_{obs} + \epsilon_1) + \epsilon_2

Letting \epsilon_3 = \epsilon_2 + \beta\epsilon_1 we have
y = \alpha + \beta x_{obs} + \epsilon_3, and note that \epsilon_3 will again be Gaussian, with variance sigmay^2 + \beta^2 sigmax^2.

The point of this rearrangement is to show that the data generating process corresponding to this model can be written in the form of an ordinary linear regression with no error-in-variables. As such, the error-in-variables version is likely to be degenerate (absent very strong priors), since there’s no way for the model to know whether to attribute residual variance to the \epsilon_2 term (controlled by sigmay) or the \beta\epsilon_1 term (controlled by sigmax, assuming that \beta is identified). This degeneracy is what you’re seeing here, because the contribution to the target log density of xobs[i] ~ normal(x[i], sigmax); is identical to that of x[i] ~ normal(xobs[i], sigmax);! So what you’ve really done by attempting to write down model (2) but omitting the prior on x is to inadvertently write down a version of (1), which is degenerate when you treat both sigmas as parameters to be estimated from data. And that’s why you see such extreme shapes on the sigmax, sigmay margin of your pairs plot.
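To make the symmetry concrete, here is a minimal sketch of the degenerate setup described above (reconstructed from your description; your actual code may differ):

```stan
// Degenerate version: no prior on x, and both sigmas treated as parameters.
data {
  int<lower=1> N;
  vector[N] xobs;
  vector[N] y;
}
parameters {
  vector[N] x;
  real alpha;
  real beta;
  real<lower=0> sigmax;
  real<lower=0> sigmay;
}
model {
  // identical target increment to x ~ normal(xobs, sigmax),
  // so this is really model (1) in disguise...
  xobs ~ normal(x, sigmax);
  // ...and the data cannot split the residual variance
  // sigmay^2 + beta^2 * sigmax^2 into its two components,
  // hence the ridge on the (sigmax, sigmay) margin of the pairs plot
  y ~ normal(alpha + beta * x, sigmay);
}
```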
