Warnings for an errors-in-variables model and the risk of underestimating the slope

Greetings, I am trying to fit an errors-in-variables linear regression model in Stan, and I got these warnings:

Warning messages:
1: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low 
2: Examine the pairs() plot to diagnose sampling problems
 
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess 

The model is below:

data{
  int nobs;
  real xobs[nobs];
  real y[nobs];
}

parameters {
  real<upper=0> alpha;
  real<lower=0> beta;
  real<lower=0> sigmay;
  real<lower=0> sigmax;
  real x[nobs];
}

model {
  // priors
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigmax ~ cauchy(0, 1);
  sigmay ~ cauchy(0, 1);
 
  // model structure
  for (i in 1:nobs){
    xobs[i] ~ normal(x[i], sigmax);             // measurement model for x
    y[i] ~ normal(alpha + beta*x[i], sigmay);   // regression on the latent x
  }
}

I ran this in R, where I generated the data via:

# Simulated true data
alpha = -10
beta = 2
nobs = 40
x <-  runif(nobs, -3, 3)

# observed data with measurement errors
xsd <- .5  # measurement error
xerr <- rnorm(nobs, 0, xsd)
xobs <- x + xerr
ysd <- .5   # observation error on y
y <- alpha + beta*x + rnorm(nobs, 0, ysd)  

I set chains = 3, iter = 8000, warmup = 4000, thin = 1. Here is the pairs plot:

[pairs plot image]
I’m quite new to RStan, and I’m grateful for your assistance.


Your fundamental problem is that you have no prior on your latent parameter x.

Consider the following two processes:

  1. x \sim \textrm{Normal}(x_{obs}, \sigma).
  2. x_{obs} \sim \textrm{Normal}(x, \sigma).

Edit (and this is important): here I use \sim to mean “is sampled from”, as in traditional math notation, and not “increment the target density by the lpdf” as in Stan syntax.

First, convince yourself that these are not the same (even though in either case the distribution of the residuals should be indistinguishable). The crucial difference is that in (1) x has higher variance than x_{obs}, whereas in (2) the opposite is true. The model you wrote down is an attempt at (2), but it is non-generative because you don’t have any prior on the parameter x. Impose a prior, and you will start to see the slope estimate tighten. The standard thing to do is to put a Normal prior on x. However, this may still give you sampling problems. The closest model that brms estimates involves additionally assuming that sigmax is known a priori (it gets passed in as data).
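
Concretely, here is a minimal sketch of your model with such a prior added. The hyperparameters mu_x and sigma_x (and their hyperpriors) are placeholders I’m making up for illustration; choose something appropriate to your problem:

data{
  int nobs;
  real xobs[nobs];
  real y[nobs];
}

parameters {
  real<upper=0> alpha;
  real<lower=0> beta;
  real<lower=0> sigmay;
  real<lower=0> sigmax;
  real mu_x;              // placeholder hyperparameters for the prior on x
  real<lower=0> sigma_x;
  real x[nobs];
}

model {
  // priors
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigmax ~ cauchy(0, 1);
  sigmay ~ cauchy(0, 1);
  mu_x ~ normal(0, 10);   // placeholder hyperprior
  sigma_x ~ cauchy(0, 1); // placeholder hyperprior

  // model structure, now with a prior on the latent x
  for (i in 1:nobs){
    x[i] ~ normal(mu_x, sigma_x);
    xobs[i] ~ normal(x[i], sigmax);
    y[i] ~ normal(alpha + beta*x[i], sigmay);
  }
}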

(1) is also a valid model and a valid data-generating process (in effect using x_{obs} to place an informative prior on the locations of the true values x). Again, if sigmax is known, this model can be estimated without difficulty. However, contrary to the “error-in-x induces biased slope estimates” dogma, in this model ignoring the error in x does not bias the slope. You could just ignore the error in x and still recover the same coefficient estimate; you’d just get a residual term that conflates the contributions of sigmax and sigmay. Absent prior knowledge about the magnitude of the measurement noise or the process noise (i.e. about sigmax or sigmay), no amount of data can de-conflate them. The remainder of this post might be a bit tangential, but it walks through why this is the case, and it does come back around to the model at hand at the end.
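
For concreteness, the “just ignore the error in x” version is an ordinary regression of y on xobs, where the single residual scale sigma absorbs both sigmay and \beta times sigmax (a sketch, keeping your bounds and priors):

data{
  int nobs;
  real xobs[nobs];
  real y[nobs];
}

parameters {
  real<upper=0> alpha;
  real<lower=0> beta;
  real<lower=0> sigma;    // conflates sigmay and beta*sigmax
}

model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 1);
  for (i in 1:nobs){
    y[i] ~ normal(alpha + beta*xobs[i], sigma);
  }
}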

We have (suppressing the indexing on i)
x = x_{obs} + \epsilon_1
y = \alpha + \beta x + \epsilon_2

Substituting, we have
y = \alpha + \beta (x_{obs} + \epsilon_1) + \epsilon_2

Letting \epsilon_3 = \epsilon_2 + \beta\epsilon_1, we have
y = \alpha + \beta x_{obs} + \epsilon_3, and note that \epsilon_3 will again be Gaussian.
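
To see the scale of the conflated residual: if \epsilon_1 \sim \textrm{Normal}(0, \sigma_x) and \epsilon_2 \sim \textrm{Normal}(0, \sigma_y) independently, then
\epsilon_3 \sim \textrm{Normal}\left(0, \sqrt{\sigma_y^2 + \beta^2\sigma_x^2}\right)
so the likelihood only pins down the combination \sigma_y^2 + \beta^2\sigma_x^2, not \sigma_x and \sigma_y separately.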

The point of this rearrangement is to show that the data-generating process corresponding to this model can be written in the form of an ordinary linear regression with no error-in-variables. As such, the error-in-variables version is likely to be degenerate (absent very strong priors), since there’s no way for the model to know whether to attribute residual variance to the \epsilon_2 term (controlled by sigmay) or the \beta\epsilon_1 term (controlled by sigmax, assuming that \beta is identified). This degeneracy is what you’re seeing here, because the contribution to the log-likelihood of xobs[i] ~ normal(x[i], sigmax); is identical to that of x[i] ~ normal(xobs[i], sigmax);! So what you’ve really done, by attempting to write down model (2) but omitting the prior on x, is to inadvertently write down a version of (1), which is degenerate when you treat both sigmas as parameters to be estimated from data. And that’s why you see such extreme shapes on the sigmax, sigmay margin of your pairs plot.
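
To spell out why those two statements contribute identically: the normal log-density is symmetric in its first two arguments,
\log \textrm{Normal}(x_{obs} \mid x, \sigma_x) = -\frac{(x_{obs} - x)^2}{2\sigma_x^2} - \log\left(\sigma_x\sqrt{2\pi}\right) = \log \textrm{Normal}(x \mid x_{obs}, \sigma_x)
so without a prior on x, the target density is literally the same under either reading.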


Thank you! After reading your post, I believe that the standard linear regression model without considering measurement errors is actually robust for my purposes, as I am primarily interested in alpha and beta. The mathematical explanations you provided have been extremely valuable.


Great! Just to make sure this is totally clear: if the true data generating process begins with some unknown true set of x’s, and then draws the xobs’s from a normal centered on the x’s (such that xobs has higher variance than x), then ignoring the measurement error will bias estimates of \beta downwards, and correcting for this requires placing an appropriate prior on x. On the other hand, if the true data generating process begins with some set of known xobs’s, and then draws the true x’s from a normal centered on the xobs’s (such that x has higher variance than xobs) then the estimate of \beta is not sensitive to whether or not you explicitly account for this error component separately from the residual.

So for example, if you have some quantity that evolves via a Gaussian random walk, and you observed the value on 15 March (xobs), then if you want to treat the value from 16 March as a covariate in the regression you don’t need to account for the error to estimate the correct slope, but if you want to treat the value from 14 March as a covariate in the regression you do need to account for the error to estimate the correct slope.

Thank you for the clarification! I believe my case aligns with the first one: the true x’s come from the underlying population dynamics, and the xobs’s are measured from the true x’s. So I added a prior on the true x’s, and the model is now:

data{
  int nobs;
  real xobs[nobs];
  real y[nobs];
  //real sigmax;  // alternative: pass sigmax in as known data
}

parameters {
  real<upper=0> alpha;
  real<lower=0> beta;
  real<lower=5,upper=8> mu_x;
  real<lower=0> sigma_x;
  real<lower=0> sigmax;
  real<lower=0> sigmay;
  real x[nobs];
}

model {
  // priors
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma_x ~ cauchy(0, 10);
  sigmax ~ normal(sqrt(log(0.5^2+1)), 0.01);  // tight prior centered on the true measurement error
  sigmay ~ cauchy(0, 10);
 
  // model structure
  for (i in 1:nobs){
    x[i] ~ normal(mu_x, sigma_x);               // prior on the latent true x
    xobs[i] ~ normal(x[i], sigmax);             // measurement model
    y[i] ~ normal(alpha + beta*x[i], sigmay);   // regression on the latent x
  }
}

The data are generated via:

alpha = -10
beta = 2
nobs = 40
x <- runif(nobs, 5, 8)

# observed data with measurement errors
sigmax <- sqrt(log(0.5^2+1))  # measurement error
xobs <- x + rnorm(nobs, 0, sigmax)
sigmay <- sqrt(log(0.5^2+1))   # observation error on y
y <- alpha + beta*x + rnorm(nobs, 0, sigmay) 

While the model provided an accurate estimate for β, I received warnings:

Warning messages:
1: There were 31 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them. 
2: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low 
3: Examine the pairs() plot to diagnose sampling problems

Here is the pairs plot:

[pairs plot image]
I received the same warnings even when I treated sigmax as data. I set adapt_delta=0.99999, stepsize=0.001, max_treedepth=25. I’m not sure how to resolve these issues.
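
Would a non-centered parameterization of the latent x help with the divergences? Here is a sketch of what I mean, with sigmax passed in as data (x_raw is just the standardized latent value):

data{
  int nobs;
  real xobs[nobs];
  real y[nobs];
  real<lower=0> sigmax;   // known measurement error, passed in as data
}

parameters {
  real<upper=0> alpha;
  real<lower=0> beta;
  real<lower=5,upper=8> mu_x;
  real<lower=0> sigma_x;
  real<lower=0> sigmay;
  real x_raw[nobs];       // standardized latent values
}

transformed parameters {
  real x[nobs];
  for (i in 1:nobs){
    x[i] = mu_x + sigma_x * x_raw[i];  // implies x[i] ~ normal(mu_x, sigma_x)
  }
}

model {
  // priors
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma_x ~ cauchy(0, 10);
  sigmay ~ cauchy(0, 10);

  // model structure
  for (i in 1:nobs){
    x_raw[i] ~ normal(0, 1);                    // non-centered prior on x
    xobs[i] ~ normal(x[i], sigmax);             // measurement model
    y[i] ~ normal(alpha + beta*x[i], sigmay);   // regression on the latent x
  }
}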