Convergence problems / high Rhat for measurement error model

I’m trying to run a fairly simple linear regression model with measurement error in the response variable, but I’m having problems with model convergence and am wondering if there’s something obviously wrong with my model specification.

data {
  int<lower=0> N;     // number of cases
  vector[N] y_meas;   // measurement of y
  real<lower=0> tau;  // measurement noise
  vector[N] x;        // predictor
}

model {
  # Priors 
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);

  # Model
  y ~ normal(mu_y, sigma_y);
  y_meas ~ normal(y, tau);
  y ~ normal(alpha + x * beta, sigma); 
} 

Some reproducible data are:

dat <- list(
N=40,
y_meas = c(16.61, 28.97, 24.13, 14.74, 17.66, 13.16, 8.59, 17.47, 12.92, 10.75, 19.26, 14.01, 8.91, 18.02, 11.33, 12.83, 21.26, 16.86, 11.76, 20.91, 15.04, 20.50, 26.47, 13.21, 13.59, 22.42, 10.57, 12.61, 17.95, 10.96, 16.54, 26.33, 21.75, 14.22, 20.60, 16.71, 17.86, 15.24, 14.87),
x = c(35,  60, 100,  35,  60, 100,  35,  60, 100,  35,  60, 100,  35,  60, 100,  35,  60, 100,  35,  60, 100, 35,  60, 100,  35,  60, 100,  35,  60, 100,  35,  60, 100,  35,  60, 100,  35,  35,  35,  35)
)

And the model call I use is:

stan(file = "sample-model.stan", data = dat,
           iter = 4000,
           control=list(adapt_delta=0.95,
                        max_treedepth=13),
           chains = 3) 

Example model output is:

Warning messages:
1: There were 191 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 8 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 13. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
3: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
4: Examine the pairs() plot to diagnose sampling problems
 
5: The largest R-hat is 3.46, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

Here are the parameter stats:

print(m2,pars=c("alpha","beta","sigma"))

Inference for Stan model: polynomial_meas-err_uni.
3 chains, each with iter=4000; warmup=2000; thin=1; 
post-warmup draws per chain=2000, total post-warmup draws=6000.

      mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff     Rhat
alpha 3.53    0.92 1.13 2.53 2.53 2.96 5.11  5.11     2 92701.63
beta  0.00    0.00 0.00 0.00 0.00 0.00 0.00  0.00   141     1.02
sigma 0.00    0.00 0.00 0.00 0.00 0.00 0.00  0.00     2   152.15

Samples were drawn using NUTS(diag_e) at Wed Apr 15 14:13:49 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Is there something obvious that I’m doing wrong with model specification? Or is it generally not advised to have a measurement error model for a response variable?

You have y on the left-hand side of a tilde twice, which doesn’t seem right. From your data section, I gather that you have an outcome y_meas that’s measured with noise and that you have some external point estimate of the measurement noise, tau. In that case, you’d simply do:

data {
  int<lower=0> N;     // number of cases
  vector[N] y_meas;   // measurement of y
  real<lower=0> tau;  // measurement noise
  vector[N] x;        // predictor
}
parameters {
  real alpha;
  real beta;
}
model {
  // Priors
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);

  // Likelihood
  y_meas ~ normal(alpha + x * beta, tau);
}

Now, it’s usually the case that we’re a little uncertain about our point estimate of the measurement error, so to properly account for that you’d do:

data {
  int<lower=0> N;     // number of cases
  vector[N] y_meas;   // measurement of y
  real<lower=0> tau;  // measurement noise
  vector[N] x;        // predictor
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // Priors
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ normal(tau, 0.001); // where you'd want to tweak the SD argument to reflect your certainty on tau

  // Likelihood
  y_meas ~ normal(alpha + x * beta, sigma);
}

Thanks, @mike-lawrence.

What I’m trying to do is leverage a known measurement error associated with the instrument.

y_meas are the actual measures I have. tau is the measurement error associated with the instrument. So what I’d like to model is some true y, where:

y ~ normal(y_meas, tau)

Do you have any thoughts on how to build that in?

Oops, seems we were both writing responses at the same time. In any event, the models I provided do give you the thing you want, y_true; you just have to use the generated quantities section to get it:

generated quantities{
    vector[N] y_true = alpha + x * beta;
}

Thanks. In the Stan user’s guide, it specifies error in x the way I did it originally:

data {
  int<lower=0> N;       // number of cases
  vector[N] x;          // predictor (covariate)
  vector[N] y;          // outcome (variate)
}

parameters {
  real alpha;           // intercept
  real beta;            // slope
  real<lower=0> sigma;  // outcome noise
}

model {
  y ~ normal(alpha + beta * x, sigma);
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
}

data {
  ...
  real x_meas[N];       // measurement of x
  real<lower=0> tau;    // measurement noise
}
parameters {
  real x[N];            // unknown true value
  real mu_x;            // prior location
  real sigma_x;         // prior scale
  ...
}
model {
  x ~ normal(mu_x, sigma_x);  // prior
  x_meas ~ normal(x, tau);    // measurement model
  y ~ normal(alpha + beta * x, sigma);
  ...
}

Do I understand you right that such an approach wouldn’t work for measurement error in y?

Modelling measurement error in the predictors is a rather different thing from modelling measurement error in the outcome; the latter is nearly ubiquitous (you just have a slightly unique scenario of having a very strong prior on its magnitude), while the former is more niche. If all you want to do is use prior information to constrain sampling for measurement error in the outcome, the two models I gave will do the trick (with the latter being a bit more realistic).

@mike-lawrence Thanks for helping me. What hasn’t quite sunk in yet is that the approach you gave uses tau (the error of the instrument) as the mean of the model error. I think of the model error as the amount of the observed y not explained by x. What I have in my mind is that there’s error associated with the measurement of y and error associated with how much x explains y. The way I’m understanding what you wrote is that those two things should be lumped together. What I’d like to do is to model them separately, especially to be able to forecast changes in y based on multiple error components. Not sure if that makes sense…

As expressed, these quantities are not identified (i.e. they simply sum to make a single error term, so without strong prior information on each, you can’t do inference on them). It also doesn’t make much sense conceptually as the idea behind modelling a relationship between x and y is that there is some deterministic relationship that is then observed with measurement noise. You might have nuance in having the magnitude of measurement noise influenced by x (or some other predictor), and/or have measurement noise in the observed values for x itself, but trying to talk about noise in the relationship that is separate from measurement noise doesn’t really make sense.
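
To make the identifiability point concrete, here’s a rough sketch in your notation (just an illustration, with sigma as the model noise and tau as the measurement noise): two independent normal noise terms add in variance, so a model with both collapses to a single sampling statement,

y_meas ~ normal(alpha + x * beta, sqrt(square(sigma) + square(tau)));

The data only ever inform the sum square(sigma) + square(tau), so any pair of values with the same sum fits equally well; that’s why you need strong prior information (like a known tau) to split them apart.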


@mike-lawrence thanks again for all your help. I’m having a hard time figuring out how to specify the proper error for y, given that it’s the sum of model error (which I don’t know) and measurement error (which I do know).

Let’s say I know that the measurement error of my response variable is 5. I know that the model error would only increase that error, so I wouldn’t want to specify a prior that was normal(5, sd), because that would allow the error to be lower than we know it to be.

Do you have any thoughts on how to specify the prior on the error for y, given that I know what one of the error components is but have no idea about the other?

I think that just adding the variances should work, i.e.:

data {
  real<lower=0> sd_known;
  ...
}

parameters {
  real mu;
  real<lower=0> sd_parameter;
}

model {
  real sd_total = sqrt(sd_known ^ 2 + sd_parameter ^ 2);
  y ~ normal(mu, sd_total);
  sd_parameter ~ normal(0, 5); // whatever prior you find sensible
}

does that make sense?


Thanks @martinmodrak. Should sd_total = sqrt(sd_known ^ 2 + sd_parameter ^ 2); go in the model { } block (which is how I was interpreting what you suggested) or in a transformed parameters { } block? Or something else? Thanks!

Sorry, I didn’t make that explicit. The difference between transformed parameters and the model block is mostly just that variables in transformed parameters are stored in the model fit, while variables in the model block are forgotten after each step of the algorithm, so for convergence purposes it shouldn’t really matter :-)
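
For reference, here’s a rough sketch of how this could look plugged into your regression, with sd_total in transformed parameters so it gets saved in the fit (just an illustration; tau plays the role of sd_known from the snippet above, and the priors are only placeholders):

data {
  int<lower=0> N;
  vector[N] y_meas;     // measured outcome
  real<lower=0> tau;    // known measurement SD (sd_known)
  vector[N] x;          // predictor
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;  // unknown model-error SD (sd_parameter)
}
transformed parameters {
  // variances add, so the total SD can never drop below the known tau
  real<lower=0> sd_total = sqrt(square(tau) + square(sigma));
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ normal(0, 5);  // whatever prior you find sensible
  y_meas ~ normal(alpha + x * beta, sd_total);
}

Because sd_total is a deterministic function of tau and sigma, putting it in transformed parameters just makes it show up in the printed summary and extracted draws; the posterior is the same either way.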