Bayesian Measurement Error Model in Stan

Hi everyone,

I’m working on a Bayesian Measurement Error Model in Stan and would appreciate your feedback on whether my code is correctly specified. The objective is to account for measurement errors in both x (Method 1) and y (Method 2) while modeling their true relationship using a Bayesian approach.

  • Method 1 estimates evapotranspiration,
  • Method 2 measures evapotranspiration and is treated as the reference.

Both methods are assumed to have a 10% error. I’d greatly appreciate any insights or suggestions on improving the model specification.
Thank you in advance

Stan code:

data {
int<lower=0> N;                      // Number of data points (days)
vector[N] x_obs;                     // Observed values for x (Method 1)
vector[N] y_obs;                     // Observed values for y (Method 2)
real<lower=0> sigma_x;         // Known measurement error in Method 1 (x) (10%)
real<lower=0> sigma_y;         // Known measurement error in Method 2 (y) (10%)
}

parameters {
real alpha;                              // Intercept
real beta;                                // Slope
real<lower=0> sigma;             // Residual standard deviation
}

model {
// Priors
alpha ~ normal (0, 10);            // Weakly informative prior for intercept
beta ~ normal (0, 10);              // Weakly informative prior for slope
sigma ~ cauchy (0, 5);             // Weakly informative prior for residuals

// Structural model with error propagation for both x and y
y_obs ~ normal(alpha + beta * x_obs, sqrt(sigma^2 + sigma_y^2 + beta^2 * sigma_x^2));
}

Hi, @Alessandro, and sorry for missing this earlier.

The usual thing to do here is to treat the true x as an unkown. The measurement error in y is rolled into sigma_y. Then you’d have something like this, where you fit sigma_x, sigma_y and x,

parameters {
  vector[N] x;
  real<lower=0> sigma_x;
  real<lower=0> sigma_y;

model {
  sigma_x ~ normal(0.1, 1);
  sigma_y ~ normal(0.1, 1);
  x ~ normal(x_obs, sigma_x);
  ...
  y_obs ~ normal(alpha + beta * x, sigma_y);
}

The sigma_y variable soaks up modeling error and observation error in y_obs.

You’d want to adjust location and scale of sigma_x and sigma_y.

Hi @Bob_Carpenter,
Thank you very much for your response. My overall goal is to evaluate the quality and readability of Method 1 as a method for estimating daily evapotranspiration, using a model that explicitly accounts for uncertainty in both the predictor and the outcome. I’d also like to clarify that my dataset includes daily measurements of evapotranspiration, and the values are different each day. The two methods used are:
-Method 1, which provides an estimation
-Method 2, which serves as the reference method

I’ve adapted the Stan code according to your suggestions:

stan_code <- "
data {
  int<lower=1> N;
  vector[N] x_obs;              
  vector[N] y_obs;              
}
parameters {
  vector[N] x;                  
  real alpha;
  real beta;
  real<lower=0> sigma_x;        
  real<lower=0> sigma_y;        
}
model {
  // Priors
  sigma_x ~ normal (0.24, 0.11);     // 10% error (sigma_x <- mean(Method 1) * 0.10)  and  sd
  sigma_y ~ normal (0.17, 0.09);     // 10% error (sigma_y<- mean(Method 2) * 0.10) and sd
  alpha ~ normal (1.49, 0.5);
  beta ~ normal (1.40, 0.5);

  x_obs ~ normal (x, sigma_x);  
  y_obs ~ normal (alpha + beta * x, sigma_y);  
}
generated quantities {
  vector[N] y_pred;
  vector[N] residual_error;

  for (i in 1:N) {
    y_pred[i] = normal_rng(alpha + beta * x[i], sigma_y);
    residual_error[i] = fabs(y_obs[i] - y_pred[i]);   //. this should be the error of Method 1
  }
}

I have additional questions to clarify some aspects of the model. I would really appreciate it if you could take a look and help me to clarify the points below:

  1. Is the way I adjusted the location and scale of sigma_x and sigma_y appropriate?

  2. If my goal is to account for the error in Method 1 (the estimation of ET), should I model Method 1 as x (predictor) and Method 2 (reference method) as y (the outcome), or the other way around?

  3. To achieve my goal, is it appropriate to use the model above and explicitly include measurement error in both Method 1 and Method 2? Or is it also valid to include measurement error only for Method 2, assuming that Method 1 represent the modeled outcome?