Model Validation - Linear Regression with X and Y Uncertainty

Hi Stan Users,

I’m trying to write a fairly simple Bayesian linear regression for a dataset where both the independent and dependent variables are analytically measured data with known gaussian uncertainties. This is a bit different than the example in the manual where only the dependent variable has an uncertainty.

The below model seems to be returning decent results, but I was wondering if any of y’all could help confirm that my approach is a statistically valid one.


data {
  int N;             // Number of observations
  vector[N] x0;      // Observations
  vector[N] x_sd;    // S.D. of observations (assuming normally distributed uncertainty)
  vector[N] y0;      // Outcomes
  vector[N] y_sd;    // S.D. of Outcomes
  // Inputs for priors on regression coefficients (just as an example)
  real slope_min;    
  real slope_max;
  real intercept_min;
  real intercept_max;
parameters {
  vector[2] beta;  // regression coefficients
  vector[N] X;     // "uncertain" observations
model {
  // Define local variables
  matrix[N,2] XP;
  // Populate expectations matrix
  XP[:,1] = rep_vector(1.0,N); // column for y-int
  XP[:,2] = X;                          // column for slope
  // Define Priors
  X ~ normal(x0,x_sd);
  beta[1] ~ uniform(intercept_min,intercept_max);
  beta[2] ~ uniform(slope_min,slope_max);
  // Define Likelihood
  y0 ~ normal(XP*beta,y_sd);
1 Like

Do you want to model the relationship on the latent scale that is between y_latent and x_latent (just called X in your Stan code)? If so, your model is not what you want, since y is still modeled on the manifest scale, but with a fixed residual standard deviation.

1 Like

Hi Paul,

Thanks for your input! I am indeed trying to model the relationship on the latent scale. After doing some more digging yesterday I ran across your brms package (which is awesome by the way). With some tinkering and looking at the stancode() output from that package I was able to figure out how to parameterize things. Though after seeing how nicely brms performs I think I’m going to just roll with that for the time being.

For those who may stumble across this post at some point it’s super easy to incorporate measurement errors as mentioned in my first post using that R package:

fit <- brm(y | se(sdy, sigma=TRUE) ~ me(x, sdx), data = some_data, …)


1 Like

A slight correction to your model. With y | se(sdy, sigma=TRUE) you predict y on the manifest scale (despite including the measurement error). Essentially you fit a meta-analytic model.

If you want to model the relationship between both the latent variables, go for y | mi(sdy)

1 Like