Model Validation - Linear Regression with X and Y Uncertainty

Hi Stan Users,

I’m trying to write a fairly simple Bayesian linear regression for a dataset where both the independent and dependent variables are analytically measured quantities with known Gaussian uncertainties. This is a bit different from the example in the manual, where only the dependent variable has an uncertainty.

The model below seems to return decent results, but I was wondering whether any of y’all could help confirm that my approach is statistically valid.

Thanks!

data {
  int<lower=0> N;            // Number of observations
  vector[N] x0;              // Observations
  vector<lower=0>[N] x_sd;   // S.D. of observations (assuming normally distributed uncertainty)
  vector[N] y0;              // Outcomes
  vector<lower=0>[N] y_sd;   // S.D. of outcomes
  
  // Inputs for priors on regression coefficients (just as an example)
  real slope_min;    
  real slope_max;
  real intercept_min;
  real intercept_max;
}
parameters {
  // The declared bounds must match the uniform priors below; without them
  // the sampler can propose values outside the priors' support.
  real<lower=intercept_min, upper=intercept_max> intercept;
  real<lower=slope_min, upper=slope_max> slope;
  vector[N] X;               // "uncertain" (latent) observations
}
model {
  // Define local variables
  matrix[N, 2] XP;
  vector[2] beta = [intercept, slope]';  // regression coefficients
  
  // Populate expectations matrix
  XP[:, 1] = rep_vector(1.0, N);  // column for y-intercept
  XP[:, 2] = X;                   // column for slope
  
  // Define priors
  X ~ normal(x0, x_sd);
  intercept ~ uniform(intercept_min, intercept_max);
  slope ~ uniform(slope_min, slope_max);
  
  // Define likelihood
  y0 ~ normal(XP * beta, y_sd);
}

Do you want to model the relationship on the latent scale, that is, between y_latent and x_latent (just called X in your Stan code)? If so, your model is not what you want, since y is still modeled on the manifest scale, but with a fixed residual standard deviation.
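To make the distinction concrete, here is a sketch of a version where both variables live on the latent scale, reusing the data block from the original post (Y, sigma, intercept, and slope are illustrative names, and priors on the coefficients are omitted for brevity):

parameters {
  real intercept;
  real slope;
  real<lower=0> sigma;   // residual S.D. on the latent scale
  vector[N] X;           // latent predictor
  vector[N] Y;           // latent outcome
}
model {
  X ~ normal(x0, x_sd);                      // measurement model for x
  y0 ~ normal(Y, y_sd);                      // measurement model for y
  Y ~ normal(intercept + slope * X, sigma);  // structural model, latent scale
}

The key difference is the extra residual scale sigma: the structural relationship between the latent X and Y gets its own error term, separate from the measurement errors x_sd and y_sd.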


Hi Paul,

Thanks for your input! I am indeed trying to model the relationship on the latent scale. After doing some more digging yesterday, I ran across your brms package (which is awesome, by the way). With some tinkering, and by looking at the stancode() output from that package, I was able to figure out how to parameterize things. That said, after seeing how nicely brms performs, I think I’m going to just roll with it for the time being.

For those who may stumble across this post at some point: using that R package, it’s super easy to incorporate measurement errors of the kind described in my first post:

e.g.
fit <- brm(y | se(sdy, sigma=TRUE) ~ me(x, sdx), data = some_data, …)

Cheers,
Dorran


A slight correction to your model: with y | se(sdy, sigma=TRUE) you predict y on the manifest scale (despite including the measurement error). Essentially, you fit a meta-analytic model.

If you want to model the relationship between the two latent variables, go for y | mi(sdy) instead.
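Putting the two pieces together, a complete call might then look something like this (some_data and the column names are placeholders, as in the earlier example):

fit <- brm(y | mi(sdy) ~ me(x, sdx), data = some_data)

Here mi(sdy) treats y as a latent quantity measured with known error sdy, so the regression is estimated between the latent variables on both sides.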


To summarize the model: the data block correctly declares the number of observations (N), the observed values of the independent (x0) and dependent (y0) variables, and their standard deviations (x_sd and y_sd). The design matrix pairs an intercept column with the latent X column and is used to compute the expected outcomes from the regression coefficients. The normal priors on the uncertain observations X are appropriate given the assumption of normally distributed measurement error. One caution on the uniform priors for the regression coefficients: in Stan, a uniform prior must be paired with matching declared bounds (via <lower=..., upper=...>) on the parameter, or the sampler can propose values outside the prior’s support.