Bayesian Regression

I have two sets of paired samples, each containing 3,000 data points. These sets represent the output of distinct models:

  1. The first model, which I’ll refer to as the “reference model,” is derived from field measurements with known predictors. This model has a total error of 10-20% due to its theoretical assumptions.

  2. The second model is based on estimated measurements, but the predictors are unknown. This model has a fixed error of 10%.

Neither set of samples is normally distributed.

Given these conditions, is it feasible to evaluate the performance of the second model using a Bayesian regression approach, considering both errors? I would greatly appreciate any references or examples on how to approach this problem using Bayesian methods.

Thank you!

Do you also have some sort of ground truth to compare against? If not, then I don’t think you can do much more than regress prediction1 on prediction2. If the slope is 1 and the intercept is 0, then the predictions agree on average.
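To make the slope/intercept check concrete, here is a minimal NumPy sketch of a conjugate Bayesian linear regression. Everything in it is an assumption for illustration: the data are simulated stand-ins for the 3,000 pairs, the names `model1` and `model2` are hypothetical, the residual standard deviation `sigma` is treated as known, and the coefficients get a weak N(0, 10^2) prior. With real data you would normally estimate `sigma` as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins for the 3,000 paired model outputs (hypothetical data).
n = 3000
model2 = rng.gamma(shape=4.0, scale=2.5, size=n)   # skewed, non-normal predictor
model1 = model2 + rng.normal(0.0, 1.0, size=n)     # agrees on average, noise sd = 1

sigma = 1.0      # residual sd, assumed known in this sketch
prior_sd = 10.0  # weak N(0, prior_sd^2) prior on intercept and slope

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), model2])

# Conjugate normal posterior for (intercept, slope) with a zero-mean prior.
post_prec = X.T @ X / sigma**2 + np.eye(2) / prior_sd**2
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (X.T @ model1) / sigma**2

# Approximate 95% credible intervals from the normal posterior.
post_sd = np.sqrt(np.diag(post_cov))
lower = post_mean - 1.96 * post_sd
upper = post_mean + 1.96 * post_sd

for name, mean, lo, hi in zip(["intercept", "slope"], post_mean, lower, upper):
    print(f"{name}: {mean:.3f}  (95% interval {lo:.3f} to {hi:.3f})")
```

If the interval for the intercept covers 0 and the interval for the slope covers 1, the data are consistent with the two sets of predictions agreeing on average.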

The data not being normally distributed does not invalidate a linear regression; what you need is for the residuals to be normally distributed. If they are not, you can use other response families. The choice can be driven by domain knowledge or by posterior predictive checks.

Some tweaks you can do (though I wouldn’t expect them to have huge impact):

  1. Use a measurement error model (i.e. treat the independent variable as noisy).
  2. Use your knowledge of the error magnitude to constrain the standard deviation (or other measure of variability) for both the outcome and the measurement error part.
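Both tweaks can be combined in one model. The following is only a sketch under stated assumptions, not a definitive implementation: the data are simulated on a centred log scale, where a 10% relative error is roughly a standard deviation of 0.10; the outcome error is fixed at 0.15 as a stand-in for the 10-20% range; the unobserved true predictor is given a normal distribution so that it can be integrated out analytically; and the posterior is sampled with a plain random-walk Metropolis algorithm. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulated data on a centred log scale.
n = 3000
true_a, true_b, true_tau = 0.0, 1.0, 0.5
sx, sy = 0.10, 0.15  # assumed known error sds for predictor and outcome
x_true = rng.normal(0.0, true_tau, size=n)
x_obs = x_true + rng.normal(0.0, sx, size=n)                    # noisy predictor
y_obs = true_a + true_b * x_true + rng.normal(0.0, sy, size=n)  # noisy outcome

# Sufficient statistics for the bivariate normal marginal likelihood.
data = np.column_stack([x_obs, y_obs])
m = data.mean(axis=0)
S = np.cov(data, rowvar=False, bias=True)  # covariance normalised by n

def log_post(theta):
    """Log posterior with the latent true predictor integrated out."""
    a, b, mu, log_tau = theta
    tau2 = np.exp(2.0 * log_tau)
    cov = np.array([[tau2 + sx**2, b * tau2],
                    [b * tau2, b**2 * tau2 + sy**2]])
    d = m - np.array([mu, a + b * mu])
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    loglik = -0.5 * n * (logdet + np.trace(cov_inv @ S) + d @ cov_inv @ d)
    log_prior = -0.5 * np.sum(theta**2) / 10.0**2  # weak N(0, 10^2) priors
    return loglik + log_prior

# Naive slope that ignores measurement error; used for comparison and as a start.
naive_slope = S[0, 1] / S[0, 0]
theta = np.array([m[1] - naive_slope * m[0], naive_slope, m[0], 0.5 * np.log(S[0, 0])])

# Random-walk Metropolis over (a, b, mu, log_tau).
n_iter, burn = 20000, 5000
step = np.array([0.004, 0.008, 0.010, 0.015])
lp = log_post(theta)
draws = np.empty((n_iter, 4))
accepted = 0
for i in range(n_iter):
    prop = theta + step * rng.normal(size=4)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
        accepted += 1
    draws[i] = theta
post = draws[burn:]

print(f"acceptance rate: {accepted / n_iter:.2f}")
print(f"naive slope: {naive_slope:.3f}")
print(f"measurement-error slope: {post[:, 1].mean():.3f} (sd {post[:, 1].std():.3f})")
```

The naive slope is pulled toward zero because noise in the predictor is ignored; the measurement error model corrects for that. With errors of this size the correction is small, which matches the expectation that these tweaks will not have a huge impact.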

Thank you for your response! Yes, Model 1 serves as my ground truth, and the residuals are not normally distributed. Could you please provide an example in R to address this issue?