I have two sets of paired samples, each containing 3,000 data points. These sets represent the output of distinct models:
- The first model, which I’ll refer to as the “reference model,” is derived from field measurements with known predictors. This model has a total error of 10-20% due to its theoretical assumptions.
- The second model is based on estimated measurements, but the predictors are unknown. This model has a fixed error of 10%.
Neither set of samples is normally distributed.
Given these conditions, is it feasible to evaluate the performance of the second model using a Bayesian regression approach, considering both errors? I would greatly appreciate any references or examples on how to approach this problem using Bayesian methods.
Thank you!
Do you also have some sort of ground truth to compare against? If not, then I don’t think you can do much more than regress prediction1 on prediction2. If the slope is 1 and the intercept is 0, then the predictions agree on average.
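A minimal sketch of that regression, in Python with NumPy and on simulated data (not your data). The names `pred1` and `pred2`, the skewed "truth", and the 15% and 10% error levels are assumptions made only for illustration. It fits ordinary least squares rather than a Bayesian model, just to show the slope/intercept check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000

# Simulated, skewed underlying quantity (assumption: stands in for the real data).
truth = rng.lognormal(mean=1.0, sigma=0.5, size=n)

# Hypothetical model outputs with relative errors of about 15% and 10%.
pred1 = truth * (1 + rng.normal(0.0, 0.15, n))  # reference model
pred2 = truth * (1 + rng.normal(0.0, 0.10, n))  # second model

# Ordinary least squares fit of pred1 = intercept + slope * pred2.
X = np.column_stack([np.ones(n), pred2])
coef, *_ = np.linalg.lstsq(X, pred1, rcond=None)
intercept, slope = coef

# Classical standard errors from the residual variance.
resid = pred1 - X @ coef
sigma2 = resid @ resid / (n - 2)
se_intercept, se_slope = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(f"intercept = {intercept:.3f} (SE {se_intercept:.3f})")
print(f"slope     = {slope:.3f} (SE {se_slope:.3f})")
```

Note that noise in `pred2` tends to pull the fitted slope below 1 even when the two predictions agree on average, so a slope slightly under 1 is not by itself evidence of disagreement.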
The data not being normally distributed does not invalidate a linear regression; what you need is for the residuals to be normally distributed. If they are not, you can use other response families. The choice can be driven by domain knowledge or by posterior predictive checks.
Some tweaks you can do (though I wouldn’t expect them to have huge impact):
- Use a measurement error model (i.e. treat the independent variable as noisy).
- Use your knowledge of the error magnitude to constrain the standard deviation (or other measure of variability) for both the outcome and the measurement error part.
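To illustrate both tweaks together, here is a rough sketch in Python with NumPy. It is not a Bayesian model: it uses Deming regression, a classical errors-in-variables estimator, as a simple stand-in that treats the predictor as noisy and takes the error magnitudes as known. The data are simulated, the names `pred1` and `pred2` are hypothetical, and the error levels (15% for the reference model, 10% for the second) are assumptions. Working on the log scale turns relative errors into roughly constant additive ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3000

# Simulated skewed quantity and multiplicative errors (assumptions, not real data).
truth = rng.lognormal(mean=1.0, sigma=0.5, size=n)
sd1, sd2 = 0.15, 0.10  # assumed relative error of each model
pred1 = truth * np.exp(rng.normal(0.0, sd1, n))  # reference model
pred2 = truth * np.exp(rng.normal(0.0, sd2, n))  # second model

# On the log scale the relative errors become additive with sd of about sd1, sd2.
y = np.log(pred1)
x = np.log(pred2)

# Ratio of error variances: outcome error over predictor error.
delta = sd1**2 / sd2**2

sxx = np.var(x, ddof=1)
syy = np.var(y, ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]

# Deming regression slope and intercept (closed form).
slope = (syy - delta * sxx
         + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy**2)) / (2 * sxy)
intercept = y.mean() - slope * x.mean()

# Naive least-squares slope for comparison; it ignores the noise in x.
ols_slope = sxy / sxx

print(f"Deming slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"OLS slope    = {ols_slope:.3f}")
```

In a fully Bayesian version the same idea would appear as a latent "true" predictor with a prior, plus fixed or tightly constrained standard deviations for the two error terms.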
Thank you for your response! Yes, Model 1 serves as my ground truth, and the residuals are not normally distributed. Could you please provide an example in R to address this issue?