It depends what you mean by “error”, and what kind of information you’re trying to model. Here’s something else you may want to consider.
If there is some quantity y (say, a number of photons) that you're trying to measure, and you know your observations z of it are imperfect, then you'd want to specify a probability distribution for that unknown quantity conditional on your observations and on what you know about measurement quality. Or, to put it slightly differently, you need to specify a probability distribution for the errors of observation, \delta_i = z_i - y_i.
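If, for example, you're willing to assume independent, roughly Gaussian measurement errors with known standard errors se_i (the assumption the Stan code below makes), that would be \delta_i \sim normal(0, se_i), or equivalently z_i \sim normal(y_i, se_i).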
I think some of the confusion comes from using the term “error” for different concepts: “error distribution” has its origins in measurement error (errors of observation), but quickly came to refer to “errors” about the regression line as well, i.e., variation in y caused by factors not included in your covariates X, even when every variable is measured precisely. (At least the history is roughly that.) It helps to disentangle these two concepts.
Many applications require something like this, which incorporates both types of “error”:
data {
  int n;
  vector[n] z;    // observations or estimates of y
  vector[n] se;   // standard errors of the observations
  vector[n] x;
}
parameters {
  vector[n] y;
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // data model
  z ~ normal(y, se);                    // this says, my observations are imperfect
  // process model
  y ~ normal(alpha + x * beta, sigma);  // this says, something in addition to x determines y
  // parameter models
  // put priors on your parameters, including y
}
An important part is your background information about y. You just have to think about what kinds of values would be reasonable, and then penalize values that are unreasonable. That might look like y \sim normal(\mu_y, \sigma_y), which places low probability on values more than a couple of standard deviations from the inferred mean \mu_y (resulting in ‘shrinkage’ towards \mu_y), which may or may not be reasonable for your application area. You could also use y \sim student_t(\nu_y, \mu_y, \sigma_y), or whatever else fits. Observations with small standard errors won’t budge much, but if you have relatively high uncertainty about an observation and it’s something of an outlier, then your prior will matter quite a bit.
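To make that concrete, here is one hypothetical way the “parameter models” section of the model block above could be filled in; mu_y, sigma_y, and nu_y are placeholders you would have to declare as data (or give their own priors) and choose for your problem, and the scales on alpha, beta, and sigma are likewise only illustrative:

// parameter models (one hypothetical choice)
y ~ student_t(nu_y, mu_y, sigma_y);  // or normal(mu_y, sigma_y); background information about y
alpha ~ normal(0, 10);               // weakly informative priors on the regression parameters;
beta ~ normal(0, 10);                // rescale these to your own data
sigma ~ normal(0, 5);                // effectively half-normal, given the <lower=0> constraint

Note that in Stan a second sampling statement for y like this multiplies its density with the process model’s, so it acts as additional background information on top of the regression structure.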
The terminology (data model, process model, parameter model) comes from Berliner (1996), I think, and is used throughout Cressie and Wikle (2011); others use different terminology. Framed this way, you can make inferences about your process model conditional on your observational uncertainty and your prior information about y.
Berliner, L.M. Hierarchical Bayesian time-series models. In Maximum Entropy and Bayesian Methods; Hanson, K.M.; Silver, R.N., Eds.; Springer Netherlands, 1996.
Cressie, N.; Wikle, C.K. Statistics for Spatio-Temporal Data; Wiley, 2011.
Richardson, S.; Gilks, W.R. A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 1993, 138, 430–442.