So this problem has been bugging me on a run. I don’t think I have my thoughts completely sorted out, but I need to move on to actual work, so I’ll share them to have peace of mind and hopefully help you think more clearly.
Let us imagine we have performed a calibration with chosen true values X = x_1, ..., x_n which led to observations Y = y_1, ..., y_n. For simplicity, let us assume that the true relationship is a simple linear function with Gaussian noise. Further, we’ve observed a new data point \hat{y} and we want to make inferences about the true value \hat{x}, which is unobserved. A full Bayesian treatment of this problem leads to a joint model:
y_i \sim N(\mu_i, \sigma) \\
\mu_i = \alpha + \beta x_i \\
\hat{y} \sim N(\hat{\mu}, \sigma) \\
\hat{\mu} = \alpha + \beta \hat{x} \\
\alpha \sim \pi_\alpha \\
\beta \sim \pi_\beta \\
\sigma \sim \pi_\sigma \\
\hat{x} \sim \pi_{\hat{x}}
where \alpha, \beta, \sigma, \hat{x} are parameters and \pi_\alpha, \pi_\beta, \pi_\sigma, \pi_{\hat{x}} are their associated priors. So, if I am not wrong about this, anything you do that is meant to propagate all the uncertainty correctly needs to either be proven equivalent to inference over this full model or be thought of as an approximation to this full model.
I actually think that this full model can be expressed in brms by treating \hat{x} as a missing predictor, i.e. something like (not tested) bf(y ~ mi(x)) + bf(x | mi() ~ 1). However, being forced to rerun the Stan model for each new data point you want to infer seems bad.
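To make this concrete, below is a rough, untested sketch of what I have in mind - the data are simulated and the name of the imputed-value parameter is from memory, so double check me:

```r
library(brms)

# Hypothetical calibration data plus one new observation whose true x is unknown
set.seed(1)
n <- 20
x <- seq(0, 10, length.out = n)
y <- rnorm(n, 2 + 0.5 * x, 0.3)
dat <- data.frame(
  x = c(x, NA),   # NA marks the unobserved \hat{x}
  y = c(y, 4.5)   # the last y is the new observation \hat{y}
)

# Joint model: the regression of y on the (partially missing) x, plus a
# submodel that effectively gives x (and hence \hat{x}) a normal prior
# whose parameters are informed by the calibration x values
fit <- brm(
  bf(y ~ mi(x)) + bf(x | mi() ~ 1) + set_rescor(FALSE),
  data = dat
)

# The imputed \hat{x} shows up as a parameter (Ymi_x[21], if I remember
# the naming correctly)
posterior_summary(fit, variable = "Ymi_x[21]")
```

Note that the x | mi() ~ 1 submodel treats the calibration x values as draws from a normal distribution, which is not quite right when they were chosen by design, so this is itself only an approximation of the model above.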
Now this is where my math/stats fail me. I find it quite likely (but can’t prove) that the full posterior distribution p(\alpha, \beta, \sigma, \hat{x} | X, Y, \hat{y}) can be broken down, so that:
p(\alpha, \beta, \sigma, \hat{x}| X, Y, \hat{y}) = p(\alpha, \beta, \sigma | X, Y) p(\hat{x} | \alpha, \beta, \sigma, \hat{y})
And, by Bayes’ theorem,
p(\alpha, \beta, \sigma | X, Y) p(\hat{x} | \alpha, \beta, \sigma, \hat{y}) \propto p(\alpha, \beta, \sigma | X, Y) p(\hat{y} | \alpha, \beta, \sigma, \hat{x}) \pi_{\hat{x}}(\hat{x})
i.e. that if you do the inference for \alpha, \beta, \sigma separately, giving you (samples from) a joint posterior distribution p(\alpha, \beta, \sigma | X, Y), then you can make inferences for \hat{x} separately. But maybe even that is just an approximation… At least if the prior on \hat{x} is not improper uniform, one can imagine that if “inverting” the calibration data gives you values inconsistent with the prior, you would need to update your beliefs about the calibration data… But maybe this is a good thing: maybe we don’t want to update our beliefs about the calibration just because it would lead to absurd inferences for new values? Sorry, I can’t wrap my head around this completely right now. But I think that if the calibration experiment is informative and our priors on \hat{x} are not too tight, \hat{y} should have a negligible effect on everything except \hat{x}, and the separation is likely justified for practical use.
In any case, IMHO you at least need to handle the uncertainty introduced between \hat\mu and \hat{y}. I think (once again, not so sure about the math, double check me) that if \pi_{\hat{x}} = N(m, \tau) then, treating \alpha, \beta, \sigma as fixed, you can say that a priori \hat\mu \sim N(\alpha + \beta m, |\beta| \tau). Following wiki, we then get an analytical form of the posterior for \hat\mu (with slight abuse of notation, the second argument of N now being a variance):
p(\hat\mu | \alpha,\beta,\sigma, \hat{y}) = N\left(\frac{\hat{y}/\sigma^2 + (\alpha + \beta m)/(\tau^2 \beta^2)}{1/\sigma^2 + 1/(\tau^2 \beta^2)}, \frac{1}{1/\sigma^2 + 1/(\tau^2 \beta^2)}\right)
If you instead assume an improper flat prior for \hat{x} (and hence \hat\mu), that means \tau \rightarrow \infty and 1/\tau^2 \rightarrow 0.
So the conditional posterior simplifies to:
p(\hat\mu | \alpha,\beta,\sigma, \hat{y}) = N\left(\hat{y}, \sigma^2 \right)
This gives us the following process to infer \hat{x} given \hat{y} and posterior samples for \alpha, \beta, \sigma (a rough code sketch follows below the list):
- For each posterior sample from the calibration model:
  - Compute the conditional posterior of \hat{\mu} given \hat{y}
  - Draw a single sample of \hat{\mu}
  - Transform the sample of \hat{\mu} into a sample of \hat{x} by “inverting the model”, i.e. \hat{x} = (\hat\mu - \alpha) / \beta
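A minimal sketch of this process in R - here `draws` and its columns alpha, beta, sigma are my assumed layout for the posterior samples from the calibration model:

```r
# One draw of \hat{x} per posterior sample, following the steps above.
# `draws`: data frame with one row per posterior sample and columns
# alpha, beta, sigma; m and tau parameterize the prior pi_xhat = N(m, tau).
infer_x <- function(y_hat, draws, m, tau) {
  prior_mean <- draws$alpha + draws$beta * m        # a-priori mean of mu_hat
  prior_var  <- tau^2 * draws$beta^2                # a-priori variance of mu_hat
  post_var   <- 1 / (1 / draws$sigma^2 + 1 / prior_var)
  post_mean  <- post_var * (y_hat / draws$sigma^2 + prior_mean / prior_var)
  mu_hat <- rnorm(nrow(draws), post_mean, sqrt(post_var))  # one draw per sample
  (mu_hat - draws$alpha) / draws$beta               # invert the linear model
}
```

Conveniently, passing tau = Inf recovers the improper flat prior case directly: prior_var becomes infinite, so post_var reduces to \sigma^2 and post_mean to \hat{y}.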
Obviously, once we move beyond Gaussian distributions and linear relationships, things become way more complicated in practice (because we most likely lose the ability to have an analytical form for p(\hat\mu | \hat{y})).
Does that make at least some sense?
EDIT: I obviously can’t let this go :-). It occurred to me that thinking about the problem as a joint model also clarifies what it means when you have repeated measurements which you assume came from the same unobserved \hat{x} (all those measurements should jointly inform the posterior for \hat{\mu}). Similarly, if you have multiple measurements of different \hat{x}_1, ..., \hat{x}_k, the inferences cannot be treated as independent - e.g. if there were substantial uncertainty in \alpha in the above model, you would expect either all of \hat{x}_1, ..., \hat{x}_k to be “somewhat low” (if \alpha is actually low) or all “somewhat high” (if \alpha is high), creating a correlation in the posterior for \hat{x}_1, ..., \hat{x}_k.
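For the repeated-measurement case, the conjugate algebra above extends directly (again, double check me): with k measurements \hat{y}_1, ..., \hat{y}_k assumed to come from the same \hat{x}, each contributes its own precision term, i.e.

p(\hat\mu | \alpha, \beta, \sigma, \hat{y}_1, \dots, \hat{y}_k) = N\left(\frac{\sum_{j=1}^k \hat{y}_j/\sigma^2 + (\alpha + \beta m)/(\tau^2 \beta^2)}{k/\sigma^2 + 1/(\tau^2 \beta^2)}, \frac{1}{k/\sigma^2 + 1/(\tau^2 \beta^2)}\right)

which reduces to the single-measurement formula for k = 1.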