I am fitting a nonlinear biochemical/kinetic model with several correlated parameters, which is a task in itself, but on top of that I am trying to incorporate an additional feature: accounting for machine error as well as experimental error, which are qualitatively different. I discuss it in more detail here, but in essence the machine always injects some additional signal \epsilon into any readout, which leads to a sort of ‘limit-of-detection’ problem since samples orders of magnitude below \epsilon are hidden in the noise. That is, any measured signal Y_{n}^{measured} is made up of two components:
Y_{n}^{measured} = Z_{n} + \epsilon_{n}
where
Z_{n} \sim \mathcal{N}(Y_{n}^{true},\sigma) is a separate measurement error model for the underlying data, and
\epsilon_{n} \sim \mathcal{N}(L,\tau) is the added machine noise.
(Just in case you’re wondering, machine noise is added in on purpose by companies that manufacture the detection devices because they don’t want to report negative values)
This implies the following
Y_{n}^{measured} \sim \mathcal{N}( Y_{n}^{true} + L, \sqrt{\sigma^{2} + \tau^{2}})
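(To spell out the step: since Z_{n} and \epsilon_{n} are independent, their means add and their variances add, so
\mathbb{E}[Y_{n}^{measured}] = Y_{n}^{true} + L and \mathrm{Var}[Y_{n}^{measured}] = \sigma^{2} + \tau^{2},
which is where the \sqrt{\sigma^{2} + \tau^{2}} standard deviation comes from.)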
I initially thought that, because the variances simply add, \sigma and \tau cannot be identified separately, which was also @Bob_Carpenter 's suggestion. But I had a hunch that a model could do it, because it is even possible to distinguish them by eye, given that you know the limit of detection sits at the lower end of the measured values. At those low values (and for measurements where you expect 0, for example measurements of blank samples that don’t emit any signal), all you are seeing is \epsilon. On the other hand, measurements far above this value should carry only the experimental measurement error, i.e. if Y_{n}^{true} >> L, then only \sigma matters. In fact, people actually estimate \epsilon heuristically, by eye, using blank samples (I explain this in the older post I linked).
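A quick way to sanity-check this intuition is to simulate fake data from the two-error model above and plot the blank samples against the high-signal ones. A minimal sketch as a standalone Stan program (run with the fixed_param sampler; all the _sim names and the values fed in are made up for illustration, not part of my model):
data {
  int<lower=1> N;
  vector<lower=0>[N] Ytrue_sim; // assumed true signals, including zeros for blank samples
  real<lower=0> sigma_sim;      // experimental error sd
  real<lower=0> L_sim;          // machine noise floor (mean of epsilon)
  real<lower=0> tau_sim;        // machine noise sd
}
generated quantities {
  vector[N] Ymeasured_sim;
  for (n in 1:N) {
    real Z = normal_rng(Ytrue_sim[n], sigma_sim); // experimental error around the true value
    real eps = normal_rng(L_sim, tau_sim);        // machine noise added on top
    Ymeasured_sim[n] = Z + eps;
  }
}
Plotting the blanks against the high-signal samples from a simulation like this is a cheap way to check how cleanly the two error sources separate by eye under parameter values I consider plausible.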
So I coded two models: one with a single combined variance and one with two separate variances.
The model with a single variance looks something like this in Stan:
transformed parameters {
  vector[N] Ytrue = ...; // nonlinear formula generating Ytrue
  // estimating L on the log scale because it is often very small, 0.0001 etc.
  vector[N] Z = Ytrue + exp(L);
}
model {
  // likelihood: measured values scatter around Z with the single combined sd
  target += normal_lpdf(Ymeasured | Z, sigma);
}
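(For completeness, that snippet assumes declarations along these lines elsewhere in the program; the names and bounds here are only placeholders, not my actual kinetic model:)
data {
  int<lower=1> N;
  vector[N] Ymeasured;
}
parameters {
  real L;              // log of the machine noise floor, so exp(L) can be e.g. 0.0001
  real<lower=0> sigma; // the single combined sd
  // ... plus the kinetic parameters that feed into Ytrue
}
together with proper priors on L and sigma in the model block.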
In contrast, the model with two variances looks like this:
parameters {
  vector[N] Z_raw; // using a non-centered parametrization for the latent Z
}
transformed parameters {
  vector[N] Ytrue = ...; // nonlinear formula generating Ytrue
  // again estimating L on the log scale because it is very small
  vector[N] Z = (Ytrue + sigma * Z_raw) + exp(L); // latent signal plus the machine noise floor
}
model {
  Z_raw ~ std_normal(); // needed so that Ytrue + sigma * Z_raw really is the N(Ytrue, sigma) latent
  target += normal_lpdf(Ymeasured | Z, tau);
}
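As an aside, given the convolution result above, the latent Z could also be marginalized out so that both scales appear directly in the likelihood without the N extra parameters; a minimal sketch of that variant (just the earlier equation written in Stan, not something I have fitted):
model {
  // marginal likelihood implied by Ymeasured ~ N(Ytrue + L, sqrt(sigma^2 + tau^2))
  target += normal_lpdf(Ymeasured | Ytrue + exp(L), sqrt(square(sigma) + square(tau)));
}
This has the same marginal likelihood as the non-centered latent version, just without the vector of Z_raw parameters.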
In addition to these two, since I have pretty strong reasons to assume that the experimental errors are lognormal (whereas I’m not so sure about the machine error), I have also made lognormal versions of the models above.
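(For reference, the lognormal variants just swap the likelihood line, roughly along these lines, shown here for the one-variance structure; the real models differ in the details:)
model {
  target += lognormal_lpdf(Ymeasured | log(Z), sigma); // lognormal error around the latent Z
}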
The problem I’m encountering is:
- the two-variance model has better posterior predictions, but requires many more samples to converge properly and always has low E-BFMI
- the one-variance model fits faster and converges more easily, without any E-BFMI issues, but its posterior predictions are much poorer
Here is the two-variance model’s energy plot and posterior prediction:
And in contrast, here is the one-variance model’s energy plot and posterior prediction:
Any ideas what could be happening? I wonder if I have coded the one-variance model wrong (should I have a vector of L values instead of just a single L?). If it’s any help, when I perform LOO-CV on the two models, the two-variance model has a lot of very high Pareto k values while the one-variance model does fine, so I don’t think I can reliably compare the two… but when I do loo_compare, the two-variance model has a far higher elpd.
Any advice very much appreciated