Posterior predictive for hierarchical with measurement error model


I have a very simple hierarchical model with measurement error and am struggling with how to perform posterior predictive simulations and plots in R, as described in the BDA3 section “Replicated data in the existing schools”. The setting is an analytical lab: we have L lots of tablets, and from each lot we measure N tablets. The inter-lot sd is sigmaB, the intra-lot sd is sigmaW, and the analytical sd is sigmaA; Y_true is the true value (amount of API) and Y is the observed value. The distilled version of the model is below:
for (l in 1:L) mu[l] ~ normal(processMean, sigmaB)
for (n in 1:N) Y_true[n] ~ normal(mu[lot[n]], sigmaW)
for (n in 1:N) Y[n] ~ normal(Y_true[n], sigmaA)

If I were to draw an analogy with replicated data, then Y_rep ~ normal_rng(Y_true, sigma) may not be very informative. I wonder if there are other ways to perform a posterior predictive check.

Please note that I have to set an informative prior on sigmaA in order to identify it from sigmaW. That is another story entirely: if there are any HPLC folks here, I would like to solicit an analytical error estimate for CU, and if there are any fiber-optics folks, I would like to solicit an analytical error estimate for percent dissolved, to be used as an informative prior for sigmaA.

Thanks a lot for any insight,

If you want to generate new Y_new from the posterior predictive, p(Y_new | Y), then you need to replicate the measurement process:

for (n in 1:N_new) {
  Y_true_new[n] = normal_rng(mu[lot_new[n]], sigmaW);
  Y_new[n] = normal_rng(Y_true_new[n], sigmaA);
}

If you don’t use the RNGs, you’ll underestimate the posterior uncertainty in Y_new[n] and defeat the whole purpose of the posterior predictive checks.
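To see concretely why the RNG draws matter, here is a minimal R sketch with made-up posterior draws (the values of `mu_draw`, `sigmaW_draw`, and `sigmaA_draw` are assumptions, not output from any real fit). Plugging the means straight through collapses the process and measurement noise, while drawing through both stages propagates it:

```r
set.seed(1)
S <- 4000                            # number of posterior draws (made-up)
# stand-in posterior draws for one new tablet's lot mean and the sds
mu_draw     <- rnorm(S, 100, 0.5)    # posterior of mu[lot_new[n]]
sigmaW_draw <- rep(1.0, S)           # intra-lot sd (held fixed for simplicity)
sigmaA_draw <- rep(0.5, S)           # analytical sd (held fixed for simplicity)

# correct: replicate the whole measurement process, draw by draw
Y_true_new <- rnorm(S, mu_draw, sigmaW_draw)
Y_new      <- rnorm(S, Y_true_new, sigmaA_draw)

# wrong: skip the RNGs and just carry the posterior means forward
Y_new_plugin <- mu_draw

sd(Y_new)         # roughly sqrt(0.5^2 + 1^2 + 0.5^2)
sd(Y_new_plugin)  # roughly 0.5 -- understates the predictive uncertainty
```

The plug-in version reflects only the posterior uncertainty in the lot mean, not the tablet-to-tablet or analytical variation, which is exactly the underestimation described above.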

P.S. What’s HPLC and CU?

Thanks a lot. I already tried to do something like this in R but got stuck with the graphical displays - how to visualize, and prove to the FDA, that the model is good. By the way, HPLC is high-performance liquid chromatography, used to measure the amount of drug in a tablet, and CU is the content uniformity test. Since we do not completely follow the pharmacopeia, it probably should be called something different (like measuring the amount of API in the tablet).

In R I generated L_new lots with N_new tablets per lot but failed to find a good chart to demonstrate the power of the posterior predictive. The difference between Y_true_new and Y_new vanishes when the analytical error is small, and the p-value was zero. Perhaps I should not have generated new lots. The code in R was:
for (l in 1:L_new)
  mu_new[l] <- rnorm(1, mean = processMean, sd = sigmaB)

for (n in 1:N_new) {
  Y_true_new[n] <- rnorm(1, mean = mu_new[lot_new[n]], sd = sigmaW)
  Y_new[n] <- rnorm(1, mean = Y_true_new[n], sd = sigmaA)
}

Thank you,

Thanks—should’ve guessed you were the same Linas! Somehow, when I saw “tablet”, I thought iPad, not drug.

Yes, that’s the right way to generate predictions for new lots. If you want to generate predictions for new elements of existing lots, you’d use the lot mean and sd.
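Both cases can be sketched in R for a single posterior draw. All the numeric values and the lot index below are made up for illustration; in practice `processMean`, `sigmaB`, `sigmaW`, `sigmaA`, and `mu` would come from the posterior draws of the fit:

```r
set.seed(3)
# stand-ins for one posterior draw (made-up values)
processMean <- 100; sigmaB <- 0.8; sigmaW <- 1.0; sigmaA <- 0.5
mu <- rnorm(8, processMean, sigmaB)  # fitted means of the 8 existing lots
N_new <- 10

# new tablets from an EXISTING lot (here lot 3): reuse its fitted mean mu[3]
Y_true_rep <- rnorm(N_new, mean = mu[3], sd = sigmaW)
Y_rep      <- rnorm(N_new, mean = Y_true_rep, sd = sigmaA)

# tablets from a brand-NEW lot: draw a fresh lot mean from the process first
mu_new     <- rnorm(1, mean = processMean, sd = sigmaB)
Y_true_new <- rnorm(N_new, mean = mu_new, sd = sigmaW)
Y_new      <- rnorm(N_new, mean = Y_true_new, sd = sigmaA)
```

The only difference between the two cases is where the lot mean comes from: the posterior for an existing lot, or a fresh draw from the process distribution for a new lot.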


I am glad that I am moving in the right direction. Could you please give your opinion on whether the charts below are good enough to demonstrate how the posterior predictive may be used to assess the goodness of the model fit.

There are 8 lots, and 10 tablets from each lot are observed. Black circles indicate the actual measurements, green circles the mean of Y_rep, and red lines the 95% CI for each tablet in Y_rep. Obviously those red lines overlap. I see two things: the 95% CIs cover the actual data, which is good, but the green circles are distributed more tightly than the actual data, which I don't understand.

The next chart shows the individual tablets and the corresponding 95% CIs of Y_rep. The Stan model is attached.

As a side note, there is one divergence across 3 chains with 5000 iterations, 2000 warmup iterations, adapt_delta = 0.999, stepsize = 0.001, and max_treedepth = 20, so I guess there are some numerical problems going on.

Thanks in advance for guidance,

weightIG3.stan (2.08 KB)


Maybe the one below is a better demonstration of the posterior predictive?

Perhaps all that remains is tuning the model to remove the divergent transitions.

Many thanks for your help.

See the Cook, Gelman, and Rubin paper. 95% intervals are both hard to estimate (each end uses only a few percent of the draws) and hard to use to test calibration, because you don't expect many values to fall outside the posterior interval. It is better to use 50% intervals, where you expect roughly Binomial(N, 0.5) of the true values to fall within their posterior intervals.
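A minimal R sketch of this 50% interval check (the data and replicates here are synthetic stand-ins, not the tablet data; with a real fit, `y` would be the observations and `y_rep` the matrix of posterior predictive draws):

```r
set.seed(2)
N <- 80                                   # number of observations
S <- 1000                                 # posterior predictive draws each
y     <- rnorm(N, 100, 1)                 # stand-in for the observed data
y_rep <- matrix(rnorm(N * S, 100, 1), nrow = S)  # stand-in for Y_rep draws

# central 50% interval per observation
lo <- apply(y_rep, 2, quantile, probs = 0.25)
hi <- apply(y_rep, 2, quantile, probs = 0.75)
covered <- sum(y >= lo & y <= hi)

# under a calibrated model, covered ~ Binomial(N, 0.5);
# compare the count to the central 95% range of that distribution
covered
qbinom(c(0.025, 0.975), N, 0.5)
```

If `covered` falls well outside the Binomial range, the predictive intervals are mis-calibrated; with 95% intervals the analogous count is expected to be so close to N that the check has almost no power.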