I have coded up a Stan model related to a problem in industrial engineering. The procedure in question is referred to in the field as "Gauge R & R" (Repeatability & Reproducibility). The goal is to estimate the variance components associated with making acceptance measurements. For example, one might imagine test operators measuring gasket thicknesses with calipers. We would like the observed variability to be driven mostly by actual part-to-part variability, while minimizing two sources of measurement error: variability across repeated measurements by the same operator on the same part (repeatability), and variability due to different operators introducing different biases through how they operate the equipment (reproducibility). A typical Gauge R & R study involves several operators selected at random and several parts selected at random, with each operator measuring each part several times.
The model I coded is based largely on WinBUGS code from example 2 (Two-Way Random-Effects Model) of a paper entitled "A Bayesian Approach to the Analysis of Gauge R&R Data" (Weaver, B. P., Hamada, M. S., Vardeman, S. B., Wilson, A. G. (2012). Quality Engineering, 24:486–500). The variance components to be estimated are sigmaRep (repeatability), sigmaO (operator variability), and sigmaP (part variability). To keep things simple I have excluded the part/operator interaction component included in the paper.
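For anyone who wants to reproduce the setup, the data-generating process behind this two-way random-effects model (without the interaction term) can be sketched in Python. This is not the Stan code itself, just a simulation of the assumed model, y_ijk = mu + O_i + P_j + eps_ijk; the function name and all parameter values below are illustrative, not from the paper:

```python
import numpy as np

def simulate_gauge_rr(n_operators=3, n_parts=10, n_reps=3,
                      mu=100.0, sigma_rep=0.1, sigma_o=1.0, sigma_p=10.0,
                      seed=0):
    """Simulate a balanced Gauge R&R study: each operator measures each
    part n_reps times.  Two-way random-effects model without interaction:
        y_ijk = mu + O_i + P_j + eps_ijk,
    with O_i ~ N(0, sigma_o), P_j ~ N(0, sigma_p), eps ~ N(0, sigma_rep).
    """
    rng = np.random.default_rng(seed)
    operator_effects = rng.normal(0.0, sigma_o, size=n_operators)  # reproducibility
    part_effects = rng.normal(0.0, sigma_p, size=n_parts)          # part variability
    y = np.empty((n_operators, n_parts, n_reps))
    for i in range(n_operators):
        for j in range(n_parts):
            # repeated measurements of part j by operator i (repeatability noise)
            y[i, j, :] = (mu + operator_effects[i] + part_effects[j]
                          + rng.normal(0.0, sigma_rep, size=n_reps))
    return y

measurements = simulate_gauge_rr()
print(measurements.shape)  # operators x parts x replicates
```

Flattening the array together with operator and part index columns gives the long-format table the Stan model consumes.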
My Stan code is included below:
gaugeRR_model.txt (1.5 KB)
Typically the model behaves fairly well: it recovers the parameters I used to generate simulated data, Rhat values are 1, and the trace plots and marginal posterior histograms look good. N_Eff values are decent, typically 10–50% of the total number of samples. Generating 20,000 samples (4 chains, 5,000 iterations each) takes about 15 seconds.
Where the model struggles is when fitting data generated with variance components that differ greatly in magnitude. For example, fitting dataset1, generated with sigmaRep = 0.1, sigmaO = 1, and sigmaP = 10, still gives Rhat = 1, but N_Eff drops as low as 7% and sampling time grows to 50 seconds.
dataset1.txt (741 Bytes)
Taking it a step further (dataset2), with sigmaRep = 0.02, sigmaO = 1, and sigmaP = 50, Rhat climbs as high as 1.4, N_Eff falls as low as 0.075%, and sampling takes 70 seconds.
dataset2.txt (741 Bytes)
So I was hoping someone could explain why the model behaves this way and whether anything can be done about it.
BTW, in the paper cited above, in order to obtain decent results with WinBUGS, the authors ran 5 million iterations, dropped the first 200,000 as burn-in, and thinned the rest by a factor of 100. So Stan is already doing much better.