Background
I’m unsure how to best model data from a widget manufacturing process with measurement “uncertainties” on categorical variables (relative to an ordered indexing variable) and an overall sparsity of measurements.
The widgets comprise several subcomponents that are put together on an assembly line. Subcomponents are produced elsewhere in batches, and the batches are fed into the assembly line more or less sequentially (i.e., as one batch of subcomponents is consumed, the next batch is added, but for some subcomponents there will be mixing of batches at this transition point).
The measurement “uncertainties” stem from the fact that we don’t know exactly which widgets contain which batch of each subcomponent; the best we can say is, e.g., that widgets 1 to ~100 contain subcomponent batch A, while widgets ~101 to 200 contain batch B.
The sparse measurement aspect stems from the fact that dimensional measurements are made on every widget by vision systems as they are assembled, but only every ~n^{th} widget is destructively tested for quality once the entire lot of widgets is completed. The test result is simply pass/fail. A problem here is that it is currently impossible to know exactly what the dimensional measurements are for a given tested widget, because there is no one-to-one alignment of the data: we only know approximately where each tested widget falls in the assembly sequence (i.e., its approximate index).
We know that widget quality can vary due to differences between subcomponent batches, as well as with build order due to process drift. The dimensional variables measured on each widget are also known to affect quality.
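To make the data structure concrete, here is a small simulated sketch in R; all names, sizes, and numbers are made up purely for illustration:

```r
# Illustrative (made-up) data: 200 widgets in build order, with vision-system
# dimensional measurements on every widget, a fuzzy subcomponent-batch boundary
# around widget ~100, and destructive pass/fail tests on roughly every 20th
# widget whose position in the sequence is only approximately known.
set.seed(1)
n <- 200
widgets <- data.frame(
  BuildOrder    = 1:n,
  DimensionA    = rnorm(n, mean = 10, sd = 0.2),
  DimensionB    = rnorm(n, mean =  5, sd = 0.1),
  # Nominal batch assignments; in reality widgets near the transition point
  # could contain either batch of a given subcomponent.
  SubcomponentA = ifelse(1:n <= 100, "A1", "A2"),
  SubcomponentB = ifelse(1:n <= 100, "B1", "B2")
)

# Destructive tests: ~every 20th widget, pass/fail, linked back to the build
# sequence only through an approximate index (here +/- a few positions).
tested_idx <- seq(10, n, by = 20)
tests <- data.frame(
  ApproxBuildOrder = tested_idx + sample(-3:3, length(tested_idx), replace = TRUE),
  fail             = rbinom(length(tested_idx), 1, 0.1)
)
```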
Objective
The objective is to predict widget failure probability as a function of build order, subcomponent batches, and dimensional measurements across the entire widget lot. My approach so far has been to model this as a Bayesian logistic mixed-effects model in R using brms:

failrate ~ s(BuildOrder) + s(DimensionA) + s(DimensionB) + (1|SubcomponentA) + (1|SubcomponentB)
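In full brms code, the naive version of this model (using placeholder column names and ignoring, for now, the index and batch uncertainty) looks roughly like this:

```r
library(brms)

# Naive fit on the destructively tested subset only, pretending we know each
# tested widget's exact build order, dimensions, and subcomponent batches.
# 'failrate' is the 0/1 pass/fail result; 'tested_widgets' is a placeholder
# name for that small dataset.
fit <- brm(
  failrate ~ s(BuildOrder) + s(DimensionA) + s(DimensionB) +
    (1 | SubcomponentA) + (1 | SubcomponentB),
  data   = tested_widgets,
  family = bernoulli()
)
```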
Here I’m using a dataset that is only the size of the destructively tested sample and that assumes complete knowledge of the properties of each tested widget. My question is how best to model the uncertainty around where the tested widgets fall in the build order, and, by extension, how to estimate whether a given tested widget contains subcomponent batch A or B and what its dimensional measurements are. How would one define priors around these uncertainties?
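For what it’s worth, the closest built-in mechanism I’ve found so far is brms’s measurement-error term for noisy continuous predictors, which might capture part of the build-order uncertainty. This is only a sketch (the column names are hypothetical), the noisy predictor enters linearly rather than through a smooth, and it doesn’t address the categorical batch-membership or dimensional-measurement uncertainty:

```r
# Sketch only: treat each tested widget's build order as a noisy predictor with
# a known/assumed standard deviation. 'ApproxBuildOrder' and 'BuildOrderSD' are
# hypothetical columns giving the approximate index and how uncertain it is.
fit_me <- brm(
  failrate ~ me(ApproxBuildOrder, BuildOrderSD) +
    s(DimensionA) + s(DimensionB) +
    (1 | SubcomponentA) + (1 | SubcomponentB),
  data   = tested_widgets,
  family = bernoulli()
)
```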