I’m working on a model where the outcome is a categorical trait with 11 possible states (colors). The observed color may be a misclassification of the true latent state, with errors more likely between “neighboring” colors, so the probability of misclassification should decay with distance along this fixed order:
Black–Gray–Blue–Brown–Pink–Orange–Red–Yellow–Green–White–Purple (this ordering is based on the most and least commonly confused colors between two observers).
So, I want to create a custom likelihood/family distribution in Stan (and make it usable from brms) that accounts for this misclassification, and then build models where different predictors give the probabilities of each latent (true) category.
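For concreteness, here is how I’m picturing the likelihood (the notation is mine, and the kernel $g$ is just a placeholder for some decreasing function of distance):

$$
\Pr(y_i = j \mid x_i) = \sum_{k=1}^{11} \Pr(z_i = k \mid x_i)\, M_{kj},
\qquad
M_{kj} = \frac{g(|k-j|)}{\sum_{j'=1}^{11} g(|k-j'|)},
$$

where $z_i$ is the latent true color of illustration $i$ and $\Pr(z_i = k \mid x_i)$ would come from a categorical regression on the predictors $x_i$.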
I’m pretty new to Stan, and I don’t know if this is even possible to do, but any ideas, suggestions, or pointers would be very helpful.
Do you know the misclassification probabilities a priori or do you want to estimate them? If you want to estimate them, we need to understand a bit more about the structure of the data.
Is there one observer or many?
Are misclassification probabilities fixed across observers or variable?
Do you ever have multiple observers looking at the same color swatch and independently classifying it? Or one observer looking at the same swatch repeatedly but independently, and (re)classifying it?
We also need to understand a bit more about the theoretical expectations:
Do pairs of neighboring colors (or colors separated by a given distance) always have the same misclassification probabilities, or do the probabilities depend on the particular colors?
Is black less likely to be misclassified than gray, since gray has neighbors in both directions, but black only has neighbors in one direction?
I don’t know the misclassification probabilities a priori. Because of the data volume (~11,000 illustrations), two observers (myself and another person) classified the colors independently, not on the same illustrations. To get a sense of how our classifications differed, each observer reclassified 100 random samples from the other’s dataset. I used that cross-classification to carry out an agreement analysis, which showed which colors were most commonly confused; that’s where the ordering comes from (e.g., Black–Gray/Gray–Black was the most frequent confusion).
So to answer your questions:
There are two observers.
I would assume that misclassification probabilities could vary across observers, but my main goal is to capture a shared misclassification process for modeling.
The cross-classification was done to evaluate differences in how the two of us recorded/perceived colors. From that agreement analysis, I identified which color pairs were most often confused, and that informed the fixed ordering I’m using in the model.
For modeling:
I want to estimate a single decay parameter (say λ) that controls how quickly misclassification probabilities drop off as categories get further apart in the fixed order.
I’m fine treating all color pairs that are the same distance apart as equally confusable.
I’m not considering pair-specific misclassification rates (e.g., Black–Gray vs Gray–White), as that would probably explode the number of parameters.
Also, edge categories (like Black) only having one neighbor is fine; the exponential decay should handle that naturally (see the sketch below).
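To make this concrete, here is a rough, untested sketch of the Stan functions I have in mind (the names `misclass_row` and `color_obs_lpmf` are my own, and this is just one way to set it up):

```stan
functions {
  // Row k of the misclassification matrix: the probability of *observing*
  // each color j given that the *true* color is k, with weights decaying
  // exponentially in the distance |k - j| along the fixed order.
  vector misclass_row(int k, int K, real lambda) {
    vector[K] w;
    for (j in 1:K)
      w[j] = exp(-lambda * abs(j - k));
    return w / sum(w); // normalize over the K observable categories
  }

  // Log-likelihood of an observed color y, marginalizing over the latent
  // true color; theta is the simplex of latent-category probabilities
  // produced by the regression part of the model.
  real color_obs_lpmf(int y, vector theta, real lambda) {
    int K = num_elements(theta);
    vector[K] lp;
    for (k in 1:K) {
      vector[K] row_k = misclass_row(k, K, lambda);
      lp[k] = log(theta[k]) + log(row_k[y]);
    }
    return log_sum_exp(lp);
  }
}
```

My understanding is that something like this could then be hooked up to brms via `custom_family()` and `stanvars`, though I haven’t worked out those details yet.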
The data you’ve collected are unlikely to carry much information about observer-specific misclassification probabilities. The question to ponder is what signal you’d expect to see in the data if observer A is nearly always right (so disagreements involve misclassification by observer B), versus if observer B is nearly always right, versus if both observers are wrong with similar frequency. It is possible that the data do carry a teeny bit of information about this, particularly if colors that are “adjacent” from a misclassification perspective are actually far apart in terms of typical covariate values.
Alternatively, you may be comfortable making a strong assumption that simplifies the problem: for example, assuming that both observers make misclassifications in both directions with equal probability. This is a very strong assumption, and before making it I would check in your data that for any pair of colors X, Y, a disagreement where observer A says X and observer B says Y is approximately as common as a disagreement where observer A says Y and observer B says X. An appropriate test would be to look at all instances of disagreement about the pair X, Y, and to treat the number of times observer A says X as a binomial random variable; you’re looking for evidence that the binomial proportion differs from 0.5. Likewise, you might want to check that both directions are equally probable by verifying that, for every pair of adjacent colors X, Y, the frequency of disagreement about X, Y is approximately the same as the frequency of disagreement about X, Z, where Z is the color “on the other side of X” from Y, after accounting for the potentially different frequencies of Y and Z in the data.
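Concretely, under the symmetry assumption, conditional on the total number of disagreements about a pair $(X, Y)$, the count in one direction should look like fair coin flips:

$$
n_{A=X,\,B=Y} \mid n \sim \operatorname{Binomial}\!\left(n, \tfrac{1}{2}\right),
\qquad
n = n_{A=X,\,B=Y} + n_{A=Y,\,B=X},
$$

so a two-sided exact binomial test of $p = 0.5$ for each pair is the check I have in mind.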
I think it should be possible to write down a likelihood under either of the assumptions mentioned above, but the first paragraph describes a case where it is very unlikely you’d get meaningful results from your model. If you’d like to think through the likelihood corresponding to the second paragraph, I can try to help, though I don’t promise I can consistently respond in a timely manner. The next question to answer will be: what assumptions do you want to make about the shape of the decay in misclassification probability with “distance”? Monotonically decreasing, but what else? Would you be willing to assume a particular functional form for the decay?
For directional symmetry: for 3 out of 6 pairs of colors (excluding the edge colors), disagreements are approximately symmetric between observers (binomial tests not significantly different from 0.5), while the remaining 3 pairs (e.g., Black–Brown) are asymmetric.
For equal probability in both directions: some focal colors (e.g., Pink, Orange) are balanced between their left and right neighbors (i.e., equal probabilities), but others are strongly one-sided (e.g., Blue, which was only confused with Gray, and Red, which was only confused with Orange).
Side note: Maybe I need to rethink the color scale? Just thinking about that…but I haven’t figured out a way to order those colors naturally in terms of “closeness”.
So, the assumption of perfectly symmetric misclassification doesn’t hold everywhere, but it seems like a workable approximation for half of the color pairs. Given the structure of my data, I think it makes sense to model a shared, symmetric misclassification process with a single decay parameter, while acknowledging that this simplification overlooks the observed asymmetries.
For the decay function, I’m inclined to assume a monotone exponential kernel normalized over categories. That seems like the simplest functional form, and I can consider Gaussian decay as a robustness check.
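In terms of the functions sketched earlier in the thread, the robustness check would only change the exponent on the distance; something like this (again untested, and the power parameter `p` is my own notation) could replace the weight inside `misclass_row`:

```stan
// Unnormalized confusion weight between categories k and j:
// p = 1 gives the exponential kernel, p = 2 the Gaussian-type one.
real kernel_weight(int k, int j, real lambda, real p) {
  return exp(-lambda * pow(abs(j - k), p));
}
```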