I don’t know Bradley-Terry models, nor am I an expert on what I’m about to suggest :D. You could formulate the problem as a longitudinal HMM with multiple series. Apparently @martinmodrak may have worked on a model like this (from “Examples for hidden Markov models with longitudinal data”).
The forward algorithm would cut down on the number of combinations, since it sums over the hidden state sequences one time step at a time instead of enumerating every possible sequence, and you only have 2 states (“ok” and “problem”). If they’re in the “ok” state their skill stays the same; if they’re in the “problem” state you decrease their skill. You could even increase the probability of transitioning into and staying in the “problem” state as a function of skill.
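To make the idea a bit more concrete, here is a rough Python sketch of the forward algorithm for a single player's series of matches. Everything in it is my own assumption for illustration, not something from an existing model: the names (`forward_log_lik`, `penalty`, `init`, `trans`), the choice to model the "problem" state as a fixed subtraction from skill, the fixed transition matrix, and the made-up numbers. The brute-force function is only there to show that the forward recursion gives the same marginal likelihood as summing over all 2^T state sequences.

```python
import itertools
import math

OK, PROBLEM = 0, 1  # the two hidden states


def safe_log(p):
    """log(p) that returns -inf for p == 0 instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over an iterable."""
    values = list(values)
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_logistic(x):
    """Stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def emission_log_lik(state, won, skill, opp_skill, penalty):
    """Bradley-Terry log-likelihood of one match outcome.

    Assumption: in the PROBLEM state the player's skill is reduced by `penalty`.
    """
    effective = skill - penalty if state == PROBLEM else skill
    diff = effective - opp_skill
    return log_logistic(diff) if won else log_logistic(-diff)


def forward_log_lik(outcomes, opp_skills, skill, penalty, init, trans):
    """Marginal log-likelihood of the outcomes, summing out the hidden states.

    init[k] is P(first state = k); trans[j][k] is P(next state = k | state = j).
    Cost is O(T * K^2) with K = 2 states.
    """
    log_alpha = [
        safe_log(init[k])
        + emission_log_lik(k, outcomes[0], skill, opp_skills[0], penalty)
        for k in (OK, PROBLEM)
    ]
    for won, opp in zip(outcomes[1:], opp_skills[1:]):
        log_alpha = [
            log_sum_exp(log_alpha[j] + safe_log(trans[j][k]) for j in (OK, PROBLEM))
            + emission_log_lik(k, won, skill, opp, penalty)
            for k in (OK, PROBLEM)
        ]
    return log_sum_exp(log_alpha)


def brute_force_log_lik(outcomes, opp_skills, skill, penalty, init, trans):
    """Same quantity by enumerating all 2^T state paths (only for checking)."""
    terms = []
    for path in itertools.product((OK, PROBLEM), repeat=len(outcomes)):
        lp = safe_log(init[path[0]])
        for t, state in enumerate(path):
            if t > 0:
                lp += safe_log(trans[path[t - 1]][state])
            lp += emission_log_lik(state, outcomes[t], skill, opp_skills[t], penalty)
        terms.append(lp)
    return log_sum_exp(terms)


# Made-up example data: one player, five matches.
outcomes = [True, True, False, False, True]
opp_skills = [0.0, 0.3, -0.2, 0.1, 0.0]
init = [0.9, 0.1]
trans = [[0.95, 0.05],  # from OK
         [0.20, 0.80]]  # from PROBLEM

fwd = forward_log_lik(outcomes, opp_skills, 1.0, 1.5, init, trans)
brute = brute_force_log_lik(outcomes, opp_skills, 1.0, 1.5, init, trans)
print(fwd, brute)  # the two values should agree up to floating point error
```

With multiple series you would add up `forward_log_lik` over players (a simplification, since each match really involves two players' states). To make the transitions depend on skill, the entries of `trans` could be computed from `skill` (e.g. through a logistic function) instead of being fixed constants.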