I’m currently writing up a model to study how clinicians use Structured Professional Judgement tools. SPJ tools are risk estimation instruments where clinicians measure a bunch of risk factors for a specific behavior and then give their subjective estimate of the likelihood of said behavior in the form of a categorical statement (in this case, either “low”, “moderate” or “high” risk of the behavior happening for patient i).
I’m trying to model the risk estimates as a function of the tool’s measurements.
The tool I’m interested in has 23 risk factors, which are measured as “clearly absent”, “possibly/partially present” or “clearly present”. Moreover, the clinician is asked to indicate how relevant he believes each risk factor to be for the individual evaluated. For any given evaluation, the clinician has to rate each risk factor as “low”, “moderate” or “high” relevance. I’m not sure how to handle the obvious interaction between presence and relevance scores for each risk factor.
In my opinion, the principled approach would be to model all presence and relevance scores as monotonic predictors and compute their interactions. Using brms, I realised the resulting model implies 69 effect coefficients and 92 simplexes per clinician. From what I understand, this is because the default way to model interactions between two monotonic predictors x_1 and x_2 in brms is :
I haven’t collected the data yet but, since I don’t expect to have that many evaluations per clinicians (maybe 20 or so), I’m guessing this will be hard to fit.
The first alternative I thought about is to assign arbitrary numbers to presence and relevance labels and codify each “risk factor” term as the product of presence and relevance scores. This made me realize that my choice of arbitrary numbers actually matters. For instance, assigning numbers 1 to 3 to both presence and relevance scores is not the same as assigning 0 to 2 because multiplying by 0 cancels some of the combinations. Keeping that in mind, my final plan was to assign numbers 0, 0.5 and 1 to presence scores and keep 1 to 3 for relevance scores. You can compute all possible combinations in R like this :
> presence <- c(0, .5, 1)
> relevance <- c(1:3)
> presence %*% t(relevance)
[,1] [,2] [,3]
[1,] 0.0 0 0.0
[2,] 0.5 1 1.5
[3,] 1.0 2 3.0
This results in a model with 23 effect coefficients per clinician, which is much easier to interpret. Plus, this codification ensures that a risk factor will only affect risk estimates if it is present. Still, interpretation is awkward because the model makes weird assumptions like “possibly/partially present” + “moderate relevance” < “clearly present” + “relevance high”.
That’s pretty much where I’m stuck. Let me know if you have ideas on how I could improve this model.