How to set up an interaction between ordinal variables

I’m currently writing up a model to study how clinicians use Structured Professional Judgement (SPJ) tools. SPJ tools are risk estimation instruments: clinicians measure a set of risk factors for a specific behavior and then give a subjective estimate of the likelihood of that behavior as a categorical statement (in this case, either “low”, “moderate” or “high” risk of the behavior occurring for patient i).
I’m trying to model the risk estimates as a function of the tool’s measurements.

The tool I’m interested in has 23 risk factors, each measured as “clearly absent”, “possibly/partially present” or “clearly present”. Moreover, the clinician is asked to indicate how relevant they believe each risk factor to be for the individual being evaluated: for any given evaluation, each risk factor is rated as “low”, “moderate” or “high” relevance. I’m not sure how to handle the obvious interaction between the presence and relevance scores for each risk factor.

In my opinion, the principled approach would be to model all presence and relevance scores as monotonic predictors and compute their interactions. Using brms, I realised the resulting model implies 69 effect coefficients and 92 simplexes per clinician (three coefficients and four simplexes for each of the 23 risk factors). From what I understand, this is because the default way to model the interaction between two monotonic predictors x_1 and x_2 in brms is:

\beta_1 * mo(x_1, \zeta_1) + \beta_2 * mo(x_2, \zeta_2) + \beta_3 * mo(x_1, \zeta_{31}) * mo(x_2, \zeta_{32})

I haven’t collected the data yet but, since I don’t expect to have that many evaluations per clinician (maybe 20 or so), I’m guessing this will be hard to fit.

The first alternative I thought about is to assign arbitrary numbers to the presence and relevance labels and code each “risk factor” term as the product of its presence and relevance scores. This made me realize that my choice of arbitrary numbers actually matters. For instance, assigning the numbers 1 to 3 to both presence and relevance scores is not the same as assigning 0 to 2, because multiplying by 0 cancels some of the combinations. Keeping that in mind, my final plan was to assign the numbers 0, 0.5 and 1 to presence scores and keep 1 to 3 for relevance scores. You can compute all possible combinations in R like this:

> presence <- c(0, .5, 1)
> relevance <- c(1:3)
> presence %*% t(relevance)
     [,1] [,2] [,3]
[1,]  0.0    0  0.0
[2,]  0.5    1  1.5
[3,]  1.0    2  3.0

This results in a model with 23 effect coefficients per clinician, which is much easier to interpret. Plus, this coding ensures that a risk factor will only affect risk estimates if it is present. Still, interpretation is awkward because the model makes weird assumptions like “possibly/partially present” + “moderate relevance” < “clearly present” + “high relevance”.
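For concreteness, here is an untested sketch of how that coding could be applied to a data frame. The column names pr_rf1 and rel_rf1 are placeholders for one risk factor’s presence and relevance scores, stored as integers 1–3:

```r
# Map the ordinal levels to the chosen numeric codes, then multiply.
# Coding presence as 0/0.5/1 guarantees a clearly absent factor
# contributes exactly 0, whatever its relevance.
presence_map  <- c(0, 0.5, 1)
relevance_map <- c(1, 2, 3)
data$rf1_term <- presence_map[data$pr_rf1] * relevance_map[data$rel_rf1]
```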

That’s pretty much where I’m stuck. Let me know if you have ideas on how I could improve this model.


Assuming you wish to make inferences about clinicians in general rather than your particular in-sample clinicians, you need to write the model in some way that (partially) pools the effects across clinicians. 69 coefficients per clinician might not be so bad if they are partially (or even completely) pooled across clinicians.

A separate question is whether it’s also appropriate to (partially) pool the effects across risk factors.

Unless you want complete pooling, this is going to be hard to fit with brms. In brms, putting a random effect over a monotonic effect estimates one fully pooled shape for the monotonic effect, with random variation in the magnitude of the difference between the highest and lowest category (and therefore between any pair of categories). It seems like you instead want to allow the shape of the monotonic effect itself to vary across clinicians, though again I suspect with some partial pooling.

If you’re interested in coming up with a way to do this in raw Stan outside of brms, then I’d be happy to have a think about how to do that, though I’m not too sure what a good solution would be.

A different possibility is that, with robust partial pooling across clinicians and questions, you will end up in a setting so data-rich that you don’t need an explicit monotonicity constraint; the data might enforce monotonicity for you via the likelihood. So you could experiment with something like score ~ presence * relevance + (1 + presence * relevance | clinician:factor), where presence and relevance are treated as non-ordinal factors, and see if you get sensibly monotonic output. You could also add some more structure with something like score ~ presence * relevance + (1 + presence * relevance | clinician) + (1 + presence * relevance | factor) + (1 + presence * relevance | clinician:factor)
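As an untested sketch, the first of those formulas could look like this as a full brm() call, assuming a long-format data frame (here called long_data, a hypothetical name) with one row per clinician × patient × risk factor and columns score, presence, relevance, clinician, and factor:

```r
library(brms)

# Hedged sketch, not a tested model: presence and relevance entered as
# unordered factors, with their effects varying by clinician-by-factor cell.
long_data$presence  <- factor(long_data$presence)
long_data$relevance <- factor(long_data$relevance)

fit <- brm(
  score ~ presence * relevance +
    (1 + presence * relevance | clinician:factor),
  data = long_data,
  family = cumulative(link = "logit")
)
```

If the posterior category effects come out monotonic on their own, that would suggest the explicit mo() constraint may not be needed.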


Thank you for the help!

I have some thoughts but first, I think simulated data would be helpful, so here’s some R code:

set.seed(1234)
library(extraDistr)  # for rcat()

n <- 60
evaluator_id <- c(rep(1, 20), rep(2, 15), rep(3, 25))
patient_id <- 1:n

# Random data for the predictors
x_labels <- c(
    paste0("pr_rf", 1:3),
    paste0("rel_rf", 1:3)
)

x <- sapply(x_labels, function(i) {
    rcat(n, c(1, 1, 1) / 3)
})

# Random data for the outcome (assuming no effect of the predictors for now)
risk_estimates <- rcat(n, c(1, 1, 1) / 3)

data <- data.frame(
    evaluator_id,
    patient_id,
    risk_estimates,
    x
)

Assuming you wish to make inferences about clinicians in general rather than your particular in-sample clinicians, you need to write the model in some way that (partially) pools the effects across clinicians.

The goal is both to compare effects across clinicians and to estimate pooled effects for each risk factor, in order to compare how important clinicians consider specific risk factors in general.

A separate question is whether it’s also appropriate to (partially) pool the effects across risk factors.

I briefly considered pooling effects across risk factors as well but, in that case, I don’t think it makes theoretical sense to do so. Hence, partial pooling across clinicians is what I’m going for.

In brms, putting a random effect over a monotonic effect has the effect of estimating one fully pooled shape for the monotonic effect with random variation in the magnitude of the difference between the highest and lowest category (and therefore between any pair of categories).

I’m not sure I understand what you mean. I thought I could do partial pooling for all parameters in brms using the following syntax:

formula <- risk_estimates ~ mo(pr_rf1) * mo(rel_rf1) + mo(pr_rf2) * mo(rel_rf2) + mo(pr_rf3) * mo(rel_rf3) + 
    (mo(pr_rf1) * mo(rel_rf1) + mo(pr_rf2) * mo(rel_rf2) + mo(pr_rf3) * mo(rel_rf3) | evaluator_id)

fit_brms <- brm(
    formula = formula,
    data = data,
    family = cumulative(link = "logit")
)

Is that not the case?

If you’re interested in coming up with a way to do this in raw Stan outside of brms, then I’d be happy to have a think about how to do that, though I’m not too sure what a good solution would be.

I’m 100% interested in writing my own Stan code for this. I’ve been experimenting with brms and rstanarm, and I came to the conclusion that I can’t have everything I want in one interface. For instance, rstanarm allows me to specify Dirichlet-induced priors for the cutpoints and to express prior beliefs about effects based on expected R2 – two nice features I don’t know how to implement in brms. rstanarm is also quite a bit faster, which is nice. However, as far as I know, the stan_polr function does not allow for partial pooling or monotonic effects.

I have (almost) no experience in Stan so my strategy was to try figuring out Stan code generated by brms and rstanarm. This vignette by @betanalpha seems like a good starting point but I haven’t written anything yet so I’ll keep poking at it.

Double-check me in case I’m wrong, but I am pretty sure that (mo(x) | group) gives one estimated simplex which then gets scaled by a different total effect size for each group, with partial pooling across groups. This seems not to be what you want.

You’re right. This is actually mentioned in this paper by @paul.buerkner and @Emmanuel_Charpentier (see page 430); the goal is to maximize the stability of the estimates and reduce convergence problems when sample sizes are small across levels. For this specific model, however, I think it’s still justified (at least on a theoretical level) to condition the shape parameters on clinicians. For example, two clinicians might treat a “possibly/partially present” risk factor differently on average. Now that I think about it, there could also be an argument for partially/completely pooling the shape parameters of presence scores across risk factors.

Testing it out with brms, I noticed you don’t get an estimate of the implicit constant simplex using the “no pooling” formula y ~ (mo(x) | group). To get an estimate of the shape parameter, you need to explicitly specify a population-level monotonic predictor, like so: y ~ mo(x) + (mo(x) | group). This is weird behavior in my opinion. Looking at the Stan code generated by brms, I think the group-level shape parameter is simply ignored in the “fixed effect” model. I could be missing something though.

I think I figured out how to model the interaction. I’ve never seen an interaction modelled that way but it makes sense in the context of the problem. Let me know if the math works out.

Formally, I wish to model the interaction between a single pair of risk factor measures (presence and relevance scores) as

\phi_i = \beta \sum_{n=1}^{x_i} \delta_n \left(1 + \lambda \sum_{n=1}^{w_i} \zeta_n\right)

where x_i \in \{1, 2, 3\} is the ordinal presence score, w_i \in \{1, 2, 3\} is the ordinal relevance score, \beta \in \mathbb{R} is the effect of the risk factor being present, \lambda \in \mathbb{R}^+ is the moderating coefficient of relevance (high values mean a bigger gap between low relevance and high relevance), and \delta and \zeta are shape parameters (i.e. simplexes) for presence and relevance scores respectively (we set \delta_1 = \zeta_1 = 0 for convenience). If visualization is more your thing, I made this little shiny app to test out how different values for each parameter affect the overall effect of the interaction term.
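As a quick sanity check of the formula, the implied effect for every (x_i, w_i) combination can be tabulated in R. The parameter values below are made up for illustration, with the conventional zeros \delta_1 = \zeta_1 = 0 prepended to the simplexes:

```r
beta   <- 1
lambda <- 0.5
delta  <- c(0, 0.5, 0.5)  # delta_1 = 0, then a simplex over the remaining steps
zeta   <- c(0, 0.5, 0.5)  # zeta_1 = 0, likewise

# phi[x, w] = beta * sum(delta[1:x]) * (1 + lambda * sum(zeta[1:w]))
phi <- outer(cumsum(delta), cumsum(zeta),
             function(d, z) beta * d * (1 + lambda * z))
phi
```

The first row of phi is all zeros, so (as intended) a clearly absent risk factor contributes nothing regardless of its relevance score.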

I believe I got this to work in Stan with the following code:

functions {
  real induced_dirichlet_lpdf(vector kappa, vector alpha, real phi) {
    int K = num_elements(kappa) + 1;
    vector[K - 1] sigma = inv_logit(phi - kappa);
    vector[K] p;
    matrix[K, K] J = rep_matrix(0, K, K);
    
    // Induced ordinal probabilities
    p[1] = 1 - sigma[1];
    for (k in 2:(K - 1))
      p[k] = sigma[k - 1] - sigma[k];
    p[K] = sigma[K - 1];
    
    // Baseline column of Jacobian
    for (k in 1:K) J[k, 1] = 1;
    
    // Diagonal entries of Jacobian
    for (k in 2:K) {
      real rho = sigma[k - 1] * (1 - sigma[k - 1]);
      J[k, k] = - rho;
      J[k - 1, k] = rho;
    }
    
    return   dirichlet_lpdf(p | alpha)
           + log_determinant(J);
  }
}

data {
    int<lower=0> N;                  // Number of observations
    int<lower=1, upper=3> y[N];      // Observed ordinal outcome (risk estimates)
    int<lower = 1, upper = 3> x[N];  // Observed ordinal predictor (presence of risk factors)
    int<lower = 1, upper = 3> w[N];  // Observed ordinal moderator (relevance of risk factors)
    int<lower = 3> K_y;              // Number of ordinal categories of the outcome
    int<lower = 3> K_x;              // Number of ordinal categories of the predictor
    int<lower = 3> K_w;              // Number of ordinal categories of the moderator
}

parameters {
    ordered[K_y - 1] kappa;   // (Internal) cut points for the outcome
    real beta;                // Latent effect of the predictor
    real<lower=0> lambda;     // Latent effect of the moderator
    simplex[K_x - 1] delta;   // Normalized distances across categories of the predictor
    simplex[K_w - 1] zeta;    // Normalized distances across categories of the moderator
}

transformed parameters {
    vector[N] phi;
    for (i in 1:N) {
        phi[i] = beta * cumulative_sum(append_row(0, delta))[x[i]]
                 * (1 + lambda * cumulative_sum(append_row(0, zeta))[w[i]]);
    }
}

model {
    
    // Prior model
    kappa ~ induced_dirichlet(rep_vector(1, K_y), 0);
    beta ~ normal(0, 1);
    lambda ~ normal(0, 1);
    delta ~ dirichlet(rep_vector(1, K_x - 1));
    zeta ~ dirichlet(rep_vector(1, K_w - 1));
    
    // Observed model
    y ~ ordered_logistic(phi, kappa);
}

There are a few key things missing from this code.

  1. Generalize to multiple risk factors. This is for one risk factor (i.e. one presence score and one relevance score) whereas the tool I’m interested in has 23. I can of course enter each variable manually but if someone knows how to generalize this code efficiently, that’d be a huge help!
  2. Partially pool effects across clinicians. Theoretically speaking, all 4 parameters of this model should vary from one clinician to the other and I’d like to model that. I don’t know how realistic it is considering potential convergence issues but I’m sure it’s at least possible.
  3. Prior selection. This is complicated not only because R2 is already hard to interpret in the context of ordinal regression, but also because, with the interaction term included, I can hardly get a sense of what a reasonable prior would be for \beta and \lambda. At the very least, running the code above on simulated data didn’t result in any convergence issues as far as I can tell. I’m open to suggestions!
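On point 1, here is a hedged, untested sketch of one way to vectorize over J risk factors. It reuses the induced_dirichlet function from the program above, keeps the same array syntax, and simply gives each risk factor its own beta, lambda, delta and zeta:

```stan
data {
    int<lower=0> N;
    int<lower=1> J;                     // Number of risk factors (23 here)
    int<lower=1, upper=3> y[N];         // Ordinal outcome
    int<lower=1, upper=3> x[N, J];      // Presence scores, one column per factor
    int<lower=1, upper=3> w[N, J];      // Relevance scores, one column per factor
    int<lower=3> K_y;
    int<lower=3> K_x;
    int<lower=3> K_w;
}

parameters {
    ordered[K_y - 1] kappa;
    vector[J] beta;                     // One presence effect per risk factor
    vector<lower=0>[J] lambda;          // One relevance moderator per risk factor
    simplex[K_x - 1] delta[J];          // One presence simplex per risk factor
    simplex[K_w - 1] zeta[J];           // One relevance simplex per risk factor
}

transformed parameters {
    vector[N] phi = rep_vector(0, N);
    for (j in 1:J) {
        vector[K_x] D = cumulative_sum(append_row(0, delta[j]));
        vector[K_w] Z = cumulative_sum(append_row(0, zeta[j]));
        for (i in 1:N)
            phi[i] += beta[j] * D[x[i, j]] * (1 + lambda[j] * Z[w[i, j]]);
    }
}

model {
    kappa ~ induced_dirichlet(rep_vector(1, K_y), 0);
    beta ~ normal(0, 1);
    lambda ~ normal(0, 1);
    for (j in 1:J) {
        delta[j] ~ dirichlet(rep_vector(1, K_x - 1));
        zeta[j] ~ dirichlet(rep_vector(1, K_w - 1));
    }
    y ~ ordered_logistic(phi, kappa);
}
```

Hoisting the cumulative sums out of the observation loop also avoids recomputing them N times per factor, which the single-factor version above does.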