Help implementing Plackett-Luce model in Stan with ties, ranked predictors, and rater covariates

Hi all,

I’m working on implementing a Plackett-Luce model in Stan and have run into a few issues, particularly around modeling ties in the rankings and including multiple ranked predictors and covariates for raters and participants.

I’ve used the following resources to get me this far:

What I’m trying to do:

  • Each rater ranks a subset of participants on a set of traits (e.g., predictor_trait1, predictor_trait2, …, trait_DV).
  • The outcome is a ranking (trait_DV), and I want to model it as a function of rankings on other traits (predictor_trait1, etc.).
  • I also have rater-level covariates (e.g., rater_gender) and participant-level covariates (e.g., participant_gender) that I’d like to include as fixed effects, and optionally their interaction.
  • Some raters tied participants on some traits, meaning they assigned the same rank to multiple participants.

Data format:

The data is currently in long format like this:

rater_id  participant_id  trait             rank  rater_gender  participant_gender
r1        p1              predictor_trait1  1     male          female
r1        p2              predictor_trait1  2     male          male
r1        p3              predictor_trait1  2     male          female
r1        p1              trait_DV          1     male          female

# toy data
data <- tibble::tibble(
  participant_id = c(101, 102, 103, 104, 101, 102, 103, 104),
  trait          = c("predictor_trait1_IV", "predictor_trait1_IV", "predictor_trait1_IV", "predictor_trait1_IV",
                     "trait_DV", "trait_DV", "trait_DV", "trait_DV"),
  rater_id       = c(1, 1, 1, 1, 1, 1, 1, 1),
  rater_group    = c(1, 1, 1, 1, 1, 1, 1, 1),
  participant_group = c(0, 0, 1, 1, 0, 0, 1, 1),
  rank           = c(1, 2, 2, 4, 1, 3, 2, 4)  # ties allowed
)

What I’m not sure about:

  1. Ties: Some raters assign the same rank to multiple participants. Can the Plackett-Luce model in Stan handle this natively? If not, is there a common workaround?
  2. Predictors: I want to model trait_DV rankings as a function of the other ranked traits — all on the ranking scale. Does multiplying each trait’s rank by a corresponding beta coefficient make sense, or is there a better way to structure that part of the model?
  3. General model structure: Does the following Stan structure make conceptual sense? I’m currently modeling the outcome as a categorical rank using trait-based rankings as predictors. I’d like to know if this approach reasonably approximates a Plackett-Luce model or if I should move toward a full ranking-based likelihood with support for ties.

Example model:

data {
  int<lower=1> n_obs;
  int<lower=1> n_traits;
  int<lower=1> n_raters;

  // Array declarations use the current Stan syntax (array[N] type name).
  array[n_obs] int<lower=1> trait_DV_rank;            // outcome: ranking of DV trait
  array[n_obs] real rank;                             // predictors: ranks of other traits
  array[n_obs] int<lower=1, upper=n_traits> trait_id; // which predictor trait this row is
  array[n_obs] int<lower=1, upper=n_raters> rater_id; // rater identity
  array[n_obs] int<lower=0, upper=1> rater_gender;
  array[n_obs] int<lower=0, upper=1> participant_gender;
}

parameters {
  real alpha;
  vector[n_traits] beta;
  real beta_rater_gender;
  real beta_participant_gender;
  real beta_interaction;

  vector[n_raters] rater_intercepts;
  real<lower=0> sigma_rater;
}

model {
  vector[n_obs] eta;

  beta ~ normal(0, 1);
  alpha ~ normal(0, 1);
  beta_rater_gender ~ normal(0, 1);
  beta_participant_gender ~ normal(0, 1);
  beta_interaction ~ normal(0, 1);
  rater_intercepts ~ normal(0, sigma_rater);
  sigma_rater ~ cauchy(0, 1);

  for (i in 1:n_obs) {
    eta[i] = alpha +
             beta[trait_id[i]] * rank[i] +
             beta_rater_gender * rater_gender[i] +
             beta_participant_gender * participant_gender[i] +
             beta_interaction * rater_gender[i] * participant_gender[i] +
             rater_intercepts[rater_id[i]];
  }

  trait_DV_rank ~ categorical_logit(eta);  // Likelihood
}

I think it makes sense when thinking about Plackett-Luce to back up and think about Bradley-Terry. If you can figure out what to do for ties there, then it should be easy to promote that from pairwise to K-wise rankings. There’s a literature on how to do this that I haven’t read.
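
To make the pairwise-to-K-wise step concrete, here is a minimal Python sketch (Python rather than Stan so it can be run directly). It assumes each item has a log-worth parameter and that a ranking is built by repeatedly choosing the best remaining item; the function names and the numeric values are illustrative, not from the original post. With only two items, the Plackett-Luce probability reduces to the Bradley-Terry probability.

```python
import math

def plackett_luce_logprob(ordering, log_worth):
    """Log-probability of an untied ordering (best first) under Plackett-Luce.

    ordering: list of item labels, best to worst.
    log_worth: dict mapping item label -> log-worth (illustrative parameter).
    """
    scores = [log_worth[item] for item in ordering]
    logp = 0.0
    for k in range(len(scores) - 1):
        # Item k is chosen among the items not yet ranked (a softmax choice).
        tail = scores[k:]
        m = max(tail)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in tail))
        logp += scores[k] - log_denominator
    return logp

def bradley_terry_prob(log_worth_a, log_worth_b):
    """P(a beats b): logistic function of the log-worth difference."""
    return 1.0 / (1.0 + math.exp(-(log_worth_a - log_worth_b)))

# Made-up log-worths, purely for illustration.
log_worth = {"a": 0.7, "b": -0.2, "c": 0.1}

# A two-item Plackett-Luce ranking is exactly a Bradley-Terry comparison.
pair_pl = math.exp(plackett_luce_logprob(["a", "b"], log_worth))
pair_bt = bradley_terry_prob(log_worth["a"], log_worth["b"])
```

Whatever tie mechanism is chosen for the pairwise case has to slot into the per-stage choice in the loop above when moving to K items.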

One way to think about ties is to treat them as possibly coming out either way. So if you have [a = b, c], then this could be [a, b, c] or [b, a, c].
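
One possible reading of this suggestion is to marginalize over the tie: the probability of [a = b, c] is taken to be the sum of the probabilities of [a, b, c] and [b, a, c]. The Python sketch below implements that reading under a plain Plackett-Luce model. The function names and log-worth values are hypothetical, and this is not one of the dedicated tie models from the literature mentioned above. Note that the number of orderings grows factorially with the size of each tie group, so this only scales to small ties.

```python
import itertools
import math

def pl_logprob(ordering, log_worth):
    """Plackett-Luce log-probability of an untied ordering, best first."""
    scores = [log_worth[item] for item in ordering]
    logp = 0.0
    for k in range(len(scores) - 1):
        tail = scores[k:]
        m = max(tail)
        logp += scores[k] - (m + math.log(sum(math.exp(s - m) for s in tail)))
    return logp

def tied_ranking_logprob(tie_groups, log_worth):
    """Log-probability of a ranking with ties, summing over consistent orderings.

    tie_groups: list of lists, best group first, e.g. [["a", "b"], ["c"]]
    means a and b are tied for first and c is last.
    """
    per_group_orders = [itertools.permutations(group) for group in tie_groups]
    logps = []
    for combo in itertools.product(*per_group_orders):
        # Concatenate one within-group order per tie group into a full ordering.
        ordering = [item for group in combo for item in group]
        logps.append(pl_logprob(ordering, log_worth))
    # Log-sum-exp over all orderings consistent with the ties.
    m = max(logps)
    return m + math.log(sum(math.exp(lp - m) for lp in logps))

# Made-up log-worths, purely for illustration.
log_worth = {"a": 0.7, "b": -0.2, "c": 0.1}
tied_logp = tied_ranking_logprob([["a", "b"], ["c"]], log_worth)
```

In Stan, the same idea would amount to a `log_sum_exp` over the log-likelihoods of the consistent orderings for each rater.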

I don’t know how you could model an interaction between item-level (i.e., participant trait level) and rater-level covariates.

On using the other traits' ranks as predictors: you mean the rank among the other participants? I would think that you would instead try to create a generative model for each item.
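
As a rough illustration of what a per-item generative model could look like, the Python sketch below gives each participant a log-worth built from item-level covariates and then scores the observed trait_DV ordering with a Plackett-Luce likelihood. This is only one interpretation of the suggestion. The participant IDs, `participant_group` values, and trait_DV ordering come from the toy data in the question; the `trait1_score` covariate, its values, and the coefficients are invented for the example.

```python
import math

def item_log_worth(covariates, coefs):
    """Linear predictor for one item's log-worth (hypothetical structure)."""
    return sum(coefs[name] * value for name, value in covariates.items())

def ranking_logprob(ordering, covariates_by_item, coefs):
    """Plackett-Luce log-probability of an untied ordering, best first."""
    scores = [item_log_worth(covariates_by_item[i], coefs) for i in ordering]
    logp = 0.0
    for k in range(len(scores) - 1):
        # Choose item k among the items that have not been ranked yet.
        tail = scores[k:]
        m = max(tail)
        logp += scores[k] - (m + math.log(sum(math.exp(s - m) for s in tail)))
    return logp

# participant_group is from the toy data; trait1_score values are made up.
covariates_by_item = {
    101: {"participant_group": 0.0, "trait1_score": 1.5},
    102: {"participant_group": 0.0, "trait1_score": 0.2},
    103: {"participant_group": 1.0, "trait1_score": 0.4},
    104: {"participant_group": 1.0, "trait1_score": -1.0},
}
# Hypothetical coefficients; in a fitted model these would be parameters.
coefs = {"participant_group": 0.3, "trait1_score": 0.8}

# trait_DV ranks in the toy data are 1, 3, 2, 4 for participants 101 to 104.
dv_ordering = [101, 103, 102, 104]
logp = ranking_logprob(dv_ordering, covariates_by_item, coefs)
```

A covariate that is constant within a rater (such as rater gender on its own) cancels out of every stage of this likelihood, so rater-level effects can only enter through terms that vary across the items being ranked.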

P.S. Weird looking at my face in a post!
