Hi all,
I’m working on implementing a Plackett-Luce model in Stan and have run into a few issues, particularly around modeling ties in the rankings and including multiple ranked predictors and covariates for raters and participants.
I’ve used the following resources to get me this far:
What I’m trying to do:
- Each rater ranks a subset of participants on a set of traits (e.g., `predictor_trait1`, `predictor_trait2`, …, `trait_DV`).
- The outcome is a ranking (`trait_DV`), and I want to model it as a function of rankings on other traits (`predictor_trait1`, etc.).
- I also have rater-level covariates (e.g., `rater_gender`) and participant-level covariates (e.g., `participant_gender`) that I’d like to include as fixed effects, and optionally their interaction.
- Some raters tied participants on some traits, meaning they assigned the same rank to multiple participants.
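To make the target concrete, this is the standard Plackett-Luce likelihood for a single rater's complete, tie-free ranking as I understand it. The notation is my own assumption for this post: \pi_j is the participant placed in position j out of J, and \eta_i is that participant's linear predictor (log-worth), here built from the predictor-trait ranks x_{it} with coefficients \beta_t.

```latex
\[
P(\pi \mid \eta) \;=\; \prod_{j=1}^{J}
  \frac{\exp(\eta_{\pi_j})}{\sum_{k=j}^{J} \exp(\eta_{\pi_k})},
\qquad
\eta_i \;=\; \sum_{t} \beta_t \, x_{it}
\]
```

Each factor is a softmax over the participants not yet ranked, so any term that is constant within one rater's ranking (an intercept, a rater-level main effect) cancels out of the ratio.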
Data format:
The data is currently in long format like this:
| rater_id | participant_id | trait | rank | rater_gender | participant_gender |
|---|---|---|---|---|---|
| r1 | p1 | predictor_trait1 | 1 | male | female |
| r1 | p2 | predictor_trait1 | 2 | male | male |
| r1 | p3 | predictor_trait1 | 2 | male | female |
| r1 | p1 | trait_DV | 1 | male | female |
| … | … | … | … | … | … |
# toy data
data <- tibble::tibble(
  participant_id = c(101, 102, 103, 104, 101, 102, 103, 104),
  trait = c("predictor_trait1_IV", "predictor_trait1_IV", "predictor_trait1_IV", "predictor_trait1_IV",
            "trait_DV", "trait_DV", "trait_DV", "trait_DV"),
  rater_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  rater_group = c(1, 1, 1, 1, 1, 1, 1, 1),
  participant_group = c(0, 0, 1, 1, 0, 0, 1, 1),
  rank = c(1, 2, 2, 4, 1, 3, 2, 4) # ties allowed
)
What I’m not sure about:
- Ties: Some raters assign the same rank to multiple participants. Can the Plackett-Luce model in Stan handle this natively? If not, is there a common workaround?
- Predictors: I want to model `trait_DV` rankings as a function of the other ranked traits, all on the ranking scale. Does multiplying each trait’s rank by a corresponding beta coefficient make sense, or is there a better way to structure that part of the model?
- General model structure: Does the following Stan structure make conceptual sense? I’m currently modeling the outcome as a categorical rank using trait-based rankings as predictors. I’d like to know if this approach reasonably approximates a Plackett-Luce model or if I should move toward a full ranking-based likelihood with support for ties.
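To make the ties question concrete: one option I can imagine, assuming a tie means the rater simply could not order the tied participants (my assumption, not something I have confirmed as standard practice in Stan), is to sum the Plackett-Luce probability over every strict ordering consistent with the tie. For an outcome ranking of 1, 2, 2, 4 over four participants a, b, c, d, with w_i = \exp(\eta_i) as my own shorthand, that would give:

```latex
\[
\begin{aligned}
P(a \succ \{b, c\} \succ d)
  &= P(a \succ b \succ c \succ d) + P(a \succ c \succ b \succ d) \\
  &= \frac{w_a}{w_a + w_b + w_c + w_d}
     \left[
       \frac{w_b}{w_b + w_c + w_d} \cdot \frac{w_c}{w_c + w_d}
       + \frac{w_c}{w_b + w_c + w_d} \cdot \frac{w_b}{w_b + w_d}
     \right]
\end{aligned}
\]
```

The number of orderings to sum grows factorially with the size of a tied group, so this looks feasible only when ties are small.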
Example model:
data {
  int<lower=1> n_obs;
  int<lower=1> n_traits;
  int<lower=1> n_raters;
  array[n_obs] int<lower=1> trait_DV_rank;             // outcome: ranking of DV trait
  array[n_obs] real rank;                              // predictors: ranks of other traits
  array[n_obs] int<lower=1, upper=n_traits> trait_id;  // which predictor trait this row is
  array[n_obs] int<lower=1, upper=n_raters> rater_id;  // rater identity
  array[n_obs] int<lower=0, upper=1> rater_gender;
  array[n_obs] int<lower=0, upper=1> participant_gender;
}
parameters {
  real alpha;                          // global intercept
  vector[n_traits] beta;               // one slope per predictor trait
  real beta_rater_gender;
  real beta_participant_gender;
  real beta_interaction;               // rater_gender x participant_gender
  vector[n_raters] rater_intercepts;   // varying intercept per rater
  real<lower=0> sigma_rater;
}
model {
  vector[n_obs] eta;

  // Priors
  beta ~ normal(0, 1);
  alpha ~ normal(0, 1);
  beta_rater_gender ~ normal(0, 1);
  beta_participant_gender ~ normal(0, 1);
  beta_interaction ~ normal(0, 1);
  rater_intercepts ~ normal(0, sigma_rater);
  sigma_rater ~ cauchy(0, 1);          // half-Cauchy because of the lower bound

  // Linear predictor, one value per row
  for (i in 1:n_obs) {
    eta[i] = alpha
             + beta[trait_id[i]] * rank[i]
             + beta_rater_gender * rater_gender[i]
             + beta_participant_gender * participant_gender[i]
             + beta_interaction * rater_gender[i] * participant_gender[i]
             + rater_intercepts[rater_id[i]];
  }

  // Likelihood. Note: eta has length n_obs, so every outcome is treated as a draw
  // from a single categorical distribution over n_obs categories.
  trait_DV_rank ~ categorical_logit(eta);
}
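
For comparison, here is a minimal sketch of what I think a full ranking-based (Plackett-Luce) likelihood would look like, assuming no ties and one `trait_DV` ranking per rater. The data layout and the names `n_rows`, `n_rankings`, `n_pred`, `start`, `len`, and `X` are my own assumptions, not from any reference implementation: rows are sorted best to worst within each ranking, and `X` would hold the predictor-trait ranks, `participant_gender`, and the `rater_gender` by `participant_gender` product.

```stan
// Sketch only: Plackett-Luce (rank-ordered logit) likelihood without ties.
// All data names below are illustrative assumptions.
data {
  int<lower=1> n_rows;                    // total (rater, participant) rows
  int<lower=1> n_rankings;                // one DV ranking per rater
  int<lower=1> n_pred;                    // number of predictor columns
  array[n_rankings] int<lower=1> start;   // first row of each ranking
  array[n_rankings] int<lower=2> len;     // participants in each ranking
  matrix[n_rows, n_pred] X;               // rows sorted best-to-worst within ranking
}
parameters {
  vector[n_pred] beta;
}
model {
  vector[n_rows] eta = X * beta;          // log-worth of each ranked participant
  beta ~ normal(0, 1);
  for (r in 1:n_rankings) {
    vector[len[r]] eta_r = segment(eta, start[r], len[r]);
    // Stage j: the participant in position j is chosen from those still
    // unranked (positions j..len[r]). The last stage contributes zero.
    for (j in 1:(len[r] - 1)) {
      target += eta_r[j] - log_sum_exp(eta_r[j:len[r]]);
    }
  }
}
```

In this form `alpha`, `beta_rater_gender`, and `rater_intercepts` are left out on purpose: they are constant within a rater's ranking and cancel in each softmax, so only participant-level terms and rater-by-participant interactions can be identified.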