Order of one pair flipped using Thurstonian model to estimate latent ordering of job candidate ranking data - how to fix?

I’m trying to build a Stan model to analyse the rankings produced by recruitment panels who are reviewing different types of CVs. I want to look at the effect of CV format - but first I am trying to get a simple model that takes the ranking data from panel members and estimates the latent ranking of the candidates. The overall research project (Action Research on Research Culture - Bennett Institute for Public Policy) is looking at scientific recruitment and whether changing the format of CVs (eg Résumé for Research and Innovation (R4RI) guidance – UKRI) affects the shortlisting.

I have based the code on Thurstonian model post and have looked at the other posts on Thurstonian modelling (I have added more comments to help me understand the code - or show where I haven’t understood it…)

The model nearly works but for most data sets two of the candidates are flipped in order - there are not necessarily candidates who are next to each other in the ranking - eg the 7th and 10th candidate might be flipped, so I don’t think the flipping is because of error in the estimation. Different pairs of candidates are flipped depending on the seed used for the randomly assigned abilities of the candidates.

I would welcome suggestions on how to improve the model to make it return the correct ordering of candidates using the dummy data. This thanks for your expertise.


The code is split into three chunks - because it is in a Quarto document - I explain the chunks below, and at the end include on R script that includes all the code.

Below is the R code to generate the synthetic data - for 10 candidates (labelled a-j) assessed by 4 recruitment panel members (labelled w-z). It can add measurement error for each panel member (panel_sc_sd) but currently that is set to zero - so all the recruitment panel supply the same ranking and I am simply trying to get back that ranking.

The ranking is slightly counter intuitive (when thinking about ranking job candidates) in that the lower the ability score the better/lower the rank - but as this is the default sorting for the mathematical functions I have used approach to avoid confusion.

pacman::p_load("tidyverse", "rstan", "reprex") # Package loading here to make it a reproducible example
set.seed(573) # Standard seed is 573
n <- 10    # Number of candidates
p <- 4    # Number of recruiting panel members 
candidate_details <- tibble(name = letters[1:n],     # Generate letter names for each candidate taken from start of alphabet
                         sc_ability = runif(n, min = 0, max = 10),    # Generate scientific ability for each candidate by random sampling
                         sc_rank = rank(sc_ability),    # Find the rank of each candidate, use negative as we want highest ability to be first
                         sc_order = order(sc_ability))   # Find the order in which the candidates have to be selected to put the highest ability first, ie in the first row is the index of the candidate with highest ability

panel_sc_sd <- 0.0 # Standard deviation of recruiters estimate of sc_ability

panel_names <- letters[(27-p):26] # Generate letter names for each member of the recruiting panel taken from end of alphabet
# Define function to produce panel estimates of scores
find_member_scores <- function(sc_ability){
  sc_ability + rnorm(length(sc_ability), mean = 0, sd = panel_sc_sd)
}
# Add panel member latent scores
for(panel_member in seq_along(panel_names)){ # * TO DO * It would be great to replace this with Purrr calls
  panelist <- paste0("p_sc_", panel_names[panel_member])
  candidate_details <- candidate_details %>% 
    mutate({{panelist}} := find_member_scores(sc_ability))
}
# Work out how each panel member ranks the candidates
candidate_details <- candidate_details %>% 
  mutate(across(starts_with("p_sc"), ~ dense_rank(.), .names = "rank_{.col}"))

The Stan model based on Thurstonian models in the Stan forums is:

//
// This Stan program is the simpler working model from this thread of discussion: https://discourse.mc-stan.org/t/thurstonian-model/1735
// It does not allow sigma to vary between raters

data {
  int<lower=0> K; // number of items being ranked
  int<lower=0> J; // number of raters
  int<lower=1,upper=K> y[J,K]; // *1* y[i] - ie a row of the matrix is a vector of the ranks given to the candidates by rater J QU or is it an array?
  
  real<lower = 0> mu_sd_prior; // the standard deviation of the prior for the mu values ¿of raters?
  
}

transformed data{
  
  int y_argsort[J, K]; // a matrix to contain indices to sort the J rows into ascending order, so item 1 is '2' if the the 2nd item in y in row J is the 1st
  
  for (i in 1:J){ 
    y_argsort[i] = sort_indices_asc(y[i]); // apply the function 'sort_indices_asc' to sort ascending indices as detailed in line *1*
  }
  print("sorted arguments:", y_argsort);
}

parameters {
  vector[K-1] mu_hat; // these are the deduced values of mu for each item used in the construction of the mu vector below, the construction is necessary to identify the model by ensuring all the mu's sum to 0.
  //real<lower=0> sigma[J]; // This is the prior for allowing sigma to vary by rater.
  ordered[K] z[J]; // I think this is an ?array? of J ordered vectors of length K ie the set of rankings produced by the raters called z. This stores the deduced values that rater J sampled for applicant K
                   //                     ------K-----
                   //                    | z z z z z 
                   //                    J z z z z z
                   //                    | z z z z z
                   // using these values we then deduce the values of mu (the latent score of each applicant) that gave rise to the values in z
}

transformed parameters{ // this builds a vector of K values of mu with the lowest value fixed to 0 to identify the model (as the mu's can be translated along the number line without changing the model)
  vector[K] mu;
  mu = append_row(mu_hat, -sum(mu_hat)); // Initial thoughts on adding a centring constraint - although this doesn't constrain the scale, although neither does the previous approach
}

model {
  
  mu_hat ~ normal(0, mu_sd_prior); // vectorised assignment into mu-hat and therefore also mu? Adrian's model uses a Beta distribution, does this force the model to be identified as it limits mu's to 0-1
  //sigma ~ cauchy(0, 1);
  for (j in 1:J){                  // for each rater
    z[j] ~ normal(mu[y_argsort[j]], 1); // The intention was for this to be: z[j] ~ normal(mu[y_argsort[j]], sigma[j]); but the author couldn't make the model identified like this 
                                        // mu[y_argsort[j]] sorts the mu's into the order that they were rated by rater J and, I think, because z is defined as ordered this provides the constraint that 
                                        // the values must be in the order that rater J put them.
  }
}

And here is the R code to prepare the simulated data, run the model and display the flipped ranking. The Stan code is in a Quarto document, so that chunk is read in and compiled using output.var = "model_1" so the R code that samples from the model using sampling rather than stan. I’ve added a version of the code that can be copied and pasted directly at the end of the post.


# Prepare data for Stan
# Select the ranking rows from candidate_details data frame, make into a matrix and transpose
panel_ranking_data <- candidate_details %>% 
  select(starts_with("rank")) %>% 
  as.matrix() %>% 
  t()
mu_sd_prior <- 0.5

data_list <- list(
  K = n,
  J = p,
  y = panel_ranking_data,
  mu_sd_prior = mu_sd_prior
)

fit1 <- sampling(
  model_1,                # Stan program object from previous chunk
  data = data_list,       # named list of data
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 2000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 100,             # progress shown
  verbose = FALSE          # TRUE gives more logging
)

# Get a list of mu estimates and their ranks so can put it into the candidates details frame for comparisons
est_ranks <- summary(fit1)$summary %>% # * TO DO * This feels very clunky surely there is an easier way to get a dataframe of parameters
  as.data.frame() %>% # If I change this to as_tibble I seem to loose the parameter names
  mutate(parameter = rownames(.)) %>% 
  select(parameter, everything()) %>% 
  as_data_frame() %>% 
  filter(str_detect(parameter, "mu\\[")) %>%
  mutate(name = letters[as.numeric(str_extract(parameter, "(?<=\\[).+?(?=\\])"))]) %>%  # Extract the number of the mu parameter ie candidate from the parameter name
  mutate(est_sc_rank = dense_rank(mean)) %>% 
  mutate(est_sc_order = order(mean)) %>% 
  select(name, mean, est_sc_rank, est_sc_order)

candidate_details <- candidate_details %>% 
  left_join(est_ranks, by = ("name" = "name"))

ggplot(candidate_details, aes(x = sc_order, y = est_sc_order)) +
  geom_point() + 
  xlim(0, 10) +
  ylim(0, 10)

Here is all the code combined into one R script:

pacman::p_load("tidyverse", "rstan", "reprex") # Package loading here to make it a reproducible example
set.seed(573) # Standard seed is 573
n <- 10    # Number of candidates
p <- 4    # Number of recruiting panel members 
candidate_details <- tibble(name = letters[1:n],     # Generate letter names for each candidate taken from start of alphabet
                            sc_ability = runif(n, min = 0, max = 10),    # Generate scientific ability for each candidate by random sampling
                            sc_rank = rank(sc_ability),    # Find the rank of each candidate, use negative as we want highest ability to be first
                            sc_order = order(sc_ability))   # Find the order in which the candidates have to be selected to put the highest ability first, ie in the first row is the index of the candidate with highest ability

panel_sc_sd <- 0.0 # Standard deviation of recruiters estimate of sc_ability

panel_names <- letters[(27-p):26] # Generate letter names for each member of the recruiting panel taken from end of alphabet
# Define function to produce panel estimates of scores
find_member_scores <- function(sc_ability){
  sc_ability + rnorm(length(sc_ability), mean = 0, sd = panel_sc_sd)
}
# Add panel member latent scores
for(panel_member in seq_along(panel_names)){ # * TO DO * It would be great to replace this with Purrr calls
  panelist <- paste0("p_sc_", panel_names[panel_member])
  candidate_details <- candidate_details %>% 
    mutate({{panelist}} := find_member_scores(sc_ability))
}
# Work out how each panel member ranks the candidates
candidate_details <- candidate_details %>% 
  mutate(across(starts_with("p_sc"), ~ dense_rank(.), .names = "rank_{.col}"))

model_1_text <- "
//
// This Stan program is the simpler working model from this thread of discussion: https://discourse.mc-stan.org/t/thurstonian-model/1735
// It does not allow sigma to vary between raters

data {
  int<lower=0> K; // number of items being ranked
  int<lower=0> J; // number of raters
  int<lower=1,upper=K> y[J,K]; // *1* y[i] - ie a row of the matrix is a vector of the ranks given to the candidates by rater J QU or is it an array?
  
  real<lower = 0> mu_sd_prior; // the standard deviation of the prior for the mu values ¿of raters?
  
}

transformed data{
  
  int y_argsort[J, K]; // a matrix to contain indices to sort the J rows into ascending order, so item 1 is '2' if the the 2nd item in y in row J is the 1st
  
  for (i in 1:J){ 
    y_argsort[i] = sort_indices_asc(y[i]); // apply the function 'sort_indices_asc' to sort ascending indices as detailed in line *1*
  }
}

parameters {
  vector[K-1] mu_hat; // these are the deduced values of mu for each item used in the construction of the mu vector below, the construction is necessary to identify the model by ensuring all the mu's sum to 0.
  //real<lower=0> sigma[J]; // This is the prior for allowing sigma to vary by rater.
  ordered[K] z[J]; // I think this is an ?array? of J ordered vectors of length K ie the set of rankings produced by the raters called z. This stores the deduced values that rater J sampled for applicant K
                   //                     ------K-----
                   //                    | z z z z z 
                   //                    J z z z z z
                   //                    | z z z z z
                   // using these values we then deduce the values of mu (the latent score of each applicant) that gave rise to the values in z
}

transformed parameters{ // this builds a vector of K values of mu with the lowest value fixed to 0 to identify the model (as the mu's can be translated along the number line without changing the model)
  vector[K] mu;
  mu = append_row(mu_hat, -sum(mu_hat)); // Initial thoughts on adding a centring constraint - although this doesn't constrain the scale, although neither does the previous approach
}

model {
  
  mu_hat ~ normal(0, mu_sd_prior); // vectorised assignment into mu-hat and therefore also mu? Adrian's model uses a Beta distribution, does this force the model to be identified as it limits mu's to 0-1
  //sigma ~ cauchy(0, 1);
  for (j in 1:J){                  // for each rater
    z[j] ~ normal(mu[y_argsort[j]], 1); // The intention was for this to be: z[j] ~ normal(mu[y_argsort[j]], sigma[j]); but the author couldn't make the model identified like this 
                                        // mu[y_argsort[j]] sorts the mu's into the order that they were rated by rater J and, I think, because z is defined as ordered this provides the constraint that 
                                        // the values must be in the order that rater J put them.
  }
}
"

# Prepare data for Stan
# Select the ranking rows from candidate_details data frame, make into a matrix and transpose
panel_ranking_data <- candidate_details %>% 
  select(starts_with("rank")) %>% 
  as.matrix() %>% 
  t()
mu_sd_prior <- 0.5

data_list <- list(
  K = n,
  J = p,
  y = panel_ranking_data,
  mu_sd_prior = mu_sd_prior
)

fit1 <- stan(
  model_code = model_1_text,                # Stan program as character vector
  data = data_list,       # named list of data
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 2000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 100,             # progress shown
  verbose = FALSE          # TRUE gives more logging
)

# Get a list of mu estimates and their ranks so can put it into the candidates details frame for comparisons
est_ranks <- summary(fit1)$summary %>% # * TO DO * This feels very clunky surely there is an easier way to get a dataframe of parameters
  as.data.frame() %>% # If I change this to as_tibble I seem to loose the parameter names
  mutate(parameter = rownames(.)) %>% 
  select(parameter, everything()) %>% 
  as_data_frame() %>% 
  filter(str_detect(parameter, "mu\\[")) %>%
  mutate(name = letters[as.numeric(str_extract(parameter, "(?<=\\[).+?(?=\\])"))]) %>%  # Extract the number of the mu parameter ie candidate from the parameter name
  mutate(est_sc_rank = dense_rank(mean)) %>% 
  mutate(est_sc_order = order(mean)) %>% 
  select(name, mean, est_sc_rank, est_sc_order)

candidate_details <- candidate_details %>% 
  left_join(est_ranks, by = ("name" = "name"))

ggplot(candidate_details, aes(x = sc_order, y = est_sc_order)) +
  geom_point() + 
  xlim(0, 10) +
  ylim(0, 10)

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.3 (2023-03-15)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/London
#>  date     2023-04-13
#>  pandoc   2.19.2 @ /Applications/RStudio - 2023.03.0-386.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.0)
#>  digest        0.6.30  2022-10-18 [1] CRAN (R 4.2.0)
#>  evaluate      0.18    2022-11-07 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.2.0)
#>  knitr         1.41    2022-11-18 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.1.0   2023-03-14 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.18    2022-11-09 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.8   2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.2.0)
#>  vctrs         0.6.1   2023-03-22 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.35    2022-11-16 [1] CRAN (R 4.2.0)
#>  yaml          2.3.6   2022-10-18 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I think I have got the basic model working using the second suggestion in the Stan Manual (Stan User’s Guide). I am still testing to check it doesn’t have the same ‘flipping pairs’ problem. Further suggestions for improvements welcome. I don’t currently understand when the two further suggestions in the Stan Manual would be superior.

Here’s my updated Stan code:

  int<lower=0> K; // number of items being ranked
  int<lower=0> J; // number of raters
  int<lower=1,upper=K> y[J,K]; // *1* y[i] - ie a row of the matrix is a vector of the ranks given to the candidates by rater J
  
  real<lower = 0> mu_sd_prior; // the standard deviation of the prior for the mu values of raters
  
}

transformed data{
  
  int y_argsort[J, K]; // a matrix to contain indices to sort the J rows into ascending order, so item 1 is '2' if the the 2nd item in y in row J is the 1st
  
  for (i in 1:J){ 
    y_argsort[i] = sort_indices_asc(y[i]); // apply the function 'sort_indices_asc' to sort ascending indices as detailed in line *1*
  }

}

parameters {
  vector[K] mu; // these are the mu values for each candidate the calculation in transformed parameters ensures they sum to zero (https://mc-stan.org/docs/2_18/stan-users-guide/parameterizing-centered-vectors.html).
  //real<lower=0> sigma[J]; // This is the prior for allowing sigma to vary by rater.
  ordered[K] z[J]; // I think this is an ?array? of J ordered vectors of length K ie the set of rankings produced by the raters called z. This stores the deduced values that rater J sampled for applicant K
                   //                     ------K-----
                   //                    | z z z z z 
                   //                    J z z z z z
                   //                    | z z z z z
                   // using these values we then deduce the values of mu (the latent score of each applicant) that gave rise to the values in z
}

model {
  
  mu ~ normal(0, inv(sqrt(1 - inv(K)))); // This is the expression that ensures mu's sum to zero, so identifying the model. inv() is the elementwise inverse function which returns 1 / m[i, j] for each element (https://mc-stan.org/docs/functions-reference/linear-algebra-functions-and-solvers.html)
  //sigma ~ cauchy(0, 1);
  for (j in 1:J){                  // for each rater
    z[j] ~ normal(mu[y_argsort[j]], 1); // The intention was for this to be: z[j] ~ normal(mu[y_argsort[j]], sigma[j]); but the author couldn't make the model identified like this 
                                        // mu[y_argsort[j]] sorts the mu's into the order that they were rated by rater J and, I think, because z is defined as ordered this provides the constraint that 
                                        // the values must be in the order that rater J put them.
  }
}

Thanks for posting a followup, @stevewooding. We’ve had a hard time getting long-form questions answered on the forums.

For

ordered[K] z[J];

you get a size K array of ordered J-vectors to answer the question in your doc. You can put these in a single line:

vector[K] mu = append_row(mu_hat, -sum(mu_hat)); 

The general problem from the sum-to-zero constraint is that it doesn’t produce a generative model in the sense of being able to add another element to the vector. This is fine if we have things like exhaustive age categories, but not in other hierarchical settings.

The following does not ensure that the mu sum to zero.

mu ~ normal(0, inv(sqrt(1 - inv(K))));

It just individually pushes each of the mu toward zero. The way to softly center the sum is:

sum(mu) ~ normal(0, epsilon);

for some small epsilon. The only way to make sure they sum to zero exactly is to enforce it by construction as you did in the first example. The soft constraint gets harder as epsilon goes to zero.

I’m afraid I couldn’t understand what your model is doing. I’m afraid I don’t know what “Thurstonian” means (other than “in the manner of Thurston”). It sounds like the task is to take a bunch of ordinal ratings from individuals and uncover a latent “consensus” ordinal rating, but I don’t see how the model relates. If I’ve understood your problem correctly, there have been several papers on this that work from a standard noisy measurement model perspective (there’s a true rating and you get a noisy and biased answers from raters). For example, for the ordinal case, see:

This all tracks the epidemiology literature for tests (e.g., stage of cancer diagnosis based on radiology):

P.S. I would strongly suggest not running lines must over 80 characters so that they can be viewed more easily in these kinds of forums and on GitHub.

P.P.S. I would also suggest using a more flexible development environment than Quarto. Just the indirect error messages make it a non-starter for me. Once you have something working, writing up the presentation in Quarto is great. I just find the execution and error-reporting model suboptimal for development even compared to something like Jupyter.

I’ve been playing around with your data and model. It’s not clear to me why the flipping occurs. Seems like an interesting problem (though maybe the cause will be more obvious to others).

I haven’t observed this in my simulations. I observe a very predictable swap. When the rank of the last respondent (by letter, i.e. constrained to -sum(mu_k)) is first, last, or exactly in the middle (e.g. rank 6 of 11), then there is no swapping at all. When the rank is below the midpoint, the last respondent moves down a rank. And when it is above the midpoint, it moves up a rank. You can test this out with the code and stan model (model_sum0.stan) below, where n_compare sets the rank of the last respondent.

I also don’t observe a swap when estimating all of the means freely (as in the other model, model_normal.stan, and your updated model), at least for this very simple data generating process. Maybe you’ll be just fine moving forward with that simpler approach.

thurstone.r (4.8 KB)
model_normal.stan (3.0 KB)
model_sum0.stan (3.0 KB)