Dear Bob,
Thanks for your reply and sorry it took a while to respond.
I’ve added some more detail below. Any insights are greatly appreciated - but I appreciate that looking into others code can be very time consuming.
The observed data are rankings, such that the jth person ranks K items.
The elements of each row indicate the rank of the kth item (from item 1 to K).
For example, y[j,1] = 1, y[j,2] = 3, y[j,3] = 2, would indicate that the jth participant ranked the 1st item the best, the 3rd the second best and the second item the third best.
The thurstonian model links latent ratings on a continuous scale to observed rankings.
I’ve tried to implement a simpler model than you proposed above for a first pass (with a fixed standard deviation to identify the model)
The Stan code is given below:
The key line is in the model section:
z[i, y_argsort[i]] ~ multi_normal(mu, diag_matrix(rep_vector(1, K)));
where z is an ordered vector and y_argsort[i] is an array of indicies which would sort y[i] into ascending order.
So to my understanding that should mean that multi_normal(mu, diag_matrix(rep_vector(1, K))); can only return samples which when ordered by y_argsort[i] will be in ascending order.
I then fit the model to some simulated data (see R code below). It seems to look pretty reasonable when i set mu (in the simulation) as c(0,1,2). But when i set mu as c(0, 2, 1) the model doesn’t seem to recover the the parameters well.
I wonder if a) you have any insight into whats going on here - why is the later case doing something weird? and b) you have any tips for identifying the model if I allow the co variance matrix to vary by ranker?
Kind regards,
Oscar
stan code
> functions {
>
> int[] argsort(int[] y, int K){
> //Pass a 1D array of integers K items long and return the arguments that would sort the 1D array
>
> int y_argsort[K]; //Output argsort
>
>
> for (ind in 1:K){
>
> for (k in 1:K){
>
> if (y[k] == ind){
> y_argsort[ind] = k;
> }
> }
> }
> return y_argsort;
> }
> }
>
> data {
> int<lower=0> K; // number of items being ranked
> int<lower=0> J; // number of raters
> int<lower=1,upper=K> y[J,K]; // y[i] is a vector of
>
> real<lower = 0> mu_sd_prior;
> }
>
> transformed data{
>
> int y_argsort[J, K];
> for (i in 1:J){
> y_argsort[i] = argsort(y[i], K);
> }
> }
>
> parameters {
> vector[K-1] mu_hat;
> //real<lower=0> sigma[J];
> ordered[K] z[J];
> }
>
> transformed parameters{
>
> vector[K] mu;
>
> mu = append_row(0, mu_hat);
>
> }
>
>
>
> model {
>
> mu_hat ~ normal(0, mu_sd_prior);
>
> //sigma ~ cauchy(0, 1);
>
> for (i in 1:J){
>
> z[i, y_argsort[i]] ~ multi_normal(mu, diag_matrix(rep_vector(1, K))); //Sample from difference between means
> }
> }
Rcode
> library(tidyverse)
> library(rstan)
> library(bayesplot)
> library(MASS)
>
> rstan_options(auto_write = TRUE)
> options(mc.cores = parallel::detectCores())
>
> ##Generate some fake data
> J = 20 #Number of participants
>
> mu = c(0, 1, 2) #A vector of means for the true latent ratings
>
> mu_rank = rank(mu) #The rank order of mu
> K = length(mu)
>
> y = mvrnorm(n = J, mu, diag(1, K, K))
>
> y = data.frame(y1 = rnorm(J, mean = mu[1]), y2 = rnorm(J, mean = mu[2]), y3 = rnorm(J, mean = mu[3]))
> y_rank = t(apply(y, 1, rank))
>
> stan_data = list(y = y_rank, K = length(mu), J = J, mu_sd_prior = 1)
>
> #Fit the Thurstonian Model
> post1 <- stan(
> file = "ThurstonianModel.stan", # Stan program
> data= stan_data,
> chains = 4, # number of Markov chains
> cores = 4,
> warmup = 1000,
> iter = 2000,
> control = list(adapt_delta = 0.9, max_treedepth = 10)
> )
>
> #Extract and plot means
> s1 = extract(post1)
> s = as.data.frame(post1)
> mcmc_intervals(s, pars = c("mu[1]", "mu[2]", "mu[3]"))
>
> print(mu_rank)
[edit: escaped code, which already had > in it]