Thurstonian model


#1

Back on the old user group Julian King raised a question about implementing a Thurstonian model in Stan:

https://groups.google.com/forum/#!msg/stan-users/pphrl_IxTF8/SaPU24YmjIAJ

An implementation for JAGS can be found here:

http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0096431.s001&type=supplementary

At the time Bob Carpenter provided Stan code for the model (see below). My question is how can one map from the latent rating (x), to the observed rank data (y)? I’m not sure how easy this is to do in Stan.

I’m also not clear on why x is an ordered vector in this implementation?

Any advice is greatly appreciated.

data {
int<lower=0> I; // number of items being ranked
int<lower=0> J; // number of raters
int<lower=1,upper=I> y[J,I]; // ranks for rater j
}
parameters {
real mu[I]; // true latent positions
real<lower=0,upper=20> sigma[J]; // accuracy of rater j
ordered[I] x[J]; // positions assigned by rater j
}
model {
mu ~ normal(0,1000);
for (j in 1:J)
for (i in 1:I)
x[j,i] ~ normal(mu[i],sigma[j]);
}


#2

Can you define how the observed and latent rankings compare? I’m afraid I don’t have time to dive off into a paper.


#3

Dear Bob,

Thanks for your reply and sorry it took a while to respond.

I’ve added some more detail below. Any insights are greatly appreciated - but I appreciate that looking into others code can be very time consuming.

The observed data are rankings, such that the jth person ranks K items.

The elements of each row indicate the rank of the kth item (from item 1 to K).

For example, y[j,1] = 1, y[j,2] = 3, y[j,3] = 2, would indicate that the jth participant ranked the 1st item the best, the 3rd the second best and the second item the third best.

The thurstonian model links latent ratings on a continuous scale to observed rankings.

I’ve tried to implement a simpler model than you proposed above for a first pass (with a fixed standard deviation to identify the model)

The Stan code is given below:

The key line is in the model section:

z[i, y_argsort[i]] ~ multi_normal(mu, diag_matrix(rep_vector(1, K))); 	

where z is an ordered vector and y_argsort[i] is an array of indicies which would sort y[i] into ascending order.

So to my understanding that should mean that multi_normal(mu, diag_matrix(rep_vector(1, K))); can only return samples which when ordered by y_argsort[i] will be in ascending order.

I then fit the model to some simulated data (see R code below). It seems to look pretty reasonable when i set mu (in the simulation) as c(0,1,2). But when i set mu as c(0, 2, 1) the model doesn’t seem to recover the the parameters well.

I wonder if a) you have any insight into whats going on here - why is the later case doing something weird? and b) you have any tips for identifying the model if I allow the co variance matrix to vary by ranker?

Kind regards,

Oscar

stan code

> functions {
> 
> 	int[] argsort(int[] y, int K){
> 		//Pass a 1D array of integers K items long and return the arguments that would sort the 1D array
> 		
> 		int y_argsort[K]; //Output argsort
> 
> 
> 		for (ind in 1:K){
> 
> 			for (k in 1:K){
> 
> 				if (y[k] == ind){
> 					y_argsort[ind] = k; 
> 				}
> 			}
> 		}
> 		return y_argsort;
> 	}
> }
> 
> data { 
> 	int<lower=0> K;  // number of items being ranked 
> 	int<lower=0> J;  // number of raters 
> 	int<lower=1,upper=K> y[J,K];   // y[i] is a vector of
> 
> 	real<lower = 0> mu_sd_prior;
> } 
> 
> transformed data{
> 
> 	int y_argsort[J, K];
> 	for (i in 1:J){	
> 		y_argsort[i] = argsort(y[i], K);		
> 	}
> }
> 
> parameters { 
> 	vector[K-1] mu_hat;                        
> 	//real<lower=0> sigma[J];   
> 	ordered[K] z[J];                   
> } 
> 
> transformed parameters{
> 	
> 	vector[K] mu;
> 
> 	mu = append_row(0, mu_hat);
> 
> }
> 
> 
> 
> model { 
> 	
> 	mu_hat ~ normal(0, mu_sd_prior);   
> 
> 	//sigma ~ cauchy(0, 1);
>   
> 	for (i in 1:J){		
> 
> 		z[i, y_argsort[i]] ~ multi_normal(mu, diag_matrix(rep_vector(1, K))); 	//Sample from difference between means				
> 	}		
> } 

Rcode

> library(tidyverse)
> library(rstan)
> library(bayesplot)
> library(MASS)
> 
> rstan_options(auto_write = TRUE)
> options(mc.cores = parallel::detectCores())
> 
> ##Generate some fake data
> J = 20 #Number of participants
> 
> mu = c(0, 1, 2) #A vector of means for the true latent ratings
> 
> mu_rank = rank(mu) #The rank order of mu
> K = length(mu)
> 
> y = mvrnorm(n = J, mu, diag(1, K, K))
> 
> y = data.frame(y1 = rnorm(J, mean = mu[1]), y2 = rnorm(J, mean = mu[2]), y3 = rnorm(J, mean = mu[3]))
> y_rank = t(apply(y, 1, rank))
> 
> stan_data = list(y = y_rank, K = length(mu), J = J, mu_sd_prior = 1)
> 
> #Fit the Thurstonian Model 
> post1 <- stan(
>   file = "ThurstonianModel.stan",  # Stan program
>   data= stan_data,
>   chains = 4,             # number of Markov chains
>   cores = 4,            
>   warmup = 1000,
>   iter = 2000,
>   control = list(adapt_delta = 0.9, max_treedepth = 10)
> )
> 
> #Extract and plot means
> s1 = extract(post1)
> s = as.data.frame(post1)
> mcmc_intervals(s, pars = c("mu[1]", "mu[2]", "mu[3]"))
> 
> print(mu_rank)

[edit: escaped code, which already had > in it]


#4

You don’t need to pass the size and the array, size(x) gives you the size of array x and makes sure you never have a mismatch. Also, that loop is really inefficient (not that it’ll reallymatter in the transformed data block) and I couldn’t quite tell what it’s doing—we have the usual sorting and index sorting algorithms for arrays built in.

So it’s really hard to tell what’s going on in the rest of the model. My guess would be an error in the sorting routine, especially if there are 0 elements as Stan indexes from 1.

You don’t ever want to do this:

z[i, y_argsort[i]] ~ multi_normal(mu, diag_matrix(rep_vector(1, K)));

as it’s equivalent with vectorization to the much much more efficient

z[i, y_argsort[i]] ~ normal(mu, 1);

The question to ask is then whether everything’s getting just one distribution here; that is, the same z[i, j] never shows up twice on the left-hand side.


#5

Great, many thanks Bob.

My function was doing the same as the Stan function sort_indices_asc(). I was trying to implement something similar to this (from http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0096431#s6):

image

Just to be confusing I’ve called the variable x, z in my Stan code. I’ve managed to get it working now when sigma is fixed to 1. It seemed that changing the line

z[i, y_argsort[i]] ~ normal(mu, 1));

to this:

z[i] ~ normal(mu[y_argsort[i]], 1`);

did the trick. My understanding of the model is that each sample of latent ratings for person j (x_j in the model above) should have the same rank as the observed data for person y (y_j). Or equivalently, if x_j is sorted according to sort_indices_asc(y_j), the ratings should then be in ascending order (so the ordered vector could be used for the model constraint).

It actually seems to work quite well now when I try it with the data from the paper above.

My next problem is identifying the model. In the paper they allow sigma to vary by rater, but they did not identify the model (they just post processed the samples). This does not work at all well in Stan

I wonder if anyone knows how I might identify this model sensibly?

All the best,

Oscar

For anyone is interested, the full Stan code is now:

data {
int<lower=0> K; // number of items being ranked
int<lower=0> J; // number of raters
int<lower=1,upper=K> y[J,K]; // y[i] is a vector of

real<lower = 0> mu_sd_prior;
}

transformed data{

int y_argsort[J, K];

for (i in 1:J){
y_argsort[i] = sort_indices_asc(y[i]);
}
}

parameters {
vector[K-1] mu_hat;
//real<lower=0> sigma[J];
ordered[K] z[J];
}

transformed parameters{
vector[K] mu;
mu = append_row(0, mu_hat);
}

model {

mu_hat ~ normal(0, mu_sd_prior);
//sigma ~ cauchy(0, 1);
for (j in 1:J){
z[j] ~ normal(mu[y_argsort[j]], 1); //I would like this to be: z[j] ~ normal(mu[y_argsort[j]], sigma[j]); But I’m not sure how to identify the model
}

}


#6

The Thurstonian models are unidentified on the latent parameter scale because adding a constant to all latent values leaves the ordering unchanged.

There’s two options here:

  • The easiest option is to define the first latent value to be zero. All all other latent values are then effectively defined with reference to this. This gives you N-1 latent values to estimate rather than N
  • Put a broad prior on all latent values with mean zero. This stops the latent values being able to wander off to arbitrarily large values.

I have tried both of these and either works fine. The sampling of the first one is a bit better because it explicitly removes the degeneracy.


#7

Hi Julian,

Thanks for getting in touch.

I’m actually already doing this in my model (I fix the first value of mu to 0), which seems to work fine for me when sigma is fixed. When I try to estimate a value for sigma (either a single sigma for all participants, or one for each participant) the model doesn’t fit well (I get a lot of divergent transitions).

I’m assuming the problem is that there is also multiplicative non-identifiability when sigma is not fixed to a specific value. Did you manage to fit your model without explicitly dealing with this?

All the best,

Oscar


#8

From memory, For one rater, the model is also unidentified multiplicatively because changing sigma doesn’t change the score ordering.

I think you need to define sigma for the first rater as 1. That should solve the issue.


#9

Read this paper: it’s by far the clearest explanation o have found about the Thurstone models and makes r very clear what constraints are required for identifiability.

http://mayagupta.org/publications/PairedComparisonTutorialTsukidaGupta.pdf


#10

Thanks Julian. That paper is great, but I don’t think it covers how to constrain when allowing sigma to vary over raters?

Fixing the first sigma to 1 didn’t seem to work, but doing that and having a pretty tight prior on sigma (centered around 1) seems to fit ok and give results similar to the plos one paper. Code below:

transformed parameters{

vector[K] mu;
vector<lower = 0>[J] sigma;

mu = append_row(0, mu_hat);
sigma = append_row(1, sigma_hat);
}

model {

mu_hat ~ normal(0, mu_sd_prior);
sigma_hat ~ lognormal(0, 0.5);

for (j in 1:J){
z[j] ~ normal(mu[y_argsort[j]], sigma[j]);
}
}

Thanks again,

Oscar


#11

With these identification techniques, you have to be careful with what it does to the interpretation of priors and openness.

With a pinned value, your other values are always interpreted relative to the pinned value.

A popular alternative is to use a sum-to-zero constraint, which is described in the manual. That’s the hard form of adding a prior. We’ve seen this go both ways in terms of efficiency compared to a prior (same with pinning values).

The problem with sum-to-zero is that it no longer makes sense to add a new group—no way to do that and have everything sum to zero.


#12

Great thanks both for your insights.

I had a quick experiment and the pinned value identification seems to be faster in this case. With this model I think it makes a lot of sense to frame it in terms of differences from the pinned value, as we don’t really care about their absolute value.

The biggest problem I’m still having is the multiplicative non-identifiability. Having a tight prior on sigma seems to allow the model to fit without divergent transitions, but it’s pretty slow.


#13

If there’s truly a multiplicative identifiability, just set sigma = 1. That’s what we do for the IRT models anyway. One of the parameters gets a normal(0, 1) prior, which sets the location and the scale. Then the multiplier and offset are scaled to that.


#14

That seems to work nicely.

Thank you both for your help! Really appreciate it.