Hey Stan Users:
I got in over my head with a mixed model that clusters respondents based on their rankings. I can’t get my alpha intercept to mix right and can’t get the chains converging, with bi-modal outcomes occurring. I bet I’m doing something stupid, and I’m hoping you could kindly point it out.
A couple quick facts:
- I’m standardizing my covariates before running rStan.
- I’m using some simple synthetic data, which I’ll include. (This data might be the cause of my problems?)
- This work is based on a paper I’ll link below. Basically it describes a model for clustering rankers based on the covariates of the entities being ranked. I naively believe I can use the same model but adjust it to use covariates of rankers.
- I am super new at this (rStan & Bayes Modeling)
Here’s the link to the paper: Bayesian Aggregation of Rank Data with Covariates & Heterogenous Rankers
My Data:
- ranker_covariates.csv (140 Bytes)
- ranks.csv (472 Bytes)
- Note: Basically it’s 20 rankers, with rankers 1-10 ranking things in exactly opposite ways from 11-20 to test clustering. I’m using a single covariate, with the value 1 assigned to rankers 1-10 and value 2 to rankers 11-20.
Here are the priors, and I’ve only been experimenting with alpha and beta:
- scale_sd_prior = 0.1, # Basically provides a mean of 1 on lognormal
- gamma = 1, # Seemed appropriate given the below paper
- sigma_alpha = 1, # Below paper suggests making alpha small to let beta do more work
- sigma_beta = 100 # Below paper suggested diffuse prior
- M = 5 # My truncated dirichlet process uses only a few clusters for testing right now
Thank you for your help!
-Mike
data {
int<lower = 1> N; //The number of data points, the number of ranking lists
int<lower=1> K; // number of items being ranked
int<lower = 1> C; // The number of covariates
int<lower = 1> J; //Number of raters
int<lower = 1> M; //Max. number of clusters for truncated dirichlet process
int<lower=1,upper=K> y[N,K]; // y[i] is a row vector of rankings for the ith data point
matrix[N, C] X; // these are covariates on a per ranking basis
int<lower = 1> rater[N]; //Index the rater who produced rank i
real<lower = 0> scale_sd_prior; //Ranker specific error
real<lower=0> gamma; // DP concentration parameter
real<lower = 0> sigma_alpha; //hyperparameter intercept for baseline Gaussian for DP
real<lower = 0> sigma_beta; //hyperparameter beta for baseline Gaussian for DP
}
transformed data {
int y_argsort[N, K];
for (i in 1:N){
y_argsort[i] = sort_indices_asc(y[i]);//Indices of rankings
}
}
parameters {
ordered[K] z_hat[N]; //the estimate of the true underlying score of each ranker
matrix[C, K-1] beta_cluster[M]; //matrix of differences from the first ranking (fixed at zero) for cluster M
matrix[K, M] alpha_cluster; //intercepts on each cluster M
vector<lower = 0>[J] scale; //individual error captured per person
vector<lower=0,upper=1>[M-1] v; // DP stick-breaking weights
}
transformed parameters{
matrix[N, K] mu_part_cluster[M];
vector[K] mu_part_vector[M, N];
//Builds a matrix of betas that are prepended with 0's (for identifiability)
for (m in 1:M) {
mu_part_cluster[m] = append_col(rep_vector(0, N), X * beta_cluster[m]);
for (i in 1:N) {
//Sorting ranks by y_argsort
mu_part_vector[m, i] = to_vector(mu_part_cluster[m][i, y_argsort[i]]);
}
}
//This breaks the stick for dirichlet process
vector[M] log_pi_raw;
log_pi_raw[1] = log(v[1]);
for (m in 2:(M-1)){
log_pi_raw[m] = log(v[m]) + sum(log(1 - v[:m-1]));
}
log_pi_raw[M] = sum(log(1 - v));
simplex[M] pi = softmax(log_pi_raw);
}
model {
//prior for ranker specific error
scale ~ lognormal(0, scale_sd_prior);
// Prior for stick-breaking weights v
for (m in 1:(M-1)) {
v[m] ~ beta(1, gamma);
}
//prior for cluster betas
for (m in 1:M){
to_vector(beta_cluster[m]) ~ normal(0, sigma_beta);
}
//prior for cluster alphas
to_vector(alpha_cluster) ~ normal(0, sigma_alpha);
//likelihood on per cluster basis
for (i in 1:N){
vector[M] log_lik_cluster;
// Likelihood for cluster rankings
for (m in 1:M) {
log_lik_cluster[m] = normal_lpdf(z_hat[i] | alpha_cluster[, m] + mu_part_vector[m][i] * 1/scale[rater[i]], 1);
}
//marginalize out cluster weights
target += log_sum_exp(log_lik_cluster + log(pi)); // marginalize out cluster weights
}
}