 # Raykar et al. Learning from Crowds model

I’m a huge fan of Raykar et al.'s paper on learning from crowds. Although it’s aimed at crowdsourcing data platforms for machine learning, like Amazon’s Mechanical Turk, the technique is generally useful in any kind of noisy rating (coding, annotation, diagnostic test) situation. These general families of models are well studied in the epidemiology literature where instead of raters rating data, it’s diagnostic tests being applied to subjects.

The citation is:

Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., & Moy, L. (2010). Learning from crowds. Journal of Machine Learning Research , 11 (Apr), 1297-1322.

Raykar et al. basically just replace the simple multinomial prevalence model of Dawid and Skene with a logistic regression (they only consider the binary case, but it’s easy to extend to K ratings or even ordinal ratings).

The paper spends a lot of time deriving an algorithm for estimation. It’s much less effort to just plug into Stan (though maybe doesn’t look so impressive, because there’s otherwise not much algebra required.

Here’s the model code

``````/**
* Implementation of joint estimation of logistic regression and
* annotator sensitivity and specifity.  This is just like the Dawid-Skene
* model with the basic prevalence replaced with a logistic regression
* based on predictors.
*
* Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C.,
* Bogoni, L., & Moy, L.  (2010). Learning from crowds. Journal of
* Machine Learning Research, 11(Apr), 1297-1322.
*
* TODO:  hierarchical priors for alpha, beta
* TODO:  precompute counts for sum(y[ , r] == 1) and multiply
*        in the N/R loops and the eltwise multiply and sum
*/
data {
int<lower=0> N;                 // # items (reviews)
int<lower=0> R;                 // # raters (heuristics)
int<lower=0> D;                 // # of predictors for each item
matrix[N, D] x;                 // [n, d] predictors d for item n
int<lower=0, upper=1> y[N, R];  // outcome: 1 success, 0 failure
}
parameters {
vector[D] w;                         // logistic regression coeffs
real w0;                             // intercept
vector<lower=0, upper=1>[R] alpha;   // sensitivity
vector<lower=0, upper=1>[R] beta;    // specificity
}
model {
vector[N] logit_z_hat = w0 + x * w;
vector[N] log_z_hat = log_inv_logit(logit_z_hat);
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat);

vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);

// prior
w ~ normal(0, 2);
w0 ~ normal(0, 5);
alpha ~ beta(10, 1);
beta ~ beta(10, 1);

// likelihood
for (n in 1:N) {
real pos_sum = log_z_hat[n];
real neg_sum = log1m_z_hat[n];
for (r in 1:R) {
if (y[n, r] == 1) {
pos_sum += log_alpha[r];
neg_sum += log1m_beta[r];
} else {
pos_sum += log1m_alpha[r];
neg_sum += log_beta[r];
}
}
target += log_sum_exp(pos_sum, neg_sum);
}
}
generated quantities {
vector[N] Pr_z_eq_1;
{
vector[N] logit_z_hat = w0 + x * w;
vector[N] log_z_hat = log_inv_logit(logit_z_hat);
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat);

vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);

for (n in 1:N) {
real pos_sum = log_z_hat[n];
real neg_sum = log1m_z_hat[n];
for (r in 1:R) {
if (y[n, r] == 1) {
pos_sum += log_alpha[r];
neg_sum += log1m_beta[r];
} else {
pos_sum += log1m_alpha[r];
neg_sum += log_beta[r];
}
}
Pr_z_eq_1[n] = softmax([pos_sum, neg_sum]');
}
}
}
``````

I noted some performance improvements that could be made in the to-do section. It could also be easily extended to incomplete panel data (when not every rater rates every item). It could also be extended to hierarchical priors. @Marko’s example Dawid-Skene annotation model with partial pooling of annotators covers both hierarchical priors and incomplete data.

And here’s some R code to simulate data

``````inv_logit <- function(u) 1 / (1 + exp(-u))

N <- 500
R <- 10
D <- 20

x <- matrix(rnorm(N * D), N, D)
w <- rnorm(D)
w0 <- -2

alpha <- runif(R, 0.65, 0.95)
beta <- runif(R, 0.65, 0.95)

z <- rep(0, N);
for (n in 1:N)
z[n] <- rbinom(1, 1, inv_logit(w0 + (x[n, ] %*% w)))

y <- matrix(0, N, R);
for (n in 1:N)
for (r in 1:R)
y[n, r] <- rbinom(1, 1, ifelse(z[n], alpha[r], 1 - beta[r]))

init_fun <- function() {
return(list(alpha=rep(0.75,R), beta=rep(0.75,R), w=rep(0,D), w0=0));
}
``````

and some R code to fit:

``````library('rstan')
fit <- stan("raykar-marginal.stan",
data = c("N","R","D","x","y"),
init = function(n) list(alpha = rep(0.8, R), beta = rep(0.8, R), w = rep(0, D), w0 = 0))
``````

I checked that it recovered parameters accurately. Like the Dawid and Skene example, it starts with an initialization that assumes the annotators are not antagonistic. It also zeroes the slopes and intercept for the regression in the init. This should help if not totally prevent the chance of having a multimodal solution where one mode is that the annotators are accurate, but intentionally returning the wrong answers.

6 Likes

@Bob_Carpenter I tried this model with simulated data and varying performance measures (se and sp) and it works really well. Thank you.
At the moment the model assumes that each rater will score all the items being y[N,R].
Has anyone implemented this in a way that allows different number of raters per item?

1 Like

Sorry for missing this before. I’m kind of overwhelmed on Discourse.

All you need to do is to move it to long form like for other models. There’s a chapter in the user’s guide on how to implement ragged structures like this. But the idea is simple. Just create a long form like this:

``````rr, nn, y
...
``````

where `rr[i]` is the rater, `nn[i]` is the item being rated, and `y[i]` is the rating. The predictors can be organized as before as they only go into the prevalence model. Then rather than looping through items and annotators, you just loop through rows. Each annotation adds a term to the log density. The tricky part’s just getting the likelihood terms into the log-sum-exp. The easiest way to do that is just accumulate terms for each of them then do them all at the end. That is, something like:

``````
data {
// R and N and D as before but instead of the old y, this:
int<lower = 0> I;  // number of observations
int<lower = 1, upper = N> nn[I];  // item for observation i
int<lower = 1, upper = R> rr[i];  // rather for observation i
int<lower = 0, upper = 1> y[i];  // rating for observation i
...
model {
...
{
vector[N] pos_sum = log_z_hat;
vector[N] neg_sum = log1m_z_hat;
for (i in 1:I)
if (y[i] == 1) {
pos_sum[nn[i]] += log_alpha[rr[i]];
neg_sum[nn[i]] += log1m_beta[rr[i]];
} else {
pos_sum[nn[i]] += log1m_alpha[rr[i]];
neg_sum += log_beta[rr[i]];
}
}
for (n in 1:N)
target += log_sum_exp(pos_sum[n], neg_sum[n]);
}
``````

where `nn[i]` is the item for observation `i`, `rr[i]` is the rate for item `i`, and `y[i]` is the rating.

That corresponds to the bit of the model I had in the fully observed case:

``````  // likelihood
for (n in 1:N) {
real pos_sum = log_z_hat[n];
real neg_sum = log1m_z_hat[n];
for (r in 1:R) {
if (y[n, r] == 1) {
pos_sum += log_alpha[r];
neg_sum += log1m_beta[r];
} else {
pos_sum += log1m_alpha[r];
neg_sum += log_beta[r];
}
}
target += log_sum_exp(pos_sum, neg_sum);
}
``````

I haven’t tested it, but the basic idea of the raggedness encoding should be right.

I think @ducoveen may have also implemented this in ragged form.

2 Likes

Dear Bob,
Thank you so much for taking the time to reply and your detailed suggestions.

I have modified your model/codes. However, the model doesn’t seem correct and it does not produce good parameter estimates. Probably I’m missing something.

I’m not sure if the generated quantities block is OK.

Here the reproducible example:

``````
library('rstan')
library('reshape2')

inv_logit <- function(u) 1 / (1 + exp(-u))

N <- 50 # items
R <- 30 # subj
D <- 20 # predictors

x <- matrix(rnorm(N * D), N, D)
#x <- cbind(rep(1,nrow(x)), x)

w0 <- -2 # intercep
w <- rnorm(D) # intercep & reg coef

alpha <- runif(R, 0.65, 0.95) # se
beta <- runif(R, 0.65, 0.95)  # sp

z <- rep(0, N);

for (n in 1:N)
z[n] <- rbinom(1, 1, inv_logit(w0 + x[n, ] %*% w)) # true value

y <- matrix(0, N, R);
for (n in 1:N)
for (r in 1:R)
y[n, r] <- rbinom(1, 1, ifelse(z[n], alpha[r], 1 - beta[r])) # reported

df <- melt(y)
names(df) <- c("item", "rater", "y")

``````

Model

``````raykar_long <- "
data {
int<lower = 0> I;               // # of observations
int<lower=0> N;                 // # items
int<lower=0> R;                 // # raters
int<lower = 1, upper = N> nn[I];  // item for observation i
int<lower = 1, upper = R> rr[I];  // rater for observation i
int<lower=0, upper=1> y[I];      // outcome: 1 success, 0 failure
int<lower=0> D;                 // # of predictors for each item
matrix[N, D] x;                 // [n, d] predictors d for item n

}
parameters {
vector[D] w;                         // logistic regression coef
vector<lower=0, upper=1>[R] alpha;   // sensitivity
vector<lower=0, upper=1>[R] beta;    // specificity
}

model {
vector[N] logit_z_hat = x * w; //
vector[N] log_z_hat = log_inv_logit(logit_z_hat); // log p
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat); // log 1 - p

vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);
vector[N] pos_sum = log_z_hat;
vector[N] neg_sum = log1m_z_hat;

for (i in 1:I){
if (y[i] == 1) {
pos_sum[nn[i]] += log_alpha[rr[i]];
neg_sum[nn[i]] += log1m_beta[rr[i]];
} else {
pos_sum[nn[i]] += log1m_alpha[rr[i]];
neg_sum += log_beta[rr[i]];
}
}
for (n in 1:N)
target += log_sum_exp(pos_sum[n], neg_sum[n]);

// priors
for (d in 1:D){
w[d] ~ normal(0, 2);
}
alpha ~ beta(10, 1);
beta ~ beta(10, 1);
}
generated quantities {
vector[N] Pr_z_eq_1;
vector[N] logit_z_hat = x * w;
vector[N] log_z_hat = log_inv_logit(logit_z_hat);
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat);
vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);

vector[N] pos_sum = log_z_hat;
vector[N] neg_sum = log1m_z_hat;
for (i in 1:I){
if (y[i] == 1) {
pos_sum[nn[i]] += log_alpha[rr[i]];
neg_sum[nn[i]] += log1m_beta[rr[i]];
} else {
pos_sum[nn[i]] += log1m_alpha[rr[i]];
neg_sum += log_beta[rr[i]];
}
}
for (n in 1:N){
Pr_z_eq_1[n] = softmax([pos_sum[n], neg_sum[n]]');
}
}
"

data_list <- list(I = nrow(df),
N = length(unique(df\$item)),
R = length(unique(df\$rater)),
nn = df\$item,
rr = df\$rater,
D = D+1,
x = x, # design matrix : intercept + predictors
y = df\$y)

fit_long <- stan(model_code = raykar_long,raykar_long <- "
data {
int<lower = 0> I;               // # of observations
int<lower=0> N;                 // # items
int<lower=0> R;                 // # raters
int<lower = 1, upper = N> nn[I];  // item for observation i
int<lower = 1, upper = R> rr[I];  // rater for observation i
int<lower=0, upper=1> y[I];      // outcome: 1 success, 0 failure
int<lower=0> D;                 // # of predictors for each item
matrix[N, D] x;                 // [n, d] predictors d for item n

}
parameters {
vector[D] w;                         // logistic regression coef
real w0;                             // intercept
vector<lower=0, upper=1>[R] alpha;   // sensitivity
vector<lower=0, upper=1>[R] beta;    // specificity
}

model {
vector[N] logit_z_hat = w0 + x * w; //
vector[N] log_z_hat = log_inv_logit(logit_z_hat); // log p
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat); // log 1 - p

vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);
vector[N] pos_sum = log_z_hat;
vector[N] neg_sum = log1m_z_hat;

for (i in 1:I){
if (y[i] == 1) {
pos_sum[nn[i]] += log_alpha[rr[i]];
neg_sum[nn[i]] += log1m_beta[rr[i]];
} else {
pos_sum[nn[i]] += log1m_alpha[rr[i]];
neg_sum += log_beta[rr[i]];
}
}
for (n in 1:N)
target += log_sum_exp(pos_sum[n], neg_sum[n]);

// priors
w0 ~ normal(0, 5);
for (d in 1:D){
w[d] ~ normal(0, 2);
}
alpha ~ beta(10, 1);
beta ~ beta(10, 1);
}
generated quantities {
vector[N] Pr_z_eq_1;
vector[N] logit_z_hat = w0 + x * w;
vector[N] log_z_hat = log_inv_logit(logit_z_hat);
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat);
vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);

vector[N] pos_sum = log_z_hat;
vector[N] neg_sum = log1m_z_hat;
for (i in 1:I){
if (y[i] == 1) {
pos_sum[nn[i]] += log_alpha[rr[i]];
neg_sum[nn[i]] += log1m_beta[rr[i]];
} else {
pos_sum[nn[i]] += log1m_alpha[rr[i]];
neg_sum += log_beta[rr[i]];
}
}
for (n in 1:N){
Pr_z_eq_1[n] = softmax([pos_sum[n], neg_sum[n]]');
}
}
"

data_list <- list(I = nrow(df),
N = length(unique(df\$item)),
R = length(unique(df\$rater)),
nn = df\$item,
rr = df\$rater,
D = D,
x = x, # design matrix : intercept + predictors
y = df\$y)

fit_long <- stan(model_code = raykar_long,
model_name = "raykar_long",
data = data_list,
init = function(n) list(alpha = rep(0.8, length(unique(df\$rater))),
beta = rep(0.8, length(unique(df\$rater)))),
chains = 3,
verbose = F,
iter = 8000,
warmup = 4000,
seed = 3,
refresh = max(8000/100, 1))
``````

Hello,

The ragged form I have lying around is the following:

``````data {
int<lower=0> I;
int<lower=0> J;
int<lower=0> N;
int<lower=0> D;  // differs from annotation only
matrix[I,D] x;   // differs from annotation only
int<lower=0,upper=1> y[N];
int<lower=1> ii[N];
int<lower=1> jj[N];
}
parameters {
vector[D] w;   // differs from annotation only
real w0;

vector<lower=0.1,upper=1>[J] alpha;
vector<lower=0.1,upper=1>[J] beta;
}

model {
vector[I] logit_z_hat;
vector[I] log_z_hat;
vector[I] log1m_z_hat;

vector[J] log_alpha;
vector[J] log1m_alpha;
vector[J] log_beta;
vector[J] log1m_beta;

logit_z_hat = w0 + x * w; // differs from annotation only

for (i in 1:I) {
log_z_hat[i] = log_inv_logit(logit_z_hat[i]);
log1m_z_hat[i] = log1m_inv_logit(logit_z_hat[i]);
}

for (j in 1:J) {
log_alpha[j] = log(alpha[j]);
log1m_alpha[j] = log1m(alpha[j]);
log_beta[j] = log(beta[j]);
log1m_beta[j] = log1m(beta[j]);
}

w ~ normal(0,2);  // differs from annotation only
w0 ~ normal(0,5);
alpha ~ beta(10,1);
beta ~ beta(10,1);

for (n in 1:N){
real pos_sum;
real neg_sum;
pos_sum = log_z_hat[ii[n]];
neg_sum = log1m_z_hat[ii[n]];
if (y[n] == 1) {
pos_sum = pos_sum + log_alpha[jj[n]];
neg_sum = neg_sum + log1m_beta[jj[n]];
} else {
pos_sum = pos_sum + log1m_alpha[jj[n]];
neg_sum = neg_sum + log_beta[jj[n]];
}
target += log_sum_exp(pos_sum, neg_sum);
}

}
``````

Which has been tested with Simulation Based Calibration and should work. @Hemingway, I’ll take a closer look at you model later.

Best, Duco

1 Like

Hi @ducoveen ,
Thanks for the codes. I ran some sims and it works quite well.
These are the codes based on @Bob_Carpenter example:

``````inv_logit <- function(u) 1 / (1 + exp(-u))

N <- 500 # items
R <- 30 # subj
D <- 20 # predictors

x <- matrix(rnorm(N * D), N, D)
w <- rnorm(D) # reg coef
w0 <- -2 # intercep

alpha <- runif(R, 0.65, 0.95) #se
beta <- runif(R, 0.65, 0.95) # sp

z <- rep(0, N);

for (n in 1:N)
z[n] <- rbinom(1, 1, inv_logit(w0 + (x[n, ] %*% w))) # true value

y <- matrix(0, N, R);
for (n in 1:N)
for (r in 1:R)
y[n, r] <- rbinom(1, 1, ifelse(z[n], alpha[r], 1 - beta[r])) # reported

library('reshape2')
df <- melt(y)
names(df) <- c("item", "rater", "y")

``````
``````
raykar_long <- "
data {
int<lower = 0> I;               // # of observations
int<lower=0> N;                 // # items
int<lower=0> R;                 // # raters
int<lower = 1, upper = N> nn[I];  // item for observation i
int<lower = 1, upper = R> rr[I];  // rater for observation i
int<lower=0, upper=1> y[I];      // outcome: 1 success, 0 failure
int<lower=0> D;                 // # of predictors for each item
matrix[N, D] x;                 // [n, d] predictors d for item n

}
parameters {
vector[D] w;                         // logistic regression coef
real w0;                             // intercept
vector<lower=0, upper=1>[R] alpha;   // sensitivity
vector<lower=0, upper=1>[R] beta;    // specificity
}

model {
vector[N] logit_z_hat = w0 + x * w; //
vector[N] log_z_hat = log_inv_logit(logit_z_hat); // log p
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat); // log 1 - p

vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);

for (i in 1:I){
real pos_sum;
real neg_sum;
pos_sum = log_z_hat[nn[i]];
neg_sum = log1m_z_hat[nn[i]];
if (y[i] == 1) {
pos_sum = pos_sum + log_alpha[rr[i]];
neg_sum = neg_sum + log1m_beta[rr[i]];
} else {
pos_sum = pos_sum + log1m_alpha[rr[i]];
neg_sum = neg_sum + log_beta[rr[i]];
}
target += log_sum_exp(pos_sum, neg_sum);
}

// priors
w0 ~ normal(0, 5);
for (d in 1:D){
w[d] ~ normal(0, 2);
}
alpha ~ beta(10, 1);
beta ~ beta(10, 1);
}

generated quantities {
vector[N] Pr_z_eq_1;
vector[N] logit_z_hat = w0 + x * w;
vector[N] log_z_hat = log_inv_logit(logit_z_hat);
vector[N] log1m_z_hat = log1m_inv_logit(logit_z_hat);
vector[R] log_alpha = log(alpha);
vector[R] log1m_alpha = log1m(alpha);
vector[R] log_beta = log(beta);
vector[R] log1m_beta = log1m(beta);

for (i in 1:I){
real pos_sum;
real neg_sum;
pos_sum = log_z_hat[nn[i]];
neg_sum = log1m_z_hat[nn[i]];
if (y[i] == 1) {
pos_sum = pos_sum + log_alpha[rr[i]];
neg_sum = neg_sum + log1m_beta[rr[i]];
} else {
pos_sum = pos_sum + log1m_alpha[rr[i]];
neg_sum = neg_sum + log_beta[rr[i]];
}
Pr_z_eq_1[nn[i]] = softmax([pos_sum, neg_sum]');
}
}
"
``````

Great to hear that it seems to work well!

1 Like