I want to implement a version of the Dawid-Skene model which is described in the manual (p. 217). Bob has pointed me to the model in an earlier thread (https://groups.google.com/d/topic/stan-users/G1nPD2TbZ08/discussion).
The manual states “because there are J coders, it would be natural to extend the model to include a hierarchical prior for and to partially pool the estimates of coder accuracy and bias.” My goal is to implement such a partial pooling model. I started with a modification of the no pooling model in the manual (Full model codes are pasted below, new users are unfortunately not allowed to upload files). My implementation differs somewhat from the manual because I wanted to allow for missing data (not every unit is seen by each annotator) and include data from independent studies with different (but unknown) prevalences of the true categories.
I then followed the partial pooling vignette (http://mc-stan.org/documentation/case-studies/pool-binary-trials.html):
The common misclassification probabilities eta
are sampled from a Dirichlet prior:
for (k in 1:K)
eta[k] ~ dirichlet(beta[k]);
The weight of the common misclassification matrix kappa
relative to the data is sampled from a Pareto prior:
kappa ~ pareto(1, 1.5);
The misclassification matrices for all coders theta[j]
are obtained by combining both:
for (j in 1:J)
for (k in 1:K)
theta[j, k] ~ dirichlet(eta[k] * kappa);
The parameters are recovered well from simulated data. However, in smallish (but typical for my field) data sets (for example, with I = 50
test units which are all coded by the same J = 7
coders), there are almost always a small number of divergent transitions. If I understood the warning guide correctly, even a small number of divergent transitions invalidates the result - even if the true parameters are generally recovered quite well in different simulations. The problem of small data make sense to me - there is just not much information from which to approximate a common misclassification matrix.
So my questions are
- Does this approach to partial pooling of coder abilities make sense in general or are the divergent transitions an indicator for a fundamental problem?
- Is there an (easy) solution to the divergent transitions problem or is it a sign that there is just not enough data to estimate the partial pooling model? (in the second case, I would consequently estimate the no pooling model (with larger uncertainty in many setups) or apply the interchangeable coder assumption of a complete pooling model (with less uncertainty, but probably larger bias)).
Thank you in advance for any advice!
P.S.: Bob already suggested that the code would be more efficient with vectorization. This is next on my list after I have a grasp on the partial pooling problem.
Model code here (I think it would be helpful to allow new users to upload files, this would seem to be an easier solution; I would then also provide example data…):
Partial pooling:
data {
int<lower=2> K; // number of categories
vector<lower=0>[K] beta[K]; // dirichlet prior on probability that coder j assigns category k to coding unit i
int<lower=1> J; // number of coders
int<lower=1> I; // number of coding units
int<lower=1> N; // number of codings
int<lower=1> S; // number of independent samples
int<lower=1,upper=K> y[N]; // codings
int<lower=1,upper=I> ii[N]; // index coding units in data
int<lower=1,upper=J> jj[N]; // index coder in data
int<lower=1,upper=S> ss[N]; // index sample in data
int<lower=1,upper=I> iii[I]; // index coding units to sample (not necessary, for completeness)
int<lower=1,upper=S> sss[I]; // index sample to coding units
vector<lower=0>[K] alpha[S]; // dirichlet prior on category prevalence in studies
}
parameters {
simplex[K] pi[S]; // category prevalence in independent samples (with non-differential error assumption)
simplex[K] theta[J, K]; // misclassification probablilties for each coder (partial pooling)
simplex[K] eta[K]; // overall misclassification probablilties (partial pooling)
real<lower=1> kappa; // amount of pooling (see http://mc-stan.org/documentation/case-studies/pool-binary-trials.html)
}
transformed parameters {
vector[K] log_q_z[I];
for (i in 1:I)
log_q_z[i] = log(pi[sss[i]]);
for (n in 1:N)
for (k in 1:K)
log_q_z[ii[n], k] = log_q_z[ii[n], k] + log(theta[jj[n], k, y[n]]);
}
model {
for (s in 1:S)
pi[s] ~ dirichlet(alpha[s]);
kappa ~ pareto(1, 1.5);
for (k in 1:K)
eta[k] ~ dirichlet(beta[k]);
for (j in 1:J)
for (k in 1:K)
theta[j, k] ~ dirichlet(eta[k] * kappa);
for (i in 1:I)
target += log_sum_exp(log_q_z[i]);
}
generated quantities {
vector[K] softmax_lqz[I]; // probabilities of each k for each coding unit
for (i in 1:I)
softmax_lqz[i] = softmax(log_q_z[i, ]);
}
No pooling (adapted from the manual):
data {
int<lower=2> K; // number of categories
vector<lower=0>[K] beta[K]; // dirichlet prior on probability that coder j assigns category k to coding unit i
int<lower=1> J; // number of coders
int<lower=1> I; // number of coding units
int<lower=1> N; // number of codings
int<lower=1> S; // number of independent samples
int<lower=1,upper=K> y[N]; // codings
int<lower=1,upper=I> ii[N]; // index coding units in data
int<lower=1,upper=J> jj[N]; // index coder in data
int<lower=1,upper=S> ss[N]; // index sample in data
int<lower=1,upper=I> iii[I]; // index coding units to sample (not necessary, for completeness)
int<lower=1,upper=S> sss[I]; // index sample to coding units
vector<lower=0>[K] alpha[S]; // dirichlet prior on category prevalence in studies
}
parameters {
simplex[K] pi[S]; // category prevalence in independent samples (with non-differential error assumption)
simplex[K] theta[J, K]; // misclassification probablilties for each coder (no pooling)
}
transformed parameters {
vector[K] log_q_z[I];
for (i in 1:I)
log_q_z[i] = log(pi[sss[i]]);
for (n in 1:N)
for (k in 1:K)
log_q_z[ii[n], k] = log_q_z[ii[n], k] + log(theta[jj[n], k, y[n]]);
}
model {
for (s in 1:S)
pi[s] ~ dirichlet(alpha[s]);
for (j in 1:J)
for (k in 1:K)
theta[j, k] ~ dirichlet(beta[k]);
for (i in 1:I)
target += log_sum_exp(log_q_z[i]);
}
generated quantities {
vector[K] softmax_lqz[I]; // probabilities of each k for each coding unit
for (i in 1:I)
softmax_lqz[i] = softmax(log_q_z[i, ]);
}