Modeling a real-valued annotation process


I am new to Bayesian modeling and want to model annotator behavior in a real-valued annotation process:

  • Annotators assign a real value in the closed interval [0, 5] to items, with a step size of 0.2
  • Each item is annotated by two different annotators
  • There are a total of ~20 annotators and several thousand items.

The distribution of annotated values is skewed towards the top end of the range, with 20% of annotations equal to 5, around 50% of annotations greater than 4, and almost no values at the lower end.

I started off with the simplest model I could think of, modeling each annotation as normally distributed around a latent ground truth with annotator-dependent variance. The model is given here:

data {
  int<lower=1> J; // number of annotators
  int<lower=1> N; // number of annotations, 2*I
  int<lower=1> I; // number of items
  array[N] int<lower=1,upper=I> ii; // the item the n-th annotation belongs to
  array[N] int<lower=1,upper=J> jj; // the annotator who produced the n-th annotation
  array[N] real y; // the n-th annotation
}

parameters {
  array[J] real<lower=0> sigma; // noise of annotator j
  array[I] real x; // ground-truth annotations
}

model {
  for (n in 1:N) {
    y[n] ~ normal(x[ii[n]], sigma[jj[n]]);
  }
}

I sample from the posterior using default settings: 4 chains, 1000 warmup steps, 1000 sampling steps. The result of diagnosing the sample is here:

Checking sampler transitions treedepth.
5 of 4000 (0.12%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.

Checking sampler transitions for divergences.
2 of 4000 (0.05%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.

Checking E-BFMI - sampler transitions HMC potential energy.
The E-BFMI, 0.01, is below the nominal threshold of 0.30 which suggests that HMC may have trouble exploring the target distribution.
If possible, try to reparameterize the model.

Effective sample size satisfactory.

The following parameters had split R-hat greater than 1.05:
  sigma[2], sigma[7], sigma[9], sigma[10], sigma[11], sigma[12], x[176], [... omitting]
Such high values indicate incomplete mixing and biased estimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.

In addition, the effective sample sizes for some of the sigmas are in the single digits. It seems that there is some fundamental issue with the model. I tried two variations of the above model:

  • constraining the x- and y-values, with the result that all effective sample sizes are in the single digits
  • Sampling from a truncated normal in the interval [0, 5], with the result that the sampler reaches maximum treedepth at each of the 4K iterations

Now, because of the normal errors, the model’s posterior obviously can’t match the “true” process, which is on a closed interval with a concentration at the upper boundary. Can this mismatch be the cause of the problems I encountered?

In that case, would passing the Gaussian errors through some continuous transformation, so that ground truth plus error stays within the annotation interval, be a reasonable solution?

Thank you for your help,


I think the current normal model could already be improved with some mild priors on x and sigma.

For x, you know that the ground truth needs to be somewhere between 0 and 5. A normal prior with mean 2.5 and sd 1 would capture that, since it places roughly 99% of its mass inside that interval.

For sigma, you know that it cannot be much larger than 1 to keep the annotations between 0 and 5. An exponential prior with lambda 1 is a good start.
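To make the suggestion concrete, here is a minimal sketch of the original model with these two priors added. Only the two prior statements in the model block are new; everything else is copied from the question. I have not run this on your data, so treat it as a starting point rather than a tested fix.

```stan
data {
  int<lower=1> J; // number of annotators
  int<lower=1> N; // number of annotations, 2*I
  int<lower=1> I; // number of items
  array[N] int<lower=1, upper=I> ii; // item of the n-th annotation
  array[N] int<lower=1, upper=J> jj; // annotator of the n-th annotation
  array[N] real y; // the n-th annotation
}

parameters {
  array[J] real<lower=0> sigma; // noise of annotator j
  array[I] real x; // ground-truth annotations
}

model {
  // Mild priors as suggested above; the exact values are a judgment call.
  x ~ normal(2.5, 1);     // ground truth mostly inside [0, 5]
  sigma ~ exponential(1); // noise unlikely to be much larger than 1

  for (n in 1:N) {
    y[n] ~ normal(x[ii[n]], sigma[jj[n]]);
  }
}
```

With only two annotations per item and no priors, the posterior is improper and can run off towards very small or very large sigma, which would be consistent with the low E-BFMI and high R-hat you report. The priors are meant to rule out those regions.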

Depending on what your goal is with the analysis, it might be worthwhile to think further about things like matching the true process and hierarchical priors. If you are mainly interested in getting reasonable estimates for x, the ground truth, I believe adding mild priors will be sufficient.
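If you do want to move closer to the true process, one option (an assumption on my part, not something I have tested on your data) is to keep the latent normal error but treat each recorded value as a rounded and clipped version of it: a value y on the 0.2 grid stands for a latent value in [y - 0.1, y + 0.1], and the two end points 0 and 5 absorb everything beyond the scale. The constant `h` and the treatment of the end points are illustrative choices. A sketch:

```stan
data {
  int<lower=1> J; // number of annotators
  int<lower=1> N; // number of annotations
  int<lower=1> I; // number of items
  array[N] int<lower=1, upper=I> ii;
  array[N] int<lower=1, upper=J> jj;
  array[N] real<lower=0, upper=5> y; // values on the 0.2 grid
}

transformed data {
  real h = 0.1; // half the step size (assumed rounding half-width)
}

parameters {
  vector<lower=0>[J] sigma; // noise of annotator j
  vector[I] x;              // latent ground truth
}

model {
  x ~ normal(2.5, 1);
  sigma ~ exponential(1);

  for (n in 1:N) {
    real mu = x[ii[n]];
    real s = sigma[jj[n]];
    if (y[n] >= 5 - h) {
      // top of the scale: all latent mass above 4.9
      target += normal_lccdf(5 - h | mu, s);
    } else if (y[n] <= h) {
      // bottom of the scale: all latent mass below 0.1
      target += normal_lcdf(h | mu, s);
    } else {
      // interior grid point: latent mass in [y - h, y + h]
      target += log_diff_exp(normal_lcdf(y[n] + h | mu, s),
                             normal_lcdf(y[n] - h | mu, s));
    }
  }
}
```

This lets the latent ground truth sit above 5 for items that both annotators rated 5, which a plain normal likelihood cannot express. The CDF differences can become numerically unstable when sigma is very small, so I would still keep the priors in place.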