A typical crowdsourcing classification scenario is one in which we wish to classify a number of items based on a set of noisy or biased labels provided by multiple crowd workers with varying levels of expertise, skill and attitude. To obtain accurate aggregated labels, we must be able to assess the accuracy and bias of each worker who contributed labels. Ultimately, these estimates of the workers' accuracy should be integrated within the process that infers the items' true labels.
Prior work on the data aggregation problem in crowdsourcing led to an expressive representation of a worker's accuracy in the form of a latent worker confusion matrix. This matrix gives the probability of each possible labelling outcome for a specific worker, conditioned on each possible true label of an item, and thus captures the labelling behaviour of that worker, who may, for example, be biased towards a particular label range. See the example below for a classification task with three label classes (-1, 0, 1).
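To make this concrete, here is a small illustrative sketch (the matrix values are made up for this example, not taken from any dataset): each row of a worker's confusion matrix is a distribution over the labels the worker might report, given the item's true label. The worker below is biased towards reporting label 1.

```python
import random

# Hypothetical confusion matrix for one worker on a three-class task.
# Rows: true label (-1, 0, 1); columns: probability that the worker
# reports -1, 0 or 1. This worker is biased towards reporting 1.
labels = [-1, 0, 1]
confusion = {
    -1: [0.70, 0.10, 0.20],
     0: [0.10, 0.60, 0.30],
     1: [0.05, 0.05, 0.90],
}

def sample_label(true_label):
    """Sample the label this worker would report for an item
    whose true label is `true_label`."""
    return random.choices(labels, weights=confusion[true_label])[0]
```

Each row sums to one, so a perfectly accurate worker would have an identity confusion matrix, while a biased worker has probability mass shifted towards their preferred labels.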
In CommunityBCC, we make a further modelling step by adding a latent worker type variable, which we call community. Communities represent similarity patterns among the workers' confusion matrices. Thus, we assume that the workers' confusion matrices are not completely random, but rather that they tend to follow some underlying clustering patterns – such patterns are readily observable by plotting the confusion matrices of workers as learned by BCC. See this example from a dataset with three-point scale labels (-1, 0, 1):
The CommunityBCC model is designed to encode the assumptions that (i) the crowd is composed of an unknown number of communities, (ii) each worker belongs to one of these communities and (iii) each worker's confusion matrix is a noisy copy of their community's confusion matrix. The factor graph of the model is shown below and the full generative process is described in the paper (details below).
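The three assumptions above can be sketched as a generative process. This is only an illustrative simulation, not the actual Infer.NET model: the hyperparameters (Dirichlet concentrations, community and worker counts) are assumptions chosen for the sketch, and in the real model the community memberships and confusion matrices are latent variables to be inferred, not sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes = 3          # label classes, e.g. (-1, 0, 1)
num_communities = 2      # assumed known here; unknown in the real model
num_workers = 5
noise = 50.0             # concentration: higher = worker closer to community

# (i) Each community has its own confusion matrix; each row is a
# Dirichlet draw over the possible reported labels.
community_cms = rng.dirichlet(np.ones(num_classes) * 2.0,
                              size=(num_communities, num_classes))

# (ii) Each worker belongs to exactly one community.
memberships = rng.integers(num_communities, size=num_workers)

# (iii) Each worker's confusion matrix is a noisy copy of their
# community's matrix: each row is drawn from a Dirichlet centred
# on the corresponding community row.
worker_cms = np.array([
    [rng.dirichlet(noise * community_cms[memberships[w], c])
     for c in range(num_classes)]
    for w in range(num_workers)
])
```

With a large `noise` concentration the workers' matrices cluster tightly around their community's matrix, which is exactly the pattern visible when plotting the confusion matrices learned by BCC.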
How to find the number of communities
For a given dataset, we can find the optimal number of communities using standard model selection. In particular, we can perform a model-evidence search over a range of community counts: assuming the count lies within a range of 1..x communities, we run CommunityBCC for each count in this range and compute the model evidence of each resulting model. This computation can be done efficiently with approximate inference based on message passing. For an example, take a look at computing model evidence for model selection using the Infer.NET probabilistic programming framework here.
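The search itself is a simple loop over candidate counts. In the sketch below, `run_community_bcc` is a hypothetical placeholder standing in for the actual message-passing evidence computation in Infer.NET; its toy return value (peaking at three communities) exists only so the sketch runs end to end.

```python
def run_community_bcc(labels, num_communities):
    """Placeholder for running CommunityBCC with a fixed community
    count and returning the log model evidence. In the real
    implementation this is computed by message passing in Infer.NET."""
    # Toy stand-in so the sketch runs: evidence peaks at 3 communities.
    return -abs(num_communities - 3)

def select_num_communities(labels, max_communities):
    """Model selection: run the model for 1..max_communities
    communities and keep the count with the highest log evidence."""
    evidences = {m: run_community_bcc(labels, m)
                 for m in range(1, max_communities + 1)}
    return max(evidences, key=evidences.get)
```

For example, `select_num_communities(crowd_labels, 5)` would return the community count whose model best explains the observed labels.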
We tested our CommunityBCC model on four different crowdsourced datasets, and our results show that it provides a number of advantages over BCC, Majority Voting (MV) and Dawid and Skene's Expectation-Maximization (EM) method.
- CommunityBCC converges faster to the highest classification accuracy using fewer labels. See the figure below, where we iteratively select labels for each dataset.
- The model provides useful information about the number of latent worker communities. See the figure below, showing the communities found in each of the four datasets and the percentage of workers that CommunityBCC assigns to each community.
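For reference, the simplest of the baselines above, Majority Voting, can be sketched in a few lines; each item simply receives the label reported by the largest number of workers, with no model of worker accuracy at all.

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Aggregate crowd labels by majority voting: each item gets the
    label reported by the most workers (ties broken by first-seen
    label). Worker accuracy and bias are ignored entirely."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_by_item.items()}

# Example: two items labelled by three workers on a (-1, 0, 1) scale.
votes = {"item1": [1, 1, 0], "item2": [-1, 0, -1]}
# majority_vote(votes) -> {"item1": 1, "item2": -1}
```

Because MV weights every worker equally, it is exactly the kind of baseline that confusion-matrix models such as BCC and CommunityBCC improve on when workers are noisy or biased.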
To learn more about Community-Based Bayesian Aggregation Models for Crowdsourcing, take a look at the paper:
Matteo Venanzi, John Guiver, Gabriella Kazai, Pushmeet Kohli, and Milad Shokouhi, Community-Based Bayesian Aggregation Models for Crowdsourcing, in Proceedings of the 23rd International World Wide Web Conference, WWW2014, Best paper runner up, ACM, April 2014
Full code for this model
The full C# implementation of this model is described in this post where you can download and try out its Infer.NET code. You are welcome to experiment with the model and provide feedback.
Matteo Venanzi, University of Southampton
John Guiver, Microsoft
Gabriella Kazai, Microsoft
Pushmeet Kohli, Microsoft
Milad Shokouhi, Microsoft