How to Best Serve Micro-tasks to the Crowd when there Is Class Imbalance

We study how the dominance of one class in the data that crowd workers process affects their efficiency and effectiveness. We aim to understand whether seeing many negative examples biases workers when they identify positive labels.

We run comparative experiments that measure label quality and work efficiency under different class distribution settings, varying both label frequency (i.e., one dominant class) and ordering (e.g., positive cases preceding negative ones).

Figure: Order of the document classes in each batch (blue: 'relevant'; red: 'non-relevant').

We used data from TREC-8. To measure the effects of class imbalance, we used two different relevant/non-relevant ratios in a batch of judging tasks: 10%/90% and 50%/50%.
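A minimal Python sketch of how batches with a given class ratio and class order could be assembled. The `(doc_id, is_relevant)` representation, the batch size, and the order labels `"first"`, `"last"` and `"random"` are illustrative assumptions, not the exact settings shown in the figure.

```python
import random
from typing import List, Tuple

# Hypothetical document representation: (doc_id, is_relevant).
Doc = Tuple[str, bool]


def make_batch(relevant: List[Doc], non_relevant: List[Doc], size: int,
               relevant_ratio: float, order: str, seed: int = 0) -> List[Doc]:
    """Build one judging batch with a given share of relevant documents.

    order (hypothetical labels):
      "first"  - all relevant documents precede the non-relevant ones
      "last"   - all relevant documents follow the non-relevant ones
      "random" - both classes shuffled together
    """
    rng = random.Random(seed)
    n_rel = round(size * relevant_ratio)          # e.g. 10 * 0.1 -> 1
    rel = rng.sample(relevant, n_rel)             # draw without replacement
    non = rng.sample(non_relevant, size - n_rel)
    if order == "first":
        return rel + non
    if order == "last":
        return non + rel
    if order == "random":
        batch = rel + non
        rng.shuffle(batch)
        return batch
    raise ValueError(f"unknown order: {order}")
```

For example, `make_batch(rel, non, 10, 0.1, "last")` yields nine non-relevant documents followed by a single relevant one, the setting in which relevant documents appear only at the end of the batch.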

When the relevant documents are shown before the non-relevant ones, we obtain the highest precision, while the worst precision is obtained when they are shown at the end of the batch.
Moreover, in batch 2 we observe few true positives and many false positive judgments, which indicates that showing the 90% of non-relevant documents at the beginning of the batch biases the workers' notion of relevance.
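To make these metrics concrete, the sketch below scores a batch of worker judgments against TREC relevance labels. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
from typing import Dict, Sequence


def judgment_metrics(judged: Sequence[bool], gold: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision and recall of worker judgments (True = 'relevant').

    Precision and recall are set to 0.0 when their denominator is zero,
    a convention chosen here rather than taken from the study.
    """
    if len(judged) != len(gold):
        raise ValueError("judged and gold must have the same length")
    tp = sum(j and g for j, g in zip(judged, gold))        # relevant, judged relevant
    fp = sum(j and not g for j, g in zip(judged, gold))    # non-relevant, judged relevant
    fn = sum(g and not j for j, g in zip(judged, gold))    # relevant, judged non-relevant
    tn = len(gold) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(gold) if gold else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

In a 10%/90% batch of ten documents with one relevant document, a worker who misses it and marks two non-relevant documents as relevant still reaches 0.7 accuracy, but 0.0 precision and recall; accuracy alone can therefore hide the kind of bias described above.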

Mean judgment accuracy, precision and recall for each setting

When classes are balanced, there is no statistically significant difference in performance between the different orders. Moreover, seeing a similar number of positive and negative documents leads to good performance, with more than 60% accuracy in all three order settings.
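The post does not state which significance test was used. As one possible way to compare two order settings, the sketch below runs a two-sided permutation test on the difference in mean per-worker accuracy; this is a swapped-in technique, and the per-worker accuracy lists are hypothetical inputs.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test p-value for a difference in means.

    a, b: e.g. per-worker accuracies under two order settings (hypothetical).
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                      # random relabelling of workers
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A large p-value (for example above 0.05) would be consistent with "no statistically significant difference" between two orders; with only a handful of workers per setting, an exact test over all relabellings could be used instead of random resampling.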

When most of the documents are non-relevant and the few relevant ones are presented first, workers perform better. This is a positive result that can easily be applied in practice, since in real IR evaluation settings most of the documents to be judged are non-relevant.

Placing documents known to be relevant in the first positions will both prime workers on relevance and provide them with training.

While in a real setting it is not possible to put relevant documents first, since their relevance is not yet known, it would still be possible to order documents by attributes indicating their likely relevance (e.g., retrieval rank, number of IR systems retrieving the document) and thus present first to the workers the documents with a higher probability of being relevant.
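A sketch of such an ordering, assuming each candidate document carries two hypothetical attributes: the number of IR systems that retrieved it and the best rank any of those systems gave it. The sort key and its tie-breaking rule are illustrative choices, not ones evaluated in the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    doc_id: str
    n_systems: int   # hypothetical: how many IR systems retrieved this document
    best_rank: int   # hypothetical: best (lowest) rank received, 1 = top


def order_by_likely_relevance(docs: List[Candidate]) -> List[Candidate]:
    """Place documents that look more likely to be relevant at the start.

    Heuristic: documents retrieved by more systems come first;
    ties are broken by the better (lower) best rank.
    """
    return sorted(docs, key=lambda d: (-d.n_systems, d.best_rank))
```

The key is a tuple, so a single `sorted` call handles both criteria; other proxies (e.g., a fused retrieval score) could be substituted without changing the batch-building step.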


Rehab K. Qarout, Information School, University of Sheffield
Alessandro Checco, Information School, University of Sheffield
Gianluca Demartini, Information School, University of Sheffield

About the author

Alessandro Checco

Alessandro Checco graduated from the University of Rome "Tor Vergata", Italy, in 2010.
He received his Ph.D. in mathematics from the Hamilton Institute, NUI Maynooth in 2014, where his research was focused on resource allocation in wireless networks and decentralised algorithm design.
In 2015 he worked on recommender systems as a postdoctoral researcher at Trinity College Dublin.
He is currently a postdoctoral researcher at the Information School, University of Sheffield, UK, where his main research interests are web information retrieval, human computation and data privacy.
