Improving Consensus Accuracy via Z-score and Weighted Voting
by Hyun Joon Jung and Matthew Lease
University of Texas at Austin
Crowdsourcing has attracted significant interest in the Information Retrieval (IR) community due to the scale of document collections on which we routinely evaluate search systems (e.g., the Web). Traditional annotation by in-house experts is increasingly infeasible at this scale, so crowdsourcing offers our community new possibilities for more tractable evaluation, as well as a variety of intriguing new directions to explore, such as developing hybrid search systems that combine automation with human computation. One well-known challenge with crowd judgments is their typically higher variance in quality, for which prior work has suggested various mitigation strategies.
Along these lines, our poster presents a new method for improving consensus labeling accuracy that combines worker filtering and weighted voting, using both supervised and unsupervised features. Using crowd workers’ graded search relevance judgments collected on Amazon’s Mechanical Turk, we evaluate improvements in both binary and graded accuracy for different combinations of features and voting schemes.
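The post does not spell out the exact features, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. Each worker's accuracy is measured against a reference labeling (gold labels for a supervised variant, or plain majority-vote labels for an unsupervised one). Workers whose accuracy z-score falls below a hypothetical threshold `z_min` are filtered out, and the remaining workers vote with weights equal to their accuracy. All function names, the threshold value, the tie-breaking rule, and the accuracy-as-weight choice are illustrative assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def _argmax_label(scores):
    """Label with the highest score; ties go to the smallest label (an arbitrary choice)."""
    return min(scores, key=lambda label: (-scores[label], label))


def majority_labels(judgments):
    """Unweighted majority vote per item. `judgments` is a list of (worker, item, label)."""
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        votes[item][label] += 1
    return {item: _argmax_label(c) for item, c in votes.items()}


def worker_accuracy(judgments, reference):
    """Fraction of each worker's labels that match `reference` (item -> label).

    `reference` can be gold labels (supervised) or majority labels (unsupervised);
    judgments on items absent from `reference` are ignored.
    """
    hits, totals = Counter(), Counter()
    for worker, item, label in judgments:
        if item in reference:
            totals[worker] += 1
            hits[worker] += label == reference[item]
    return {w: hits[w] / totals[w] for w in totals}


def z_scores(values):
    """Standardize each worker's accuracy against the population of workers."""
    if not values:
        return {}
    mu, sigma = mean(values.values()), pstdev(values.values())
    if sigma == 0:  # all workers equally accurate: nobody stands out
        return {w: 0.0 for w in values}
    return {w: (v - mu) / sigma for w, v in values.items()}


def filtered_weighted_vote(judgments, reference, z_min=-1.0):
    """Drop workers whose accuracy z-score < z_min, then vote weighted by accuracy.

    Hypothetical design: `z_min` and accuracy-as-weight are assumptions. Workers
    with no judgments on reference items are also dropped, and items whose
    workers are all filtered out receive no label.
    """
    acc = worker_accuracy(judgments, reference)
    z = z_scores(acc)
    kept = {w for w in acc if z[w] >= z_min}
    scores = defaultdict(Counter)
    for worker, item, label in judgments:
        if worker in kept:
            scores[item][label] += acc[worker]
    return {item: _argmax_label(c) for item, c in scores.items()}


# Toy data (hypothetical): two gold items g1, g2 and one unlabeled item t.
gold = {"g1": 1, "g2": 0}
judgments = [
    ("w1", "g1", 1), ("w1", "g2", 0), ("w1", "t", 2),
    ("w2", "g1", 1), ("w2", "g2", 0), ("w2", "t", 2),
    ("w3", "g1", 1), ("w3", "g2", 1), ("w3", "t", 0),
    ("w4", "g1", 0), ("w4", "g2", 1), ("w4", "t", 0),
    ("w5", "g1", 0), ("w5", "g2", 1), ("w5", "t", 0),
]

print(majority_labels(judgments))
print(filtered_weighted_vote(judgments, gold))
```

On this toy data, a plain majority vote labels t as 0 (and also mislabels gold item g2). With gold labels as the reference, w4 and w5 fall below the threshold and are dropped, and the accuracy-weighted vote labels t as 2. Passing `majority_labels(judgments)` as the reference instead gives the unsupervised variant, which here trusts the majority and keeps t at 0, so the choice of reference can change the outcome.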
To situate this work, we mention a few related activities in the IR community. The TREC Crowdsourcing Track (https://sites.google.com/site/treccrowd2011), currently underway, provides a common shared-task setting (data and task) for comparing different methods of obtaining judgments, as well as methods of combining those judgments via consensus. With a crowdsourcing tutorial and workshop at the ACM SIGIR conference in July (http://www.sigir2011.org), as well as an upcoming special issue of Springer’s Information Retrieval journal on crowdsourcing, it is definitely an interesting time for crowdsourcing work in the IR community.
Both authors will be attending the HComp Workshop and look forward to chatting with other attendees. We’ll post additional blog entries about our other HComp papers according to the organizers’ schedule.