Improving Consensus Accuracy via Z-score and Weighted Voting

Improving Consensus Accuracy via Z-score and Weighted Voting
by Hyun Joon Jung and Matthew Lease
University of Texas at Austin
http://www.ischool.utexas.edu/~ml/papers/jung-hcomp11.pdf
Crowdsourcing has attracted significant interest in the Information Retrieval (IR) community due to the scale of document collections on which we routinely evaluate search systems (e.g. the Web). Traditional annotations from in-house experts are increasingly infeasible to collect at this scale, so crowdsourcing offers our community new possibilities for more tractable evaluation, as well as a variety of intriguing new directions to explore, such as developing hybrid automation + human computation search systems. One well-known challenge with crowd judgments is their typically higher variance in annotation quality, for which prior work has suggested various strategies.

Along these lines, our poster presents a new method for improving consensus labeling accuracy using worker filtering and weighted voting with a combination of supervised and unsupervised features. Using crowd worker judgments for graded search relevance assessment from Amazon’s Mechanical Turk, we evaluate both binary and graded accuracy improvements for different combinations of features and voting schemes.
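
To make the general idea concrete, below is a minimal Python sketch of one way z-score-based worker filtering could be combined with accuracy-weighted voting to produce consensus labels. This is only an illustrative reconstruction under simple assumptions: the paper's actual supervised and unsupervised features, filtering rule, and weighting scheme are described in the PDF linked above, and the function names and threshold below are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): filter workers by the
# z-score of their agreement with a provisional majority vote, then take an
# accuracy-weighted vote among the remaining workers.
from collections import defaultdict
from statistics import mean, pstdev


def consensus_labels(judgments, z_threshold=-1.0):
    """judgments: iterable of (worker_id, item_id, label) tuples."""
    # 1. Provisional consensus: simple majority vote per item.
    votes = defaultdict(lambda: defaultdict(int))
    for worker, item, label in judgments:
        votes[item][label] += 1
    majority = {item: max(counts, key=counts.get) for item, counts in votes.items()}

    # 2. Per-worker agreement rate with the provisional consensus.
    worker_hits = defaultdict(list)
    for worker, item, label in judgments:
        worker_hits[worker].append(1.0 if label == majority[item] else 0.0)
    acc = {w: mean(hits) for w, hits in worker_hits.items()}

    # 3. Z-score normalize agreement rates and drop low-scoring workers.
    mu, sigma = mean(acc.values()), pstdev(acc.values()) or 1.0
    kept = {w for w, a in acc.items() if (a - mu) / sigma >= z_threshold}

    # 4. Weighted vote among remaining workers, weighted by agreement rate.
    weighted = defaultdict(lambda: defaultdict(float))
    for worker, item, label in judgments:
        if worker in kept:
            weighted[item][label] += acc[worker]
    return {item: max(scores, key=scores.get) for item, scores in weighted.items()}
```

In this toy version the weights are just each worker's agreement rate with the unweighted majority; the poster evaluates richer feature combinations and compares several voting schemes on both binary and graded relevance accuracy.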

To situate this work, we mention a few related activities happening in the IR community. The TREC Crowdsourcing Track (https://sites.google.com/site/treccrowd2011), currently underway, is providing a common shared-task setting (data and task) in which to compare different methods for obtaining judgments, as well as how to combine those judgments via consensus. With a crowdsourcing tutorial and workshop at the ACM SIGIR conference in July (http://www.sigir2011.org), as well as an upcoming special issue of Springer’s Information Retrieval journal on Crowdsourcing, it is definitely an interesting time for crowdsourcing work in the IR community.

Both of the authors will be in attendance at the HComp Workshop and look forward to chatting with other attendees. We’ll send additional blog posts about our other HComp papers per the organizers’ schedule.

About the author

Matthew Lease

Matthew Lease is an Associate Professor at the University of Texas at Austin. He has presented invited keynote talks on crowdsourcing at IJCNLP 2011 and the 2012 Frontiers of Information Science and Technology (FIST) meeting, as well as crowdsourcing tutorials at a variety of conferences (SIGIR 2011, CrowdConf 2011, SIGIR 2012, and SIAM Data Mining 2013). For three years (2011-2013), Lease organized a series of community evaluations to benchmark crowdsourcing methods for the National Institute of Standards and Technology (NIST) Text Retrieval Conference (TREC). Lease received the 2012 Modeling Challenge award at the 5th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, a 2015 Samsung Human-Tech Paper Award, and a Best Paper award at HCOMP 2016.