Crowdsorcery: A Proposal for an Open-Source Toolkit Integrating Best Lessons from Industry & Academia

Want to collect data using the crowd but afraid of poor quality results? Unsure how to best design your task to ensure success? Want to use all the power of modern machine learning algorithms for quality control, but without having to understand all that math? Want to solve a complex task but unsure how to effectively piece multiple tasks together in a single automated workflow? Want to make your workflow faster, cheaper, and more accurate using the latest and greatest optimization techniques from databases and decision theory, but without having to develop those techniques yourself? Like the programmatic power of Mechanical Turk micro-tasks and the high-level expertise found on Upwork? Want to combine the best of both worlds in a seamless workflow?

We do too. In reflecting on these challenges, we’ve realized that one reason they have been difficult to solve is the lack of any integrated framework for the entire crowdsourcing process: one that encompasses the design of workflows and UIs; the implementation and selection of optimization and quality assurance algorithms; and the design of the final task primitives that are assigned to workers on crowdsourcing platforms.

To help address this, we put together a proposal for an end-to-end, integrated ecosystem for crowdsourcing task design, implementation, execution, monitoring, and evaluation. We call it Crowdsorcery in the hope that it will take some of the magic out of designing crowdsourcing tasks (or that it would make us all crowd sorcerers). Our goal is that Crowdsorcery would enable

  • new requesters to easily create and run common crowdsourcing tasks which consistently deliver quality results.
  • experts to more easily create specialized tasks via built-in support for rapid interactive prototyping and evaluation of alternative, complex workflows.
  • task designers to easily integrate task optimization as a core service.
  • requesters to access various populations of workers and different underlying platform functionalities in a seamless fashion,
  • and researchers and practitioners to contribute the latest advances in task design as plug-and-play modules, which can be rapidly deployed in practical applications as open-source software.

Proposal. Crowdsorcery would implement the software stack shown at right, with five key components. The arrows on the left are the two ways in which a requester can interface with the toolkit: either programmatically through its API, or through its user interface, built as a wrapper on top of the API.

Inspirations. We’ve realized that achieving any of the above five visions requires defining an integrated solution across the “crowdsourcing stack” that cuts across the user specification interface (whether through a GUI or programming language), through the optimization and primitives library, down to the actual platform specific bindings.

While existing work in research and industry has considered many of these aspects (e.g., B12’s Orchestra), no single platform or requester tool integrates all of them. For example, one popular platform lists best practices for common task types, but does not provide a way to combine these tasks into a larger workflow. Another popular platform provides a GUI for chaining together these tasks, but in rather simplistic ways that don’t take advantage of optimization algorithms. AutoMan (Barowy et al., 2012) is a programming language in which complex workflows combining human and machine intelligence can be easily coded (see the Follow the Crowd post on AutoMan), but it locks the user into default optimization approaches, and does not surface the platform-specific bindings needed for requester customization. We also do not know of any existing tools that can seamlessly pool together workers from different marketplaces.

Crowdsorcery Software Stack

  • Platform-specific bindings. At the bottom of the software stack, the Platform-specific bindings layer will enable Crowdsorcery to run on diverse worker platforms, such as Mechanical Turk, Upwork, and Facebook (e.g., to facilitate friendsourcing). This layer encapsulates the specifics of each platform and abstracts away such details from the higher layers.
  • Primitives. Above this, the Primitives layer will encompass a pre-built library of atomic primitives, such as “binary choice”, “rate an item”, “fill in the blank”, and “draw a bounding box”. These will form the basic building blocks from which all crowdsourcing tasks will be composed. Furthermore, more complex primitives can be hierarchically architected from atomic primitives. For example, a “sort a list of items” primitive could combine rating and binary-choice primitives with some appropriate control logic.
  • Optimization. A key focus of Crowdsorcery is providing rich support for optimization, implemented in the next layer up. Crowdsorcery integrates underlying task optimization as a core service and capability, providing a valuable separation of concerns for task designers and enabling them to benefit as methods for automatic task optimization continue to improve over time.
  • Programming API. Continuing up the software stack, Crowdsorcery’s Programming API will provide an environment in which an advanced requester can quickly prototype a complex workflow combining existing and new primitive task types. Existing optimization routines could help with parameter optimization. Advanced users would be able to access the logic in these routines and retarget or reimplement them for their specific use case.
  • GUI. Finally, the GUI layer will provide a wrapper on top of the programming API for lay requesters, hiding many technical details while exposing an interface for monitoring the execution of running tasks.
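To make the layering above concrete, here is a minimal sketch of how a composite “sort a list of items” primitive might be built from an atomic binary-choice primitive on top of a platform binding. All names here (MockPlatform, BinaryChoice, Sort) are hypothetical illustrations, not a real Crowdsorcery API; the mock binding answers comparisons deterministically in place of real workers.

```python
# Hypothetical sketch of the Crowdsorcery stack; every class name below
# is an invented illustration, not an actual API.
import functools

class MockPlatform:
    """Platform-specific binding. A real binding would post tasks to
    Mechanical Turk, Upwork, etc.; this mock answers deterministically."""
    def ask_binary(self, prompt: str, a: str, b: str) -> str:
        # Stand-in for a crowd answer: pick the alphabetically earlier option.
        return a if a <= b else b

class BinaryChoice:
    """Atomic primitive: ask a worker to choose one of two options."""
    def __init__(self, platform: MockPlatform):
        self.platform = platform
    def run(self, a: str, b: str) -> str:
        return self.platform.ask_binary(f"Which comes first: {a} or {b}?", a, b)

class Sort:
    """Composite primitive: sorts a list by issuing BinaryChoice tasks
    plus a little control logic, i.e., the hierarchical composition
    described above."""
    def __init__(self, platform: MockPlatform):
        self.choice = BinaryChoice(platform)
    def run(self, items: list) -> list:
        def cmp(a, b):
            return -1 if self.choice.run(a, b) == a else 1
        return sorted(items, key=functools.cmp_to_key(cmp))

# The same Sort primitive would run unchanged on any platform binding.
print(Sort(MockPlatform()).run(["pear", "apple", "mango"]))
# With this mock binding the result is alphabetical: ['apple', 'mango', 'pear']
```

The point of the sketch is the separation of concerns: swapping MockPlatform for a real Mechanical Turk or Upwork binding would not change the Sort or BinaryChoice code, and an optimization layer could intercept the comparison tasks (e.g., to batch them or add redundancy) without the requester rewriting anything.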

While research and industry solutions have been proposed for each of the above layers, they have typically addressed each layer in isolation. No single platform or requester tool integrates all of them today. This means that it is virtually impossible for (1) novice requesters to ever take advantage of optimization libraries and workflows, (2) optimization libraries to be used in practical settings, which necessarily require worker interfaces, or (3) best practices to be integrated into primitives for workflows. Crowdsorcery’s end-to-end toolkit will enable these possibilities in an effective and user-friendly manner. Its open-source nature will allow distributed maintenance and rapid incorporation of the latest developments into the toolkit.

What’s the next step? In this blog post, our goal is simple: consider all aspects of crowdsourcing task design and create a framework that integrates them together. In retrospect, the software stack we came up with is pretty obvious. Our hope is that it can be the starting point for a more detailed document laying out specific research directions in each of these domains (as related to the entire stack), and ultimately, for a crowdsourcing compiler (see the great 2016 Theoretical Foundations for Social Computing Workshop Report) or IDE that takes the magic out of crowdsourcing.

Crowdsorcery Team
Aditya Parameswaran (UIUC)
David Lee (UC Santa Cruz)
Matt Lease (UT Austin)
(IIT Delhi & U. Washington)

Crowdsorcery was one of the group projects pursued at the CMO-BIRS 2016 Workshop on Models and Algorithms for Crowds and Networks.


About the author

Matthew Lease

Matthew Lease is an Associate Professor at the University of Texas at Austin. He has presented invited keynote talks on crowdsourcing at IJCNLP 2011 and the 2012 Frontiers of Information Science and Technology (FIST) meeting, as well as crowdsourcing tutorials at a variety of conferences (SIGIR 2011, CrowdConf 2011, SIGIR 2012, and SIAM Data Mining 2013). For three years (2011-2013), Lease organized a series of community evaluations to benchmark crowdsourcing methods for the National Institute of Standards and Technology (NIST) Text Retrieval Conference (TREC). Lease received the 2012 Modeling Challenge award at the 5th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, a 2015 Samsung Human-Tech Paper Award, and a Best Paper award at HCOMP 2016.



  • Hi Aditya et al!

    We here at UMass Amherst completely agree with your mission. Crowdsourcing is unnecessarily complicated, and there has been a lot of duplicated effort building research prototypes. Taking a language/framework approach is a great idea because we already know how to design complex, interrelated, and modular code in programming languages. We believe that the most important research questions revolve around dealing with crowdsourcing’s necessary complexities. Wouldn’t it be great if innovators in this area could just focus on their new idea instead of implementing MTurk REST APIs again for the zillionth time?

    Our platform, AutoMan, was designed specifically with this goal in mind. We recognized early on that even though other algorithms and crowdsourcing platforms would inevitably emerge, the core abstractions need not change. So we must take issue with one point you make, that AutoMan “locks the user into default optimization approaches, and does not surface platform specific bindings needed for requester customization.” Neither of these claims is true. AutoMan was designed from day one to enable precisely the extensibility you propose. It is perhaps the one feature that required the greatest amount of effort from us. We discuss our architecture, which is very similar to yours, in our original AutoMan paper (see Section 6, “System Architecture and Implementation”).

    tl;dr: The user only needs to learn the surface DSL; policies and backends can be swapped with others, largely without altering user code. For example, we provide an interface for wage calculations and an implementation of a simple price model. As a plus, AutoMan integrates directly with ordinary Scala/Java programs, because it’s just an ordinary Scala library (available via Maven, even).

    We recently used these very extensibility mechanisms to add new kinds of tasks and new quality control (i.e., “optimization”) procedures that significantly extend AutoMan’s expressiveness. For example, one simple app asks the checkbox questions from our original paper, and a new app does calorie estimates from photos. These two tasks require very different quality control algorithms, worker recruitment procedures, etc. Nonetheless, the user syntax is quite similar, and it turns out that both entail a lot of similar machinery. Those commonalities are precisely what AutoMan was designed to exploit.

    We’ve even been working with a Google Summer of Code student to add high-level (i.e., graphical) monitoring and debugging tools to complement AutoMan.

    Building this kind of architecture entails a lot of tricky (and largely non-publishable) engineering work. This is work that we have already done. Your ideas about high-level GUI tools and low-level primitives are interesting and novel, and I would love to see them happen. Why not build them on top of AutoMan? I would be thrilled if somebody implemented their next crowdsourcing idea with our software. I will happily discuss (or even collaborate!) with people interested in doing this, and I will happily accept pull requests for features. I understand that our documentation is a bit sparse in places, but feel free to ping me if you’re confused about something and I will happily rectify the situation.

    • One of the challenges we’ve been discussing a lot at HCOMP16 is that, in order for the field to progress, we need stable frameworks that we can standardize around and that are sufficiently general to launch a lot of different work.

      Dan, I applaud you for doing the hard engineering work to bring AutoMan up to this level. Are people building around it?