Social Sampling and the Multiplicative Weights Update Method

by Elisa Celis, Peter Krafft and Nisheeth Vishnoi

Understanding when social interactions lead to the emergence of group-level abilities beyond those of an individual is central to understanding human collaboration and collective intelligence. Towards this, in this post we consider a social behavior that is conspicuous in daily life, often discussed in the social science literature, and recently verified empirically: it essentially boils down to taking suggestions from other people, and we study whether it leads to good decision-making for the group as a whole.

For instance, consider the problem of choosing what stocks to purchase, or which restaurant to go to. From day to day the attractiveness of each option available might change, but overall may tend to vary around some mean. In such cases, there can be a large number of options for an individual to choose from, and it is impossible to hear about all the past experiences that individuals have had with each option. One simple strategy in these situations is to first seek out a recommendation, whether from friends or via web searches, and then evaluate the current information available about the recommended option.  In doing so, no individual has to consider all the different choices themselves, so the process is cognitively simple.

This social behavior can be broken down as follows:

  • people look to others for advice about what decisions to make,
  • privately gather further information about the recommended options,
  • and then make their final decisions.

We study a concrete model of this process called “social sampling” that was recently formulated and validated empirically using a large behavioral dataset. More precisely, as depicted in the figure below, in every decision-making round each individual first chooses an option to consider by looking at the current decision of a random other person, then the current (stochastic) quality of each option is observed, and finally the individual either keeps the considered option or becomes undecided, with the probability of keeping it determined by its quality.

[Figure: one round of the social sampling process]

From the perspective of a single individual, social sampling is a simple heuristic, and it requires very limited cognitive overhead. As a whole, it is a priori unclear whether this process will result in society eventually converging to the best option, or in inferior options gaining popularity by being propagated from one person to the next.

Nevertheless, our results show that a group of individuals implementing social sampling in a diverse range of settings results in approximately optimal decisions, and hence the behavior is collectively rational. Our analysis shows that social sampling is a highly effective distributed algorithm for solving the problem of which decision is best to make, in the sense that social sampling achieves near-optimal regret in this sequential decision-making task. This behavioral mechanism that individuals use may therefore be highly effective in large groups.

Key to our results is the observation that social sampling can be viewed as a distributed implementation of the ubiquitous multiplicative weights update (MWU) method in which the popularity of each option implicitly represents the weight of that option. This relationship provides an algorithmic lens through which we can understand the emergent collective behavior of social sampling.  Beyond these scientific implications, the relationship to MWU could also suggest novel distributed MWU algorithms. Social sampling requires little communication and memory, and hence may be appropriate as a MWU algorithm for low-power devices such as sensor networks or the internet-of-things.
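To make the connection to MWU concrete, here is a minimal simulation sketch (not the authors' code) of the social sampling dynamic described above; the number of agents, the number of options, and the per-option mean qualities are made up for illustration. The popularity of each option plays the role of the MWU weight.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, T = 10_000, 5, 200                                  # agents, options, rounds
mean_quality = np.array([0.50, 0.55, 0.60, 0.65, 0.70])   # hypothetical per-option means

# Each agent's current choice of option; -1 means "undecided".
choice = rng.integers(0, K, size=N)

for t in range(T):
    # 1) Social step: every agent looks at a uniformly random other agent.
    #    If that peer is decided, consider their option; otherwise pick an
    #    option uniformly at random (one simple choice for undecided peers).
    peers = choice[rng.integers(0, N, size=N)]
    undecided_peer = peers == -1
    proposal = np.where(undecided_peer, rng.integers(0, K, size=N), peers)

    # 2) Nature draws a stochastic quality in [0, 1] for every option this round.
    quality = np.clip(rng.normal(mean_quality, 0.1), 0.0, 1.0)

    # 3) Keep the considered option with probability equal to its quality,
    #    otherwise become undecided for the next round.
    keep = rng.random(N) < quality[proposal]
    choice = np.where(keep, proposal, -1)

# The popularity of each option acts as an (implicit) MWU weight.
popularity = np.bincount(choice[choice >= 0], minlength=K) / N
print(popularity)
```

Run over a few hundred rounds, the popularity mass concentrates on the option with the highest mean quality, mirroring how MWU shifts weight toward the best actions.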

Social Sampling was one of the group projects pursued at the CMO-BIRS 2016 WORKSHOP ON MODELS AND ALGORITHMS FOR CROWDS AND NETWORKS.

Collaboration Among Workers

Chien-Ju Ho (Cornell University), Christopher H. Lin (University of Washington), and Siddharth Suri (Microsoft Research)

The research question we sought to address in this pilot study is:

Does the fact that workers exist in networks help them solve problems?

To address this question, we gave 100 Mechanical Turk workers a list of 8 cities in the United States and asked them to find the shortest route that visits all cities and starts from Seattle. This is an instance of the Travelling Salesman Problem (TSP).  Here were the HIT parameters:

  • The base pay was $0.10 and workers got a $2.00 bonus for getting the best (or tied for the best) route.
  • In addition, for every 100 miles away from the best answer we deducted $0.10 from the maximum bonus.
  • We set the duration of the HIT to be 1 hour.
  • We explicitly said workers could collaborate on this.

One of our main results is that we found worker collaboration!  Next, we show how they collaborated.

  • Workers started new threads to collaborate on the HIT.
  • Workers linked to the new threads in the main forum threads.
  • Workers tried to fill out a matrix of distances between cities.
  • Workers shared their answers (routes).
  • Workers did greedy minimization to improve on the answers (a sketch of this kind of greedy improvement follows this list).
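For concreteness, the sketch below shows the kind of greedy 2-opt improvement the workers appear to have carried out by hand: start from a shared route and keep reversing segments whenever that shortens it. The city list and coordinates are hypothetical stand-ins (not the actual HIT's cities), and distances are straight-line rather than driving miles.

```python
import math
from itertools import combinations

# Hypothetical cities and (lon, lat) coordinates, for illustration only.
cities = {
    "Seattle": (-122.3, 47.6), "Portland": (-122.7, 45.5), "Boise": (-116.2, 43.6),
    "Salt Lake City": (-111.9, 40.8), "Denver": (-105.0, 39.7),
    "Phoenix": (-112.1, 33.4), "Las Vegas": (-115.1, 36.2), "San Francisco": (-122.4, 37.8),
}

def dist(a, b):
    (x1, y1), (x2, y2) = cities[a], cities[b]
    return math.hypot(x1 - x2, y1 - y2)

def tour_length(tour):
    # Open tour: start at the first city, no return leg.
    return sum(dist(tour[i], tour[i + 1]) for i in range(len(tour) - 1))

# Start from some shared answer (here: the order the cities were listed in).
tour = list(cities)  # Seattle stays fixed at position 0

# Greedy 2-opt: reverse a segment whenever that shortens the route.
improved = True
while improved:
    improved = False
    for i, j in combinations(range(1, len(tour)), 2):
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        if tour_length(candidate) < tour_length(tour) - 1e-12:
            tour, improved = candidate, True

print(tour, round(tour_length(tour), 2))
```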

Conclusions

We found some preliminary evidence that workers can use their own networks to collaborate. Moreover, they can solve hard problems together if the requester explicitly allows collaboration and incentivizes them to do so.

We are currently working on a number of follow up questions:

  • How do solutions from collaboration compare to solutions from independent workers?
  • Could allowing workers to collaborate result in group think?
  • For which problems does collaboration help and for which problems does it not help?

 

Collaboration Among Workers was one of the group projects pursued at the CMO-BIRS 2016 WORKSHOP ON MODELS AND ALGORITHMS FOR CROWDS AND NETWORKS.

Designing More Informative Reputation Systems

by Yiling Chen, Jason Hartline, Yang Liu, Bo Waggoner & Dan Weld

The role of reputation systems in online markets, such as those for crowdwork, is to transform a single-shot game between an individual requester and worker into a repeated game between the population of requesters and the population of workers. In the single-shot game cooperation can break down, e.g., workers may provide only low-quality service, but in the repeated game cooperation can be sustained, and workers are incentivized to do high-quality work (cf. Kandori, 1992). Reputation systems will fail, however, if the marginal cost to a requester of providing an informative review of a worker is greater than the marginal benefit of more accurate future ratings.1 The main driver of this inequality is the free-rider problem: requesters benefit from the public good (i.e., worker reputation) created by the reputation system even if they do not contribute to it.

In this blog post we describe two main ideas for making reputation systems more effective.  First, to increase the marginal benefit of providing an accurate rating we suggest changing reputation from a public good to a private good by casting it as an individualized (to each requester) recommendation.  For example, a requester who gives all workers a top rating is reporting no preference and can be recommended any worker; while a requester with more informative reports will be assigned workers that the system predicts will be preferred by this requester.  Second, to decrease the marginal cost of providing uninformative reviews, we suggest linking decisions between a requester’s review of several workers (i.e., ranking the workers relative to each other instead of scoring them absolutely).   

The rest of this blog post is organized as follows. In the first section we discuss some of the reasons the marginal cost of informative reviews may outweigh their marginal benefit. In the second section we describe how reframing the reputation system as a recommender system can increase that marginal benefit. In the third section we describe how linking decisions can make it easier for requesters to give informative reports.

1. Costs and Benefits of Reputation Systems

We believe that ratings are inaccurate because the marginal cost to a requester for providing an informative review is greater than the marginal benefit of more accurate future ratings. There are several costs associated with submitting a review, especially an accurate assessment of a poor worker.

  • It takes time to enter a review.
  • A review incurs the risk of being embroiled in a dispute.
  • A negative review may give the requester the reputation of being a ‘harsh grader’ and cause other skilled workers to avoid the requester for fear of damaging their reputation.
  • There may be off-platform consequences, such as unfavorable posts on Turkopticon.

By lowering these costs, as we discuss below, an improved reputation system could increase the accuracy of reviews.  There are also several reasons why the marginal benefit of a review is low:

  • Requester feedback is often aggregated into a reputation score, and a single assessment will likely not visibly affect the worker’s rating.
  • Even if a single assessment did change the worker’s score, the change provides no new information to the requester, who already knows how this worker performed. (If the rating affected the scores of other workers, however, by personalizing those ratings, then there would be a marginal benefit; such personalized ratings may be thought of as a private good.)

There are several ways to decrease the cost of providing truthful reviews:

  • One can reduce the chance of disputes by making reviews anonymous and only showing a worker their average score. This approach is taken by Uber (and others) who only show drivers their average rating for the most recent 500 trips.
  • One way to shrink or eliminate the marginal cost of the time to submit a review is to provide a simple, fast UI while simultaneously making the process of skipping the review cumbersome. This approach, like Amazon’s repeated nagging requests for reviews, may alienate users, however.
  • Alternatively, in lieu of an explicit review, it may be possible to observe other actions performed by the requester and use that to reveal his or her preferences. For example, one could detect if the requester chooses to subsequently hire the worker, which is presumably a very strong signal that previous interactions were positive.

One can also try to eliminate the free-rider problem by increasing the marginal benefit of reviews.  Again, several options are possible.

  • One way of increasing the marginal value is to personalize recommendations. This method is used in Boomerang by Gaikwad et al. (2016) and we discuss it further in the next section.
  • Another method for increasing marginal value of correct reviews is changing overall system behavior in a way that impacts the requester. For example, Boomerang gives temporal priority to well-rated workers on subsequent jobs posted by a requester. This creates a clear disincentive to inflate the grade of a poor worker, but may not provide enough of an incentive for requesters to post ratings at all.

One novel approach that deserves more attention is blurring. In this model, a requester’s ability to observe the reputation of a worker is proportional to the number and quality of ratings that the requester has provided. If the system supports strong identities, then a new requester could be allowed unfettered access to the reputation ratings, but if the requester engaged in transactions and then failed to report an accurate rating, the system would hide information in subsequent rounds (see figure). Of course, such a mechanism requires a way to estimate the quality of a review. One method, which needs further thought and experimental evaluation, would be to measure the entropy of the requester’s reviews (this discourages giving uniformly positive 5-star reviews) and their agreement with good reviewers (calculated using EM).
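As a rough illustration of the entropy part of that idea (only a sketch; agreement with good reviewers would be a second, separate signal), one could score a requester's review history like this:

```python
from collections import Counter
from math import log2

def review_informativeness(star_ratings):
    """Shannon entropy (in bits) of a requester's star ratings.

    A requester who gives every worker 5 stars scores 0; one who spreads
    ratings across the scale scores higher.  This is only one ingredient of
    the blurring idea sketched above.
    """
    counts = Counter(star_ratings)
    total = len(star_ratings)
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(review_informativeness([5, 5, 5, 5, 5]))     # 0.0   -> heavily blurred view
print(review_informativeness([5, 3, 4, 2, 5, 1]))  # ~2.25 -> less blurring
```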

2. Reputation systems as recommender systems.

One approach to solving the free-rider problem is to turn the public good into a private good. If the reputation of workers is no longer something that every requester benefits from equally, and instead each requester gets personalized recommendations or matches of workers according to the preferences revealed by their feedback, then the marginal value of providing an accurate assessment increases. This alleviates the incentive to free-ride. For example, if a requester always rates five stars, his revealed preference is that he is happy with any worker, and he will hence be matched with workers that others consider less skilled. This approach not only provides incentives for requesters to report accurate assessments but also accommodates heterogeneity in requester preferences, which can increase the expressiveness of reputation systems. In such a system a requester’s feedback is taken not only as an assessment but also as an indication of his preferences. The following sketches how such a system would work.

  • Requesters provide reviews of workers in their own terms.
  • The requesters’ reviews are normalized and aggregated.
  • The aggregated reviews are reinterpreted into individualized recommendations for each requester (according to the requester’s preferences inferred from step 1).

There are two key properties of such a system. First, if a requester gives uninformative reviews, e.g., all five-star ratings, then the requester gets back uninformative recommendations. Second, uninformative reviews such as all five-star ratings, once normalized and aggregated, will not dilute the informative reviews (which would otherwise make the aggregate reviews less informative).
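A minimal sketch of the second property, under one simple normalization choice (z-scoring each requester's raw scores): an all-five-star reviewer contributes a vector of zeros, so their reviews neither help nor dilute the aggregate.

```python
import numpy as np

def normalize_reviews(scores):
    """Z-score one requester's raw scores so that only relative judgments count.

    A requester who rates every worker identically has zero variance and
    therefore contributes a vector of zeros to the aggregate.
    """
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    return np.zeros_like(scores) if std == 0 else (scores - scores.mean()) / std

print(normalize_reviews([5, 5, 5]))   # [0. 0. 0.]              -- uninformative
print(normalize_reviews([5, 3, 4]))   # approx. [ 1.22 -1.22 0.] -- informative
```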

The Netflix recommendation system is a canonical example. Users enjoy recommendations based on data contributed by others, and their own reviews are fed into collaborative filtering to generate personalized recommendations (cf. Ekstrand et al., 2011). Such an approach is based on the idea of decomposing a large but sparse rating matrix into three components, R ≈ U Σ M, where R holds the ratings, U the user factors, Σ the latent-variable weights, and M the movie factors. The factorization identifies hidden similarity features that any two users (rating providers) may share. It is therefore in a user’s best interest (it improves the accuracy of their recommendations) to truthfully reveal their preferences.
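The following is a small self-contained sketch of that factorization idea, with Σ folded into the two factor matrices and the fit done by stochastic gradient descent on the observed entries; the toy requester-by-worker rating matrix is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy requester-by-worker rating matrix; np.nan marks pairs that never interacted.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])
n_req, n_work = R.shape
k = 2                                           # number of latent dimensions
U = 0.1 * rng.standard_normal((n_req, k))       # requester (user) factors
M = 0.1 * rng.standard_normal((n_work, k))      # worker (item) factors
observed = [(i, j) for i in range(n_req) for j in range(n_work) if not np.isnan(R[i, j])]

lr, reg = 0.05, 0.02
for _ in range(5000):                           # plain SGD on the observed entries
    i, j = observed[rng.integers(len(observed))]
    err = R[i, j] - U[i] @ M[j]
    u_i = U[i].copy()
    U[i] += lr * (err * M[j] - reg * U[i])
    M[j] += lr * (err * u_i - reg * M[j])

print(np.round(U @ M.T, 1))   # predicted scores, including the never-observed pairs
```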

[Figure: collaborative filtering based movie recommendation]

[Figure: collaborative filtering based worker recommendation]

Collaborative filtering is a powerful tool for identifying the latent dimensions that best describe the rating matrix. This decomposition goes a long way toward characterizing the attributes that best “predict” a future rating (or reputation). While collaborative filtering methods like the one discussed above do not provide a natural labeling of the latent factors, it would be interesting to combine collaborative filtering with more sophisticated data mining approaches to obtain natural labels for the discovered latent dimensions.

Another benefit of using a collaborative filtering approach is that it can improve the expressiveness of a reputation system. So far most reputation systems rely on a single (and often naive) dimension for scoring agents, e.g., accomplishment rate on AMT. But there potentially exist many other dimensions that help determine such a “reputation score”.

Naturally the above procedure faces the same challenges as classical recommender systems. First, a recommendation-based reputation system needs to deal with the cold-start problem (cf. Sedhain et al., 2014) when a new worker arrives, which may to some degree discourage new users. Second, the accuracy of the above approach depends on various modeling assumptions. When a model works and when it does not are interesting questions for future study.

2.1 Theoretical Approaches

We provide some initial thoughts on how one can formally approach reputation systems as a kind of recommender system.

A first question to solve is an offline learning problem: given the reviews acquired so far (say, requesters reviewing workers), how can we predict which matches of requesters to workers would be valuable in the future? One could cast this as a matrix-completion problem and apply collaborative filtering techniques to predict the “missing entries”, i.e., the reviews that would be given for unobserved matches (cf. Koren et al., 2009). Another approach would be to apply techniques from crowdsourcing for inferring underlying parameters given responses on tasks (cf. Moon, 1996; Karger et al., 2014). If we can infer parameters for reviewer and worker preferences and skills, perhaps we can use these for prediction.

A next step is to make this problem dynamic. A system obtains new reviews over time and can make or influence future assignments of workers to requesters. There is an explore-exploit tradeoff because the system may benefit in the long run from initially making suboptimal assignments in order to learn about skills and preferences.  (In some systems an initial questionnaire could directly elicit preferences.)  Here, an interesting challenge is to combine algorithmic approaches such as bandit learning or active learning with the above inference algorithms (cf. Bresler et al., 2015).
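As one deliberately simple instantiation of the explore-exploit idea, an ε-greedy matcher could decide which worker to route to a requester; a real system would combine this with the collaborative-filtering predictions above, or with a more principled bandit algorithm. All names and numbers here are hypothetical.

```python
import random
from collections import defaultdict

class EpsilonGreedyMatcher:
    """Assign workers to a requester, occasionally exploring unknown workers."""

    def __init__(self, workers, epsilon=0.1):
        self.workers = list(workers)
        self.epsilon = epsilon
        self.totals = defaultdict(float)   # sum of observed ratings per worker
        self.counts = defaultdict(int)     # number of observed ratings per worker

    def assign(self):
        # Explore with probability epsilon (or if nothing has been observed yet),
        # otherwise exploit the worker with the best average rating so far.
        if random.random() < self.epsilon or not self.counts:
            return random.choice(self.workers)
        return max(self.counts, key=lambda w: self.totals[w] / self.counts[w])

    def feedback(self, worker, rating):
        self.totals[worker] += rating
        self.counts[worker] += 1

matcher = EpsilonGreedyMatcher(["w1", "w2", "w3"])
for _ in range(100):
    w = matcher.assign()
    matcher.feedback(w, rating=random.random())   # replace with the requester's review
```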

Finally, further study of the incentives of recommender systems is warranted. Such a study would need to explicitly model the utility of a requester in terms of the accuracy of the reviews provided and the recommendations received. See Jurca and Faltings (2003) and Dasgupta and Ghosh (2013) for initial work in this area.

3. Feedback Elicitation

Another approach to reversing the inequality between the marginal cost and marginal benefit of reviews is to increase the informativeness of the elicited feedback while maintaining or lowering the marginal cost of providing it.

In many reputation systems, users are asked to provide a numerical rating or score for sellers or service providers they have interacted with. If there is no option to opt out of providing a rating, then arguably the strategy of always giving a five-star rating, regardless of the actual experience, is as costless as possible for the user. This strategy, however, leads to completely uninformative ratings and defeats the purpose of eliciting feedback.

An alternative approach is to ask requesters to rank-order the past three workers they have interacted with. The requester no longer has the option of saying that all workers are excellent and hence is more likely to provide a ranking that is closer to his true experience.

[Figure: a ranking-based feedback UI]

There are two reasons why soliciting rankings may lead to better outcomes than soliciting scores. The first is that humans find ranking easier than scoring (Miller, 1956, 1994). The second is that requesters may differ in their perception of the magnitudes of worker quality, or may prefer to exaggerate the quality of workers to avoid retribution or costly disputes. For example, Frankel (2014) studied a related delegation problem and showed that, under natural assumptions, soliciting ranking information is optimal when requesters would otherwise have incentives to misreport scores. More generally, the ranking approach is related to the “linking decisions” idea in economics (cf. Jackson and Sonnenschein, 2007): when requesters are asked to score workers, they make individual decisions about each worker; when they rank workers, their decisions are linked.
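As a small illustration of how linked, ranking-based feedback could be aggregated without ever asking for absolute scores, here is a Borda-count sketch (one simple aggregation rule among many; the worker names and rankings are made up):

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate per-requester rankings (best first) into a single ordering.

    Each worker earns (n - 1 - position) points from a ranking of n workers;
    the final order sorts workers by total points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, worker in enumerate(ranking):
            scores[worker] += n - 1 - position
    return sorted(scores, key=scores.get, reverse=True)

# Three requesters each rank the last three workers they hired.
rankings = [["alice", "bob", "carol"],
            ["alice", "carol", "bob"],
            ["bob", "alice", "carol"]]
print(borda_aggregate(rankings))   # ['alice', 'bob', 'carol']
```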

Changing the way we ask for feedback also brings up interesting interface questions. While it may conceivably be easier for a requester to compare two workers than to score them, what about three workers, or five? Time adds further complications, as people may not remember their past experiences well. We think there is an interesting research agenda here in trying to understand the impact of interface design on the informativeness-versus-cost tradeoff of eliciting feedback. The literature on ranking in peer grading may be a useful starting point (cf. Raman and Joachims, 2014).

1 For simplicity of exposition, we speak exclusively of a worker’s reputation, but our ideas also apply to the equally important problem of requester reputations.

Designing More Informative Reputation Systems was one of the group projects pursued at the CMO-BIRS 2016 WORKSHOP ON MODELS AND ALGORITHMS FOR CROWDS AND NETWORKS.

References:

  1. Kandori, Michihiro. “Social norms and community enforcement.” The Review of Economic Studies 59.1 (1992): 63-80.
  2. Horton, John J., Joseph M. Golden, Reputation Inflation: Evidence from an Online Labor Market, 2015. http://econweb.tamu.edu/common/files/workshops/Theory%20and%20Experimental%20Economics/2015_3_5_John_Horton.pdf.
  3. S.S. Gaikwad, D. Morina, A. Ginzberg, C. Mullings, S. Goyal, D. Gamage, C. Diemert, M. Burton, S. Zhou, M. Whiting, K. Ziulkoski, A. Ballav, A. Gilbee, S.S. Niranga, V. Sehgal, J. Lin, L. Kristianto, A. Richmond-Fuller, J. Regino, N. Chhibber, D. Majeti, S. Sharma, K. Mananova, D. Dhakal, W. Dai, V. Purynova, S. Sandeep, V. Chandrakanthan, T. Sarma, S. Matin, A. Nassar, R. Nistala, A. Stolzoff, K. Milland, V. Mathur, R. Vaish, and M.S. Bernstein (2016) Boomerang: Rebounding the Consequences of Reputation Feedback on Crowdsourcing Platforms. To appear in UIST-16.
  4. Ekstrand, Michael D., John T. Riedl, and Joseph A. Konstan. “Collaborative filtering recommender systems.” Foundations and Trends in Human-Computer Interaction 4.2 (2011): 81-173.
  5. Suvash Sedhain, Scott Sanner, Darius Braziunas, Lexing Xie, and Jordan Christensen. 2014. Social collaborative filtering for cold-start recommendations. In Proceedings of the 8th ACM Conference on Recommender systems (RecSys ’14). ACM, New York, NY, USA, 345-348. DOI=http://dx.doi.org/10.1145/2645710.2645772
  6. Koren, Yehuda, Robert Bell, and Chris Volinsky. “Matrix factorization techniques for recommender systems.” Computer 42.8 (2009): 30-37.
  7. Moon, Todd K. “The expectation-maximization algorithm.” IEEE Signal processing magazine 13.6 (1996): 47-60.
  8. Karger, David R., Sewoong Oh, and Devavrat Shah. “Budget-optimal task allocation for reliable crowdsourcing systems.” Operations Research 62.1 (2014): 1-24.
  9. Bresler et al. “Regret Guarantees for Item-Item Collaborative Filtering.” arXiv:1507.05371 (2015). (Applies a bandit algorithm to choose which rating to ask for.)
  10. Jurca, Radu, and Boi Faltings. “An incentive compatible reputation mechanism.” E-Commerce, 2003. CEC 2003. IEEE International Conference on. IEEE, 2003.
  11. Dasgupta, Anirban, and Arpita Ghosh. “Crowdsourced judgement elicitation with endogenous proficiency.” Proceedings of the 22nd international conference on World Wide Web. ACM, 2013.
  12. Jackson and Sonnenschein (2007), Overcoming Incentive Constraints by Linking Decisions. Econometrica.
  13. Frankel (2014), Aligned Delegation. American Economic Review.
  14. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological review, 63(2), 81.
  15. Miller, G. A. (1994). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological review, 101(2), 343.
  16. Raman, K., & Joachims, T. (2014, August). Methods for ordinal peer grading. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1037-1046). ACM.

WearMail

Email search is often difficult for tasks such as:

  • What is my flight confirmation number?
  • What is my ACM member number?
  • Where is my meeting with Dan?
  • Was there an event I was supposed to go to today?
  • What deadlines do I have coming up?

Additionally, often we need answers to these questions on the go, such as when we’re in a taxi to the airport.

WearMail System

We built WearMail, a system where you can speak to your watch and the watch will search your inbox. When the user requests specific types of information, such as a flight confirmation number, it triggers a special search that returns only that specific data.

[Screenshots: the WearMail interface on a smartwatch]

Currently, WearMail works on any Android Wear device and searches Gmail using the API provided by Google.
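As a rough sketch of what the search back-end could look like, one might issue queries through the Gmail API's standard search syntax from Python. This is not WearMail's actual code, and the query templates below are invented placeholders for the crowd-constructed queries discussed next.

```python
from googleapiclient.discovery import build
# `creds` would come from the standard Google OAuth flow (google-auth-oauthlib); omitted here.

def search_inbox(creds, spoken_request):
    """Turn a spoken request into a Gmail search and return message snippets."""
    # A tiny hand-written mapping; crowd-constructed queries would replace this.
    templates = {
        "flight confirmation": "from:(delta OR united OR alaska) confirmation",
        "acm member number":   "ACM membership renewal",
    }
    query = templates.get(spoken_request.lower(), spoken_request)

    service = build("gmail", "v1", credentials=creds)
    resp = service.users().messages().list(userId="me", q=query, maxResults=5).execute()

    snippets = []
    for ref in resp.get("messages", []):
        msg = service.users().messages().get(
            userId="me", id=ref["id"], format="metadata").execute()
        snippets.append(msg.get("snippet", ""))
    return snippets
```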

Crowd Constructed Queries

We deployed two surveys in order to determine how well the crowd is able to generate useful Gmail queries based on natural language queries from the watch. In the first survey, we asked both workers on Amazon Mechanical Turk and workshop attendees to provide keyword search terms for three questions:

  1. “What is my Delta flight confirmation for today?”
  2. “I want to find my ACM Membership Number in my email.”
  3. “What room was I supposed to meet Dan Weld in today?”

Overall, both groups did reasonably well in constructing queries from the example questions, although most simply used queries from the original questions, e.g., “Meeting Dan Weld”, “ACM Membership”, “Delta confirmation”. Some workers tried to add additional information, such as today’s date. Workshop members were able to add a bit of additional expertise in formulating their queries, especially for the ACM Membership number. One query included the word “registration” and the other included the word “renewal,” presumably because workshop attendees thought these keywords would find those emails where the membership number was most likely to be mentioned.

Interfaces for Crowds to Create Search Patterns

We also asked survey participants to provide information that could be useful for constructing regular expression queries, both in terms of minimum and maximum range values and in terms of whether the target terms contained numbers, letters, or a combination of both. The results were largely inconsistent, but a preliminary interface for this approach is shown in the figure below. As a result, we hypothesized that a more promising approach may be to ask workers to find examples of the target terms on the internet and to generalize from those. This worked reasonably well for some terms: workers could find examples of flight confirmation numbers, license plates, and room numbers. But workers could not find other examples, such as ACM membership numbers. With our current UI, we had mixed success getting workers to generalize from the examples they found to other plausible examples.

[Figure: preliminary interface for eliciting search-pattern information from workers]
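A toy sketch of the “collect examples, then generalize” idea: infer a crude character-class pattern from worker-supplied examples and use it to pull candidates out of an email body. The confirmation-code examples are fabricated, and real formats vary, so this is illustrative only.

```python
import re

def generalize(examples):
    """Infer a crude pattern from worker-supplied example strings."""
    if len(set(len(s) for s in examples)) != 1:
        return None   # give up unless all examples share a length
    if not all(s.isalnum() and s.isupper() for s in examples):
        return None   # only handle uppercase alphanumeric codes here
    length = len(examples[0])
    return re.compile(rf"\b[A-Z0-9]{{{length}}}\b")

# Fabricated example confirmation codes a worker might have found online.
pattern = generalize(["G4KXT9", "HL2M8Q", "P9QZ2B"])
print(pattern.findall("Your Delta confirmation number is HL2M8Q. Safe travels!"))
# -> ['HL2M8Q']
```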

 

WearMail was one of the group projects pursued at the CMO-BIRS 2016 WORKSHOP ON MODELS AND ALGORITHMS FOR CROWDS AND NETWORKS.

Crowdsorcery: A Proposal for an Open-Source Toolkit Integrating Best Lessons from Industry & Academia

Want to collect data using the crowd but afraid of poor quality results? Unsure how to best design your task to ensure success? Want to use all the power of modern machine learning algorithms for quality control, but without having to understand all that math? Want to solve a complex task but unsure how to effectively piece multiple tasks together in a single automated workflow? Want to make your workflow faster, cheaper, and more accurate using the latest and greatest optimization techniques from databases and decision theory, but without having to develop those techniques yourself? Like the programmatic power of Mechanical Turk micro-tasks and the high-level expertise found on Upwork? Want to combine the best of both worlds in a seamless workflow?

We do too. In reflecting on these challenges, we realized that one reason they have been difficult to solve is the lack of any integrated framework for the entire crowdsourcing process: one that encompasses the design of workflows and UIs; the implementation and selection of optimization and quality-assurance algorithms; and the design of the final task primitives that are assigned to workers on crowdsourcing platforms.

To help address this, we put together a proposal for an end-to-end, integrated ecosystem for crowdsourcing task design, implementation, execution, monitoring, and evaluation. We call it Crowdsorcery in the hope that it will take some of the magic out of designing crowdsourcing tasks (or that it would make us all crowd sorcerers). Our goal is that Crowdsorcery would enable

  • new requesters to easily create and run common crowdsourcing tasks which consistently deliver quality results.
  • experts to more easily create specialized tasks via a built-in support for rapid interactive prototyping and evaluation of alternative, complex workflows.
  • task designers to easily integrate task optimization as a core service.
  • requesters to access various populations of workers and different underlying platform functionalities in a seamless fashion,
  • and researchers and practitioners to contribute the latest advances in task design as plug-and-play modules, which can be rapidly deployed in practical applications as open-source software.

[Figure: the Crowdsorcery software stack]

Proposal. Crowdsorcery would implement the software stack shown in the figure, with five key components. The arrows on the left indicate the two ways a requester can interface with the toolkit: either programmatically through its API, or through its user interface, built as a wrapper on top of the API.

Inspirations. We’ve realized that achieving any of the above five visions requires defining an integrated solution across the “crowdsourcing stack” that cuts across the user specification interface (whether through a GUI or programming language), through the optimization and primitives library, down to the actual platform specific bindings.

While existing work in research and industry has considered many of these aspects (e.g., B12’s Orchestra), no single platform or requester tool integrates all of them. For example, one popular platform lists best practices for common task types but does not provide a way to combine these tasks into a larger workflow. Another popular platform provides a GUI for chaining tasks together, but in rather simplistic ways that do not take advantage of optimization algorithms. AutoMan (Barowy et al., 2012) is a programming language in which complex workflows combining human and machine intelligence can be easily coded (see the Follow the Crowd post on AutoMan), but it locks the user into default optimization approaches and does not surface the platform-specific bindings needed for requester customization. We also do not know of any existing tool that can seamlessly pool workers from different marketplaces.

Crowdsorcery Software Stack

  • Platform-specific bindings. At the bottom of the software stack, the Platform-specific bindings layer  will enable Crowdsourcery to run on diverse worker platforms, such as Mechanical Turk, Upwork, and Facebook (e.g. to facilitate friendsourcing). This layer encapsulates specifics of each platform and abstracts away such details from the higher layers.
  • Primitives. Above this, the Primitives layer will encompass a pre-built library of atomic primitives, such as “binary choice”, “rate an item”, “fill in the blank”, “draw a bounding box”, etc. These will form the basic building blocks from which all crowdsourcing tasks are composed. Furthermore, more complex primitives can be hierarchically architected from atomic primitives. For example, a “sort a list of items” primitive could combine rating and binary-choice primitives with some appropriate control logic (a hypothetical sketch of such composition appears after this list).
  • Optimization. A key focus of Crowdsorcery is providing rich support for optimization, implemented in the next layer up. Crowdsorcery integrates underlying task optimization as a core service and capability, providing a valuable separation of concerns for task designers and enabling them to benefit as methods for automatic task optimization continue to improve over time.
  • Programming API. Continuing up the software stack, Crowdsorcery’s Programming API will provide an environment for an advanced requester to quickly prototype a complex workflow combining existing and new primitive task types. Existing optimization routines could help with parameter optimization. Advanced users would be able to access the logic in these routines, and retarget/reimplent them for their specific use case.
  • GUI. Finally, the GUI layer will provide a wrapper on top of the programming API for lay requesters; it will hide many technical details but expose an interface for monitoring the execution of running tasks.
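To make the layering a little more concrete, here is a purely hypothetical sketch of how a composite primitive might look to an advanced requester using the programming API; none of these functions exist in any current toolkit, and the simulated answers stand in for the platform-bindings layer.

```python
import random

# Every name below (rate, binary_choice, sort_items) is hypothetical -- a sketch
# of what a Crowdsorcery program might look like, not an existing API.

def rate(item, scale=5):
    """Atomic primitive: ask one worker to rate `item` on a 1..scale scale.
    Here the answer is simulated; the platform-bindings layer would actually
    post the question to MTurk, Upwork, etc."""
    return random.randint(1, scale)

def binary_choice(a, b):
    """Atomic primitive: ask one worker which of two items is better."""
    return random.choice([a, b])

def sort_items(items, comparisons=10):
    """Composite primitive: a first pass of ratings, then pairwise comparisons
    to clean up adjacent pairs.  The optimization layer would decide how many
    judgments of each kind to buy for a given budget."""
    ranked = sorted(items, key=rate, reverse=True)
    for _ in range(comparisons):
        i = random.randrange(len(ranked) - 1)
        if binary_choice(ranked[i], ranked[i + 1]) == ranked[i + 1]:
            ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
    return ranked

print(sort_items(["photo A", "photo B", "photo C", "photo D"]))
```

A lay requester would reach the same functionality through the GUI layer, while the platform-bindings layer would decide where each simulated judgment above actually gets posted.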

While research and industry solutions have been proposed for each of the above layers, they have typically addressed each layer in isolation. No single platform or requester tool integrates all of them today. This means that it is virtually impossible for (1) novice requesters to ever take advantage of optimization libraries and workflows, (2) optimization libraries to be used in practical settings, which would necessarily require worker interfaces, or (3) best practices to be integrated into primitives for workflows. Crowdsorcery’s end-to-end toolkit will enable these possibilities in an effective and user-friendly manner. Its open-source nature will allow distributed maintenance and the rapid incorporation of the latest developments into the toolkit.

What’s the next step? In this blog post, our goal is simple: consider all aspects of crowdsourcing task design and create a framework that integrates them. In retrospect, the software stack we came up with is pretty obvious. Our hope is that this can be the starting point for a more detailed document laying out specific research directions in each of these domains (as they relate to the entire stack), and ultimately for a crowdsourcing compiler (see the great 2016 Theoretical Foundations for Social Computing Workshop Report) or IDE that takes the magic out of crowdsourcing.

Crowdsorcery Team
Aditya Parameswaran (UIUC)
David Lee (UC Santa Cruz)
Matt Lease (UT Austin)
Mausam (IIT Delhi & U. Washington)

Crowdsorcery was one of the group projects pursued at the CMO-BIRS 2016 WORKSHOP ON MODELS AND ALGORITHMS FOR CROWDS AND NETWORKS.

 

Report: CMO-BIRS Workshop on Models and Algorithms for Crowds and Networks

The Banff International Research Station (BIRS) along with the Casa Matemática Oaxaca (CMO) generously sponsored a 4-day workshop on Models and Algorithms for Crowds and Networks, which was held in Oaxaca, Mexico from August 29 to September 1, 2016. It was a stimulating week of tutorials, conversations, and research meetings in a tranquil environment, free of one’s routine daily responsibilities. Our goal was to help find common ground and research directions across the multiple subfields of computer science that all use crowds, including making a connection between crowds and networks.


More than a year ago, Elisa Celis, Panos Ipeirotis, Dan Weld and I, Yiling Chen, proposed the workshop to BIRS. It was accepted in September 2015. Lydia Chilton later joined us and provided incredible insight and leadership in running the Research-a-Thon at the workshop. Twenty-eight researchers from North America, India and Europe attended the workshop. We mingled, exchanged ideas and perspectives, and worked together closely during the week.

The workshop featured nine excellent high-level, tutorial-style talks spanning many areas of computer science topics related to models, crowds, and networks:

  • Auction theory for crowds,
  • Design, crowds and markets,
  • Random walk and network properties,
  • Real-time crowdsourcing,
  • Decision making at scale: a practical perspective,
  • The collaboration and communication networks within the crowd,
  • Mining large-scale networks,
  • Crowd-powered data management, and
  • Bandits in crowdsourcing.

Several of these videos are now available, so take a look if you get a chance.

Outside of talks, much of our time was spent in small groups participating in a Research-a-Thon, similar to CrowdCamp, a crowdsourcing hack-a-thon run at several HCI conferences. People teamed up and worked on their chosen projects over a period of two and a half days. It was my first experience of a Research-a-Thon and I was totally sold. It worked as follows:

  • Each participant gave a brief pitch of two project ideas.
  • The group did an open brainstorming session and “speed dating” for exchanging project ideas.
  • Six teams were formed and set off to explore their respective problems.
  • At the end of the Research-a-Thon, teams came back and shared their progress.

The groups made productive use of their two and a half days: one formalized a social sampling model and proved initial results on the group-level behavior, another built a prototype in which a user can ask the crowd to search for information in their emails while preserving the privacy of the content, and another had already launched their MTurk experiment. (I wish I could be this productive all the time!) In the next few blog posts, several of the teams will share their findings from the Research-a-Thon with readers of this blog.

Following a CCC workshop on Mathematical Foundations for Social Computing, we also had a visioning and long-term future discussion at the workshop. Participants collectively identified the following five directions or problems that are believed to be important for the healthy growth of the field:

  • Identifying a quantifiable grand challenge problem. Identifying and running a grand challenge can be one of the best ways to push the frontier of research on crowds and networks.
  • Comparisons, benchmarks and reproducibility. It has been difficult to compare research results, and hence difficult to know whether progress has been made. This has led to a desire for benchmarks for research comparisons, as well as formal good-practice guidelines and ideas for increasing the reproducibility of research in this field.
  • Theory, guarantees and formal models. Participants recognized the benefits and challenges of developing formal models and theoretical guarantees for systems that involve humans. Some fields, such as economics, have had enormous success despite such challenges — one suggestion is to identify reachable goals towards formalizing models and theoretical approaches.
  • Human interpretable components. Many visions of joint human-machine systems treat humans as interchangeable blocks and map computational models onto them. More progress can potentially be made if we change our perspective and try to make the components of systems more interpretable to humans.
  • The future of the crowd. The excellent paper The Future of Crowd Work, published in 2013, continues to represent the concerns about and promises of crowd work.

The next blog post is from the Crowdsorcery team, discussing their Research-a-Thon project. Stay tuned!