CrowdCamp Report: Waitsourcing, approaches to low-effort crowdsourcing

Crowdsourcing is often approached as a full-attention activity, but it can also be applied to tasks so small that people perform them almost effortlessly. What possibilities are afforded by pursuing low-effort crowdsourcing?

Low-effort crowdsourcing is possible through a mix of low-granularity tasks, unobtrusive input methods, and an appropriate setting. Exploring the possibilities of low-effort crowdsourcing, we designed and prototyped an eclectic mix of ideas.

Browser waiting
In our first prototype, we built a browser extension that allows you to complete tasks while waiting for a page to load.

Tab shown loading, while a browser popup shows a "choose the outlier image" task
A Chrome extension that allows users to perform simple tasks (e.g., odd image selection) while a page is loading

Getting tasks loaded and completed in the time it takes for a page to load is certainly feasible. A benefit of this setting is that the user's flow is already disrupted by the page load.
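As a rough illustration of the timing logic (not our extension's actual code), a content script can gate the task on the page's load state. In the sketch below, showTask and hideTask are hypothetical helpers standing in for the task UI.

    // content-script.ts: illustrative sketch only; showTask() and hideTask()
    // are hypothetical helpers that would render and dismiss a micro-task.
    declare function showTask(): void;
    declare function hideTask(): void;

    // If the script runs while the page is still loading, surface a task and
    // dismiss it once the page and its resources have finished loading.
    if (document.readyState === "loading") {
      showTask();
      window.addEventListener("load", () => hideTask());
    }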

Emotive voting
How passive can a crowdsourcing contribution be? Many sites implement low-effort ways to respond to the quality of online content, such as a star, ‘like’, or a thumbs up. Our next prototype takes this form of quality judgment one step further: to no-effort feedback.

Using a camera and facial recognition, we observe a user’s face as they browse funny images.

Images being voted on with smiles and frowns
The emotive voting interface ‘likes’ an image if you smile while the image is on the screen, and ‘dislikes’ if you frown.
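As a sketch of the voting rule only (not our implementation), the mapping from expressions to votes could look like the following, where estimateExpression is a hypothetical stand-in for the facial-recognition step that processes the camera feed.

    // Sketch of the voting rule; estimateExpression() is a hypothetical
    // placeholder for a facial-expression model applied to camera frames.
    type Vote = "like" | "dislike" | "none";

    declare function estimateExpression(frame: ImageData): "smile" | "frown" | "neutral";

    function voteForImage(framesWhileVisible: ImageData[]): Vote {
      // Tally the expressions observed while the image was on screen.
      let smiles = 0;
      let frowns = 0;
      for (const frame of framesWhileVisible) {
        const expression = estimateExpression(frame);
        if (expression === "smile") smiles++;
        else if (expression === "frown") frowns++;
      }
      if (smiles > frowns) return "like";
      if (frowns > smiles) return "dislike";
      return "none";
    }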

There are social and technical challenges to a system that uses facial recognition as an input. Some people do not express amusement outwardly, and privacy concerns would likely deter users.

Secret-agent feedback
Perhaps our oddest prototype lets a user complete low-effort tasks encoded in other actions.

Our system listens for the affirmative grunts that a person gives when they are listening to somebody, or pretending to. Users are shown A-versus-B tasks, where an “uh-huh” selects one option and a “yeah” selects the other.

AwesomeR Interface
The awesomeR meme interface lets a user choose the better meme via an affirmative grunt (i.e., “yeah” or “uh-huh”) while they are talking to someone else.

Imagine Bob on the phone, listening patiently to a customer service rep while also completing tasks. The idea is silly, but the method of spoken input quickly becomes natural and thoughtless.
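A browser-based version of the idea can lean on off-the-shelf speech recognition. The sketch below uses the Web Speech API available in Chrome; it is an illustration rather than our prototype's code, and selectOption is a hypothetical callback into the A-versus-B task.

    // Rough sketch using the Web Speech API (exposed in Chrome as
    // webkitSpeechRecognition); selectOption() is a hypothetical callback.
    declare function selectOption(option: "A" | "B"): void;

    const recognition = new (window as any).webkitSpeechRecognition();
    recognition.continuous = true;        // keep listening for the whole call
    recognition.interimResults = false;

    recognition.onresult = (event: any) => {
      const latest = event.results[event.results.length - 1][0].transcript
        .trim()
        .toLowerCase();
      // Map affirmative grunts to the two task options.
      if (latest.includes("uh huh") || latest.includes("uh-huh")) selectOption("A");
      else if (latest.includes("yeah")) selectOption("B");
    };

    recognition.start();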

Binary tweeting

Can a person write with a low-bandwidth input? We prototyped a choice-based composer that offers users a multiple-choice interface for their next word.

Sentence generation with choice-based typing. The program prompts a user to choose one of two words that are likely to come after the previous words, allowing them to generate a whole sentence by low-effort interaction.

Because our prototype draws its corpus from Twitter, the phrases it constructs are realistically colloquial and current. There are endless sentiments that could be expressed on Twitter, but much of what people actually say, about one-fifth of messages, is nearly identical to past messages.
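The word-choice mechanism itself can be approximated with a simple bigram model over a corpus of tweets. The sketch below is an illustration of that approach rather than the prototype's code: it counts word pairs and offers the two most frequent continuations of the current word.

    // Minimal sketch of choice-based composition with a bigram model built
    // from a corpus of tweets (the corpus contents are illustrative).
    function buildBigrams(tweets: string[]): Map<string, Map<string, number>> {
      const bigrams = new Map<string, Map<string, number>>();
      for (const tweet of tweets) {
        const words = tweet.toLowerCase().split(/\s+/).filter(Boolean);
        for (let i = 0; i + 1 < words.length; i++) {
          const next = bigrams.get(words[i]) ?? new Map<string, number>();
          next.set(words[i + 1], (next.get(words[i + 1]) ?? 0) + 1);
          bigrams.set(words[i], next);
        }
      }
      return bigrams;
    }

    // Offer the two most likely next words after the current word.
    function nextWordChoices(bigrams: Map<string, Map<string, number>>, word: string): string[] {
      const counts = bigrams.get(word.toLowerCase());
      if (!counts) return [];
      return [...counts.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, 2)
        .map(([w]) => w);
    }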

As we continue to pursue low-effort crowdsourcing, we are thinking about how experiments such as those outlined here can be used to capture productivity in fleeting moments. Let us know your ideas in the comments.

Find the binary tweeter online, and find our other prototypes on GitHub.

Jeff Bigham, Carnegie Mellon University, USA
Kotaro Hara, University of Maryland, College Park, USA
Peter Organisciak, University of Illinois, Urbana-Champaign, USA
Rajan Vaish, University of California, Santa Cruz, USA
Haoqi Zhang, Northwestern University, USA

CrowdCamp Report: Reconstructing Memories with the Crowd

From cave paintings to diaries to digital videos, people have always created memory aids that allow them to recall information and share it with others. How can the memories of many people be combined to improve collective recall? For example, the layout of a community gathering place with sentimental or historical value could be recovered, or accidents and crimes could be explained using information that seemed trivial at first but actually has great importance.

Our CrowdCamp team set out to determine what some of the challenges and potential methods were for reconstructing places or things from the partial memories of many people.

Case Studies

We began by attempting to reconstruct common memories, such as the layout of a Monopoly board. Figure 1 below shows our individual and collective attempts at this task. We found that facts recalled by one group member helped resurface related memories in other members. However, working together also introduced ‘groupthink’, where a false memory from one person corrupted the group’s final model. This is a known problem, and it is one reason why police prefer to interview witnesses separately.

Figure 1. Our reconstruction of a Monopoly board (left), compared to the true version (right).

The Effect of Meaningful Content on Memory

Next, we examined how the type of information changes the process. It’s well documented that people’s minds summarize information for better recollection. We tried three cases:

  • No meaning: Memorize a Sudoku puzzle (table of ordered numbers)
  • Some meaning: Memorize a set of about 30 random objects
  • Meaningful scene: Memorize a living room scene

For each case, we first tried to memorize parts of the scene without coordination, and then with predefined roles, e.g., different members were told to remember disjoint aspects or parts. In both cases we first wrote down what we remembered, then merged our results. Coordinated roles increased both recall and precision. Recall increased because the sets of items we remembered individually overlapped less, meaning we did not redundantly memorize the same things. Precision increased because the narrower task further focused our attention by removing extra distractors.

Opportunities and Challenges

In some settings, prior domain knowledge allows people to organize for increased collective memory. One theme is that diversity aids in reconstruction. For example, one person may remember colors well while another may be color-blind but have a good spatial memory. Even outsiders who have no connection with the memory may be able to help. For example, in Figure 2 below, a paid oDesk worker helped us remember our stressful first-day presentation at CrowdCamp by creating an illustration based on notes and images we provided.

An image depicting 4 presenters crying and one girl sitting at a desk in the background.
Figure 2. An oDesk worker’s rendition (left) of our stressful CrowdCamp presentation based on our notes and sketch we provided (above).

We identified three main challenges to reconstructing memories:

  • Groups, especially those containing members with strong personalities, are subject to groupthink, which can introduce errors.
  • Because some aspects of a scene are more salient, people’s memories often overlap significantly.
  • In unbounded settings, people’s accuracy decreases, likely due to an overwhelming amount of information.

One consistent property was that nearly all of the information we would ultimately recall surfaced in the first few seconds or minutes, depending on the size of the task. After that, significant gains came only when one person’s idea jogged the memory of another.

Future Directions

We believe this work has great potential to introduce a more structured way to recreate memories using groups of people of all sizes, while avoiding problems encountered with naïve solutions. For example, approaches that mix individual recollection early on with later collaboration, while using parallel subsets of workers to minimize groupthink, could improve the way we recover knowledge in settings ranging from historical documentation to crime scenes.

What other ideas or references for recovering memories can you think of? Anything we missed? We’d love to hear about it!

Authors
Adam Kalai, Microsoft Research

Walter S. Lasecki, University of Rochester / Carnegie Mellon University
Greg Little, digital monk
Kyle I. Murray, MIT CSAIL

CrowdCamp Report: HelloCrowd, The “Hello World!” of human computation

The first program a new programmer writes in any programming language is the “Hello world!” program: a single line of code that prints “Hello world!” to the screen.

We ask, by analogy, what should be the first “program” a new user of crowdsourcing or human computation writes?  “HelloCrowd!” is our answer.

Hello World task
The simplest possible “human computation program”

Crowdsourcing and human computation are becoming ever more popular tools for answering questions, collecting data, and providing human judgment.  At the same time, there is a disconnect between interest and ability, where potential new users of these powerful tools don’t know how to get started.  Not everyone wants to take a graduate course in crowdsourcing just to get their first results. To fix this, we set out to build an interactive tutorial that could teach the fundamentals of crowdsourcing.

After creating an account, HelloCrowd tutorial users will get their feet wet by posting three simple tasks to the crowd platform of their choice. In addition to the “Hello, World” task above, we chose two common crowdsourcing tasks: image labeling and information retrieval from the web. In the first task, workers provide a label for an image of a fruit, and in the second, workers must find the phone number for a restaurant. These tasks can be reused and posted to any crowd platform you like; we provide simple instructions for some common platforms. The interactive tutorial auto-generates the task URLs for each tutorial user and for each platform.

Mmm, crowdsourcing is delicious

More than just another tutorial on “how to post tasks to MTurk”, our goal with HelloCrowd is to teach fundamental concepts. After posting tasks, new crowdsourcers will learn how to interpret their results (and get even better results next time). For example: what concepts might a new crowdsourcer learn from the results for the “hello world” task or for the business phone number task? Phone numbers are simple, right? What about “867-5309” vs. “555.867.5309” vs. “+1 (555) 867 5309”? Our goal is to get new users of these tools up to speed on how to get good results: form validation (or not), redundancy, task instructions, etc.
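To make the point concrete, here is a minimal sketch (ours, not part of the tutorial) of the kind of normalization a requester ends up writing before redundant phone-number answers can be compared; it assumes US-style 10-digit numbers.

    // Illustration of why answer formats matter: reduce US-style phone
    // numbers to bare digits so redundant worker answers can be compared.
    // (Assumes 10-digit numbers, optionally with a leading "1" country code.)
    function normalizePhone(raw: string): string {
      const digits = raw.replace(/\D/g, "");   // drop punctuation and spaces
      return digits.length === 11 && digits.startsWith("1")
        ? digits.slice(1)                      // strip the country code
        : digits;
    }

    // "555.867.5309" and "+1 (555) 867 5309" both normalize to "5558675309";
    // a bare "867-5309" does not, which is exactly the kind of gap that
    // better task instructions or validation would need to close.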

In addition to teaching new crowdsourcers how to crowdsource, our tutorial system will be collecting a longitudinal, cross-platform dataset of crowd responses.  Each person who completes the tutorial will have “their” set of worker responses to the standard tasks, and these are all added together into a public dataset that will be available for future research on timing, speed, accuracy and cost.

We’re very proud of HelloCrowd, and hope you’ll consider giving our tutorial a try.

Christian M. Adriano, Donald Bren School, University of California, Irvine
Juho Kim, MIT CSAIL
Anand Kulkarni, MobileWorks
Andy Schriner, University of Cincinnati
Paul Zachary, Department of Political Science, University of California, San Diego

CrowdCamp Report: Crowdsourcing Challenging Problems

Traditionally, the tasks accomplished on MTurk, known as microtasks, have been simple, repetitive, short, and knowledge-lean. A recent thread of research has attempted to expand the types of tasks that can be accomplished on crowdsourcing platforms to include more open-ended and complex tasks. For example, CrowdForge [2] and Turkomatic [3] provide general-purpose frameworks to help the requester plan and solve complex, interdependent problems. The power of these tools is usually showcased through tasks that can be performed by a skilled individual, such as writing a document or researching a purchase decision. But can we think of ways of harnessing the collective intelligence of the crowd to solve complex social problems that no individual can [4]? Our goal for the workshop project was to design a new workflow that enables crowd workers to solve challenging social problems.

Initial Workflow Design
We began with an initial framing of the process of solving a complex social problem as a search through the space of possible solutions to the target question. We adopted operators from genetic algorithms (mutation and crossover) to guide the search process. This initial process can be described as the following workflow:

  1. Question – Begin with a question to be posed to the crowd.
  2. Seed – Provide some number of seed solutions; these could be generated by the crowd, or added by the requester to help kickstart the process.
  3. Rank – Solutions are ranked by a portion of the crowd.
  4. Prune – The top k solutions are retained, and the rest are discarded. This prevents the search space from becoming too large.
  5. Generate – In this step, members of the crowd are asked to generate a new solution using one of the three following strategies:
    • Create a new solution that is not present in the current list of solutions.
    • Improve one of the existing solutions (this operation is similar to “mutation” in a genetic algorithm).
    • Recombine two or more solutions to create a more comprehensive solution (this operation is similar to “crossover” in a genetic algorithm).

    The requester controls the relative maximum payout for each of these options to guide the search process.

This workflow is similar to that described by Yu and Nickerson [5] in its use of recombination, but it further allows the requester to dynamically adjust between exploitation of known regions of the search space and exploration of regions not yet encountered.
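In code-like form, one round of the Rank-Prune-Generate loop above might be sketched as follows. Here rankByCrowd and generateWithCrowd are hypothetical stubs for the crowd-facing steps, and k and the payout weights are chosen by the requester.

    // One round of the workflow: rank, prune to the top k, then generate a
    // new candidate using one of the three strategies, weighted by payout.
    type Strategy = "new" | "improve" | "recombine";

    declare function rankByCrowd(solutions: string[]): string[];   // best first
    declare function generateWithCrowd(
      solutions: string[],
      payouts: Record<Strategy, number>
    ): string;

    function evolveOneRound(
      solutions: string[],
      k: number,
      payouts: Record<Strategy, number>
    ): string[] {
      const ranked = rankByCrowd(solutions);           // step 3: Rank
      const survivors = ranked.slice(0, k);            // step 4: Prune
      const candidate = generateWithCrowd(survivors, payouts); // step 5: Generate
      return [...survivors, candidate];
    }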

Testing the initial workflow
Below are results from a pilot we ran at CrowdCamp in which, for lack of time, we used a static search strategy (i.e., one that does not shift between exploration and exploitation of the search space). We tested the workflow using the following question [6] and seed answers.

Here is one of the top three solutions we got after ten rounds:

Evidently, the “solutions” incorporate much of the same content, aren’t very coherent, and don’t do much to answer the original question. Furthermore, fully 62% of the answers generated by the turkers were invalid: empty, exact copies of other answers, nonsense, or obvious attempts to game the system by duplicating text within the answer itself. This problem was most salient under the “Recombine” condition.

Preliminary conclusions

  1. The problem, as stated, was too hard for turkers to make much progress with. Turkers mostly opted to recombine, despite the lower payout, because of the low probability of success and high investment cost of the other two options. A simple lexical recombination is virtually no work, and this becomes more attractive when the problem is sufficiently hard.
  2. Turkers were biased to prefer lengthy, redundant solutions over truly innovative ideas. This was evident both in the turkers’ ranking behavior and in the frequency with which solutions were artificially expanded by repeating passages within a solution.
  3. Solution coherence suffered because of the preceding two problems.

Revising the initial workflow
Based on the above analysis, we decided to follow in the footsteps of Turkomatic and CrowdForge, and attempt a decomposition of the problem into simpler problems. To do this, we walked through the process of solving our target problem, using insights gained from our initial exploration.

The steps in the workflow are:

  1. Question and Solution Criteria – Begin with a question to be posed to the crowd and a set of solution criteria for evaluating the proposed solutions. Specifying solution criteria is intended both to help the crowd understand what is being asked and to provide concrete ranking criteria.
  2. Elaborate – Using a version of the “find-fix-verify” design pattern [1], the crowd is asked to find the ambiguous and unclear parts of the question, propose fixes to clarify the question, and validate the fixes. The goal of this step is to help clarify questions that stem from poor wording or specialized vocabulary.
  3. Challenge – Turkers are asked to identify the difficulties they see in solving the question.
    1. Research – The research process works in parallel with the challenge step. The crowd will be asked to identify resources that might help others understand and resolve the identified challenges.
  4. Solve – Prompt crowd to generate length-restricted high-level solutions. This provides the crowd with a way to assess solution paths before pursuing them in detail. Restricting solutions to a certain length should mitigate the preference for lengthy answers.
  5. Integrate solutions – The high-level solutions proposed in the previous step are integrated by the crowd workers to form a complete solution.
  6. Filter – Crowd members are asked to verify the coherence of the proposed solutions.
  7. Evaluate – Finally, the remaining solutions are evaluated against the criteria provided by the requester, and bad solutions are culled.

Future
Our next step is to implement and evaluate our proposed workflow with real challenging problems and use existing idea competitions to validate the quality of crowdsourced solutions.

Josh Introne, MIT
Roshanak Zilouchian Moghaddam,  University of Illinois at Urbana-Champaign
HyunJoon Jung, UT Austin
Tao Dong, University of Michigan
Yiftach Nagar, MIT

* We thank Erik Duhaime (MIT) who was not at CrowdCamp, but contributed some of the ideas that led to this work.

References

  1. Bernstein, M.S., Little, G., Miller, R.C., Hartmann, B., Ackerman, M.S., Karger, D.R., Crowell, D. and Panovich, K. Soylent: a word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST 2010), ACM, New York, NY, 313-322.
  2. Kittur, A., Smus, B., Khamkar, S. and Kraut, R.E. CrowdForge: crowdsourcing complex work. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST 2011), ACM, 43-52.
  3. Kulkarni, A.P., Can, M. and Hartmann, B. Collaboratively crowdsourcing workflows with Turkomatic. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (CSCW 2012), ACM, Seattle, WA, 1003-1012.
  4. Nagar, Y. Beyond the human-computation metaphor. In Proceedings of the Third IEEE International Conference on Social Computing (SocialCom 2011), IEEE, Cambridge, MA, 800-805.
  5. Yu, L. and Nickerson, J.V. Cooks or cobblers? Crowd creativity through combination. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2011), ACM, 1393-1402.
  6. http://kidscoverage.challenge.gov/

CrowdCamp Report: Benchmarking the Crowd

As crowdsourcing evolves, the crowds are evolving too.  Mechanical Turk is a different population than it was a few years ago.  There are different crowds at different times of day.  Different crowds may be better or worse for one application or another — CrowdFlower, MobileWorks, or even a crowd of employees within a company or students within a school.

In particular, how can researchers and developers cooperate to collect aggregate data about system properties (e.g. latency, throughput, noise), demographics (gender, age, socioeconomic level), and human performance (motor, perceptual, attention) for the various crowds that they use?

An example Census benchmarking task, injected into an existing HIT

We started exploring this question in a weekend CrowdCamp hackathon at CSCW 2013.  Some concrete steps and discoveries included:

  • We gathered 25 datasets from a wide variety of experiments on Mechanical Turk by many different researchers, ranging from 2008 to 2013. We found 30,000 unique workers in our sample, and in the most recent datasets, between 20% and 40% of workers had also contributed to previous datasets. So at least on MTurk, the crowd is stable enough for benchmarking between researchers to be a viable idea.
  • We prototyped a deployment platform, Census, that injects a small benchmarking task into any researcher’s existing HIT, using only one line of JavaScript (sketched after this list). The image above shows an example Census task in action.
  • We trawled the recent research literature for possible benchmarking tasks, including affect detection, image tagging, and word sense disambiguation.
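To give a flavor of what that one-line include amounts to, here is a sketch; the script URL is a placeholder, since Census is only a prototype.

    // Purely illustrative: the Census endpoint below is a placeholder, not a
    // real URL. A single script include added to an existing HIT page pulls
    // in and renders a small benchmarking task alongside the requester's own.
    const census = document.createElement("script");
    census.src = "https://example.org/census/inject.js";  // placeholder endpoint
    document.head.appendChild(census);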

We also discovered that Mechanical Turk worker IDs are not as anonymous as researchers generally assume.  For benchmarking that shares information among researchers, it will be necessary to take additional steps to protect worker privacy while preserving the ability to connect the same workers across studies.

Saeideh Bakshi, Georgia Tech
Michael Bernstein, Stanford University
Jeff Bigham, University of Rochester
Jessica Hullman, University of Michigan
Juho Kim, MIT CSAIL
Walter Lasecki, University of Rochester
Matt Lease, University of Texas Austin
Rob Miller, MIT CSAIL
Tanushree Mitra, Georgia Tech

 

Mechanical Turk Workers Are Not Anonymous

Users of Amazon Mechanical Turk generally believe that the workers are anonymous, identified only by a long obscure identifier like A3IZSXSSGW80FN. (That’s mine.) A worker’s name or contact information can’t be discovered unless the worker chooses to provide it.

But it isn’t true. Many MTurk workers are rather easy to identify.

Take a typical worker ID. If you’ve ever used MTurk yourself, you can find your own worker ID on your Dashboard, on the far right:

The Mechanical Turk Dashboard, with the worker ID shown at the far right

Just search the web for your worker ID, and you may find a surprising number of results:

  • wish lists
  • book reviews
  • tagged Amazon products

In fact, many workers have a public Amazon profile page containing their real name, and sometimes even a photo, at http://www.amazon.com/gp/pdp/profile/workerID. Here’s my profile:

Amazon public profile showing my real name

In preliminary testing with published datasets containing turker IDs, about 50% of worker IDs we tried had a public profile page, and about 30% of IDs had a discoverable real name.  A smaller percentage had a photo as well.
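The check itself is simple. The sketch below illustrates the idea (it is not the script we used); it assumes an environment where cross-origin requests are permitted, such as a standalone Node script, and a real check would also need to distinguish genuine profiles from generic error pages.

    // Sketch: does a worker ID resolve to a public Amazon profile page?
    async function hasPublicProfile(workerId: string): Promise<boolean> {
      const url = `http://www.amazon.com/gp/pdp/profile/${workerId}`;
      const response = await fetch(url);
      return response.ok;   // a 200 response suggests a public profile exists
    }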

The fundamental problem here is that all of Amazon uses the same identifier for the worker’s account. Every interaction with Amazon is tagged by that identifier, and many of those interactions produce public pages containing the identifier, which are indexed by search engines.

We discovered this fact at a CrowdCamp workshop last weekend at the CSCW 2013 conference, and it came as a stunning surprise to a room full of researchers with years of experience using Mechanical Turk.

The implications are sweeping:

* For academic researchers: worker IDs may have to be treated as personally identifiable information. For example, publishing worker IDs online in public data sets may be a violation of worker privacy, and counter to the requirements of the researcher’s institutional review board.

* For workers: if you want to protect your online identity and retain anonymity on MTurk, you should register a separate Amazon account expressly for MTurk, and not use the same account that you use for Amazon purchasing. But note that if you already have an MTurk account, creating a new one means losing any reputation you’ve built up.

* For Amazon itself: this is a privacy hole that needs addressing. The best solution would be to use distinct identifiers for MTurk and other Amazon properties (even if they share the same login account). At a minimum, however, workers and requesters should be made aware of this privacy risk, and workers whose accounts are publicly identifiable should be permitted to create new ones without loss of reputation.

For more detail, see our working paper, Mechanical Turk Is Not Anonymous, which will be posted by March 7.

Saeideh Bakshi, Georgia Tech
Michael Bernstein, Stanford University
Jeff Bigham, University of Rochester
Jessica Hullman, University of Michigan
Juho Kim, MIT CSAIL
Walter Lasecki, University of Rochester
Matt Lease, University of Texas Austin
Rob Miller, MIT CSAIL
Tanushree Mitra, Georgia Tech

 

CrowdCamp Report: DesignOracle: Exploring a Creative Design Space with a Crowd of Non-Experts

For an unconstrained design problem like writing a fiction story or designing a poster, can a crowd of non-experts be shepherded to generate a diverse collection of high quality ideas? Can the crowd also generate the actual stories or posters? With the goal of building crowd-powered creativity support tools, a group of us, Paul André, Robin N. Brewer, Krzysztof Gajos, Yotam Gingold, Kurt Luther, Pao Siangliulue, and Kathleen Tuite, set out to answer these questions. We based our approach on a technique described by Keith Johnstone in his book Impro: Improvisation and the Theatre for extracting creativity from people who believe themselves to be uncreative. An uncreative person is told that a story is already prepared and he or she has merely to guess it via yes or no questions. Unbeknown to the guesser, there is no story; guesses are met with a yes or no response, essentially randomly. For example:

  1. Is it about CSCW? Yes
  2. Is it about CrowdCamp? No
  3. Is it about a bitter rivalry? Yes

As questioning proceeds, a consistent story is revealed, entirely due to the guesser generating and then externalizing an internally consistent mental model of a story (or poster, etc.) that justifies the given answers.
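A minimal sketch of such an oracle follows (for illustration only). One optional refinement, caching answers so that repeated questions get consistent replies, is included here even though the basic technique does not require it.

    // Minimal sketch of the oracle: there is no story, only yes/no answers
    // chosen essentially at random, kept consistent for repeated questions.
    const pastAnswers = new Map<string, boolean>();

    function oracle(question: string): "Yes" | "No" {
      const key = question.trim().toLowerCase();
      if (!pastAnswers.has(key)) {
        pastAnswers.set(key, Math.random() < 0.5);
      }
      return pastAnswers.get(key)! ? "Yes" : "No";
    }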

To evaluate the potential of this “StoryOracle” approach, we ran a series of experiments on Amazon Mechanical Turk:

  1. We extracted dozens of surprising and creative stories and poster designs using the technique.
  2. We explored the design space in a directed manner by generating variations on well-liked stories. Every question and yes-or-no answer provides a possible branch point. To branch, we selected an important “pivot” question and showed all questions and answers up to the pivot, followed by the pivot question and either the same or the opposite answer, to a set of new participants with instructions to continue guessing the story (see, for example, the pictured story and its branches).
  3. We converted question-based stories into “normal” stories. For example:

    Jack is a scientist who has dedicated his life to discovering how to generate energy using fusion power because he believes it would benefit humanity. Eventually he discovers how to do it and tells his wife, Jane, the good news. Jane is not aware that this is a secret and talk about it with Carl (to whom she was in love before meeting Jack). Carl tries to steal the secret from Jack but Jane fought with him and is able to impede him.
    However, Jack discovers that Jane told Carl about the secret and they have a fight because of it. Eventually, Carl, who is angry with both of them, decides to kill Jack and Jane. He manages to kill the couple but when he is about to steal Jack’s secret, the power fusion discovery is released to the public through the internet.
    Thus, all the world is able to produce energy using Jack’s discovery and eventually his dream of providing a better quality of life to everyone comes true.

  4. We evaluated stories’ quality.
  5. We devised domain-specific prompts for questioners, such as the setting, theme, and characters in a story.

Taken together, in the two days of CrowdCamp we managed to build the foundation for a crowd-powered tool that explores a creative design space, in either an undirected or a directed manner, and generates a variety of high-quality artifacts.

Paul André, Carnegie Mellon University
Robin N. Brewer, University of Maryland, Baltimore County
Krzysztof Gajos, Harvard University
Yotam Gingold, George Mason University
Kurt Luther, Carnegie Mellon University
Pao Siangliulue, Harvard University
Kathleen Tuite, University of Washington