Leading the Crowd

by Kurt Luther (Georgia Tech)

Who tells the crowd what to do? In the mid-2000s, when online collaboration was just beginning to attract mainstream attention, common explanations included phrases like “self-organization” and “the invisible hand.” These ideas, as Steven Weber has noted, served mainly as placeholders for more detailed, nuanced theories that had yet to be developed [6]. Fortunately, the last half-decade has filled many of these gaps with a wealth of empirical research looking at how online collaboration really works.

One of the most compelling findings from this literature is the central importance of leadership. Rather than self-organizing, or being guided by an invisible hand, the most successful crowds are led by competent, communicative, charismatic individuals [2,4,5]. For example, Linus Torvalds started Linux, and Jimmy Wales co-founded Wikipedia. The similar histories of these projects suggest a more general lesson about the close coupling between success and leadership. With both Wikipedia and Linux, the collaboration began when the project founder brought some compelling ideas to a community and asked for help. As the project gained popularity, its success attracted new members. Fans wanted to get involved. Thousands of people sought to contribute–but how could they coordinate their efforts?

[Illustration from “The Wisdom of the Chaperones” by Chris Wilson, Slate, Feb. 22, 2008]

Part of the answer, as with traditional organizations, includes new leadership roles. For a while, the project founder may lead alone, acting as a “benevolent dictator.” But eventually, most dictators crowdsource leadership, too. They step back, decentralizing their power into an increasingly stratified hierarchy of authority. As Wikipedia has grown to be the world’s largest encyclopedia, Wales has delegated most day-to-day responsibilities to hundreds of administrators, bureaucrats, stewards, and other sub-leaders [1]. As Linux exploded in popularity, Torvalds appointed lieutenants and maintainers to assist him [6]. When authority isn’t decentralized among the crowd, however, leaders can become overburdened. Amy Bruckman and I have studied hundreds of crowdsourced movie productions and found that because leaders lack technological support to be anything other than benevolent dictators, they struggle mightily, and most fail to complete their movies [2,3].

This last point is a potent reminder: all leadership is hard, but leading online collaborations brings special challenges. As technologists and researchers, we can help alleviate them. At Georgia Tech, we are building Pipeline, a movie crowdsourcing platform meant both to ease the burden on leaders and to help us understand which leadership styles work best. Of course, Pipeline is just the tip of the iceberg–many experiments, studies, and software designs can help us understand this new type of creative collaboration. We’re all excited about the wisdom of crowds, but let us not forget the leaders of crowds.

Kurt Luther is a fifth-year Ph.D. candidate in social computing at the Georgia Institute of Technology. His dissertation research explores the role of leadership in online creative collaboration.

References

  1. Andrea Forte, Vanesa Larco, and Amy Bruckman, “Decentralization in Wikipedia Governance,” Journal of Management Information Systems 26, no. 1 (Summer 2009): 49-72.
  2. Kurt Luther, Kelly Caine, Kevin Ziegler, and Amy Bruckman, “Why It Works (When It Works): Success Factors in Online Creative Collaboration,” in Proceedings of GROUP 2010 (New York, NY, USA: ACM, 2010), 1–10.
  3. Kurt Luther and Amy Bruckman, “Leadership in Online Creative Collaboration,” in Proceedings of CSCW 2008 (San Diego, CA, USA: ACM, 2008), 343-352.
  4. Siobhán O’Mahony and Fabrizio Ferraro, “The Emergence of Governance in an Open Source Community,” Academy of Management Journal 50, no. 5 (October 2007): 1079-1106.
  5. Joseph M. Reagle, “Do As I Do: Authorial Leadership in Wikipedia,” in Proceedings of WikiSym 2007 (Montreal, Quebec, Canada: ACM, 2007), 143-156.
  6. Steven Weber, The Success of Open Source (Harvard University Press, 2004).

Capitalizing on Mobile Moments

Workshop paper: Fast, Accurate, and Brilliant: Realizing the Potential of Crowdsourcing and Human Computation

When people are mobile, the time they have to engage in an activity is generally short: on the order of minutes, and sometimes only a few seconds. Unlike time spent at the office or at home, these periods, which we characterize as mobile moments, are fleeting. Tasks performed at such times need to be supported by a mobile interface that lets users get to the core of their activity as quickly and easily as possible, with minimal overhead.

Mobile moments are also potential opportunities to harness human resources for computation, especially when people have free time on their hands. The smartphone, always on and always at hand, lets people spend that free time on activities that are pleasant and entertaining. If those activities are, as a side effect, beneficial to others, mobile moments can be leveraged for the greater good. Thus, crowdsourcing efforts can tap smartphone users in their mobile moments to perform human computation tasks. These tasks could be location-based, but need not be; they simply have to fit into those serendipitous moments.

Our work on FishMarket, a mobile prediction market game, was born out of an interest in crowdsourcing among enterprise workers during their mobile moments. The game enables these workers to use their mobile devices, anytime and anyplace, to share specialized knowledge quickly and efficiently. Its user experience evolved through several iterations as we worked to make the game concepts accessible and engaging, and play quick and easy, so that people would play during their brief mobile moments.
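
The post does not describe FishMarket’s market mechanism, so purely as an illustration of how a prediction market can aggregate dispersed knowledge, here is a minimal Python sketch of Hanson’s logarithmic market scoring rule (LMSR), a common automated market maker for such games; the class and parameter names are our own, not FishMarket’s.

    import math

    class LMSRMarket:
        # Minimal logarithmic market scoring rule (LMSR) market maker.
        # Workers buy shares in mutually exclusive outcomes; an outcome's price
        # doubles as the market's current probability estimate for it.

        def __init__(self, outcomes, liquidity=10.0):
            self.b = liquidity                      # higher b = slower-moving prices
            self.shares = {o: 0.0 for o in outcomes}

        def _cost(self):
            return self.b * math.log(sum(math.exp(q / self.b)
                                         for q in self.shares.values()))

        def price(self, outcome):
            # Current probability the market assigns to `outcome`.
            total = sum(math.exp(q / self.b) for q in self.shares.values())
            return math.exp(self.shares[outcome] / self.b) / total

        def buy(self, outcome, quantity):
            # Charge the trader the change in the cost function.
            before = self._cost()
            self.shares[outcome] += quantity
            return self._cost() - before

    # A worker who believes a project will ship on time buys "yes" shares,
    # nudging the consensus probability upward.
    market = LMSRMarket(["yes", "no"])
    print(round(market.price("yes"), 2))   # 0.5 before any trades
    cost = market.buy("yes", 5.0)
    print(round(market.price("yes"), 2))   # above 0.5 after the trade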

The space of possible human computation tasks for mobile moments is largely unmapped, and we are interested in exploring it. We are particularly interested in the design aspects (e.g., UI, game, social) as well as the attributes of crowdsourcing tools: how they channel experts’ desire to solve problems, how they tap into people’s willingness to share, and how they use the crowd to sort through solutions to find the best one.

Alison Lee and Richard Hankins are Principal Research Scientists at Nokia Research Center in Palo Alto.  Alison is developing mobile services that enhance mobile work, mobile collaboration, and mobile recreation.  Richard’s research focus is on future mobile devices and systems. They both hold a Ph.D. in Computer Science — Alison from the University of Toronto and Richard from the University of Michigan.

Just hiring people to do stuff

As many of you know, my recent interest has been “just hiring people to do stuff”.

Let me make a case for why I think this is research, and why it is important.

Mankind has never before had such easy, affordable, and fast access to expert labor at such a small scale.

I’m not talking about Mechanical Turk. I’m talking about real expertise: people who know how to program, people who know how to draw, people who know how to write. These people can be found on sites like oDesk, Freelancer and Elance. They can be hired within a day, sometimes within an hour, for bite-sized projects as small as $5. Few people do this, however. Few people know they can, but the day is coming.

We are on the cusp of a new way of working.

Consider the effect web search had on information. As I write this blog post, I make Google queries to gain and verify information. I think about information differently because of web search — I need less of it in my head.

Consider the effect outsourcing may have on expertise. As I write this blog post, why am I not dictating in crude Greg-isms to an expert word-smith that I hired just now to craft these sentences? We will think about expertise differently because of outsourcing — we will need to acquire less of it ourselves.

We needed to learn how to use web search as part of our everyday workflow. We didn’t know how at first. Not everyone knows how even now. My mom has difficulty forming effective search queries. But it is a crucial skill to acquire.

We need to learn how to outsource as part of our everyday workflow. Practically nobody knows how. Most outsourcing is large scale — an entire website, or an entire program. It is like searching Google for a book on Java programming, and then reading the book, rather than searching for specific information needs when they arise.

The game is changing. This isn’t just bridging the gap until AI gets there; this is the industrial revolution of knowledge work. It will change the economic, cultural, and political landscape of mankind. It is worth researching.

Greg Little is an n-year PhD student at MIT. He is finishing his thesis as we speak, on human computation algorithms.

Would you be a worker in your crowdsourcing system?

As a computer scientist, I am interested in two primary research questions about crowdsourcing:

  1. How might new systems broaden the range and increase the utility of crowdsourced work?
  2. What models, tools, and languages can help designers and developers create new applications that rely on crowdsourcing at their core?

I am investigating these questions together with my students at the Berkeley Institute of Design, in our Crowdsourcing Course, and through external collaborations (e.g., Soylent). At CHI, we will present works-in-progress on letting workers recursively divide and conquer complex tasks and on integrating feedback loops into work processes.

As a humanist, I believe it is incumbent upon us also to think about the values our systems embody. I have a recurring uneasiness with the brave new world conjured by some of our projects, for two reasons. The first has been articulated before: many crowdsourcing research projects (including my own) rely at their core on a supply of cheap labor on microtask markets. Techniques we introduce to ensure quality and responsiveness (e.g., redundancy, busy-waiting) are fundamentally inefficient ways of organizing labor that are only feasible because we exploit order-of-magnitude differences in global incomes [1].

My second reservation is that the language used to describe how our systems decompose, monitor, and regulate the efforts of online workers recalls that of Taylor’s scientific management. By observing, measuring, and codifying skilled work, Taylorism moved knowledge from people into processes. This increased efficiency and made mass manufacturing possible, but it also led to the creation of entire classes of repetitive, undesirable, deskilled jobs.

I believe Stu Card had it right when he wrote that “We should be careful to design a world we actually want to live in.” As a step in this direction, we might consider whether we ourselves would participate as workers in our own crowdsourcing systems. An exercise in my class, in which students had to earn at least $1 as workers on Mechanical Turk, suggests that the answer today is a resounding “No.”

This leads me to ask a third research question – one I am less prepared to answer, but one whose answer matters if we believe that crowdsourcing will actually grow into a significant role in our future economy:

  3. How might we increase the utility, satisfaction, and beneficence of crowdsourcing for workers?

I am looking forward to discussing these questions with you at the workshop.

[1] Thanks to Volker Wulf for this thought.

Shepherding the Crowd: An Approach to More Creative Crowd Work

By Steven Dow and Scott Klemmer (Stanford HCI Group)

Why should we approach crowdsourcing differently than any other collaborative computing system? Sure, crowdsourcing platforms make on-demand access to people easier than ever before, and this access provides new opportunities for distributed systems and social experiments. However, workers are not simply “artificial artificial intelligence,” but real people with different skills, motivations, and aspirations. At what point did we stop treating people like human beings?

Our work focuses on people. Can we help workers improve their abilities? Can we keep them motivated? Can workers effectively carry out more creative and complex projects? Our experiments show that simple changes in work processes can significantly affect the quality of results. Our goal is to understand the cognitive, social, and motivational factors that govern creative work.

Along with our Berkeley colleagues Björn Hartmann and Anand Kulkarni, we introduce the Shepherd system for managing and providing feedback to workers on content-creation tasks. We propose two key features to help modern micro-task platforms accomplish more complex and creative work. First, formal feedback can improve worker motivation and task performance. Second, real-time visualizations of completed tasks can give requesters a means to monitor and shepherd workers. We hypothesize that providing infrastructural support for timely, task-specific feedback and worker interaction will lead to better-educated, more motivated workers and better work results. Our next experiment will compare externally provided feedback with self-assessment: is the added cost of assessing others’ work worth it, compared with simpler mechanisms such as asking workers to evaluate their own work?
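
The Shepherd paper describes the real system; the following is only a rough sketch, with invented names, of the workflow outlined above: workers submit drafts, a requester watches a live queue and attaches task-specific feedback, and workers revise before final submission.

    from dataclasses import dataclass, field

    @dataclass
    class Submission:
        worker_id: str
        text: str
        feedback: list = field(default_factory=list)
        status: str = "draft"            # draft -> reviewed -> revised

    class ShepherdBoard:
        # Invented stand-in for a requester dashboard: it exposes the queue a
        # real-time visualization would poll and records feedback per worker.
        def __init__(self):
            self.submissions = []

        def submit_draft(self, worker_id, text):
            sub = Submission(worker_id, text)
            self.submissions.append(sub)
            return sub

        def pending_review(self):
            return [s for s in self.submissions if s.status == "draft"]

        def give_feedback(self, submission, note):
            submission.feedback.append(note)
            submission.status = "reviewed"

        def revise(self, submission, new_text):
            submission.text = new_text
            submission.status = "revised"

    board = ShepherdBoard()
    draft = board.submit_draft("w42", "A short product description...")
    for sub in board.pending_review():                      # requester's view
        board.give_feedback(sub, "Name one concrete use case.")
    board.revise(draft, "A short product description with a concrete use case.")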

What’s the potential for creative crowd work?  Check out The Johnny Cash Project and Star Wars Uncut.

Steven Dow examines design thinking, prototyping practices, and crowdsourcing as a Stanford postdoc and Scott Klemmer advocates for high-speed rail in America and co-directs the Stanford HCI group.

Leveraging Online Virtual Agents to Crowdsource Human-Robot Interaction

Human-Robot Interaction (HRI) studies the social aspects of robotic behavior. Results in HRI have emphasized the need to apply not only human-computer interaction design principles but also principles of psychology. Non-verbal social cues such as gaze, attention, prosody, transparency of goal-oriented behavior, and intention are just a few aspects of behavior that become important with actuated agents. Along with these HRI principles, the community must also focus on traditional topics in robotics and machine learning: dialog management, navigation, manipulation, and learning by demonstration, to name a few.

The A.I. community has had many positive results with knowledge-based agency. Whether in the form of policy learning, symbolic (so-called classic) A.I., or straightforward sense-think-act or sense-act architectures, these results have been very promising. Much of this work has focused on knowledge acquisition from large corpora, whether from crowds or from standard benchmark sources like the WSJ corpus. The analog in the robotics community has been learning agent behaviors tabula rasa (from scratch) through direct user interaction, an approach that sometimes amounts to raising robots as if they were “babies” (motor babbling, learning by demonstration, kinematic learning, etc.).

Our paper argues that the HRI community can also benefit from a data-driven approach in which the agent mimics observed non-verbal behaviors and learns from observed dialog, tasks demonstrated online, and labeled objects it can perceive. In a preliminary study, we collected more than 50,000 interactions in our online game, Mars Escape, and used them to train our real-world robot to mimic the role human players took on in the game: that of the robot. While our game does not cover everything we ultimately hope to source this way, we are just beginning to establish what a virtual agent could learn from data gathered on the internet or, more generally, in a virtual world. I hope to present these results and discuss them with a group of experts who have had success in harnessing the crowd, both to gather more appropriate data and to recruit more participants.
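
The paper and study describe the actual training pipeline; purely as a toy illustration of the data-driven idea, here is a nearest-neighbor sketch that replays the action human players most often took in the most similar logged game state. The state features and log format are invented for the example.

    import math
    from collections import Counter

    # Each logged entry pairs an (invented) numeric state vector, e.g.
    # (distance to the human partner, seconds remaining), with the action
    # the human player chose while playing the robot's role.
    game_logs = [((1.0, 30.0), "follow_partner"),
                 ((4.0, 25.0), "search_room"),
                 ((0.5, 5.0),  "head_to_exit"),
                 ((3.5, 6.0),  "head_to_exit")]

    def mimic_action(state, logs, k=3):
        # Majority vote over the actions taken in the k most similar states.
        by_distance = sorted(logs, key=lambda entry: math.dist(state, entry[0]))
        votes = Counter(action for _, action in by_distance[:k])
        return votes.most_common(1)[0][0]

    print(mimic_action((0.8, 4.0), game_logs))   # likely "head_to_exit"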

Video for this preliminary work can be found here. [Warning: file is large]

Making Databases more Human

by Adam Marcus (MIT CSAIL), Ph.D. Student

As Eugene Wu and I wrote in our crowd research workshop submission, it’s time to involve the (computer science) systems community in supporting human computation. We’re certainly not the only ones thinking about the topic, but I’d like to talk to you about two systems we’re building at MIT: Qurk for declarative specification of human computation workflows, and Djurk for standardizing human computation platform development.

Qurk lets you write queries in a declarative language (like SQL) that merges crowd- and silicon-powered operations. A simple query in Qurk to select images of males from a table of pictures would be “SELECT image_url FROM images WHERE gender(image_url) == ‘Male’;” In this case, gender is a user-defined function which would ask the crowd to identify the gender of the person in the image.
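
The sketch below is only our guess at the shape of such a crowd-powered predicate, not Qurk’s implementation; the ask_crowd call is a simulated stand-in for a real platform API such as Mechanical Turk’s, and all names are invented.

    import random
    from collections import Counter

    def ask_crowd(question, options, image_url):
        # Stand-in for posting one microtask and returning one worker's answer;
        # simulated here so the sketch runs without a crowdsourcing account.
        return random.choice(options)

    def gender(image_url, redundancy=3):
        # Crowd-powered predicate: majority vote over `redundancy` workers.
        answers = [ask_crowd("What is the gender of the person shown?",
                             ["Male", "Female"], image_url)
                   for _ in range(redundancy)]
        return Counter(answers).most_common(1)[0][0]

    def select_males(image_urls):
        # Silicon-side equivalent of the Qurk query quoted above.
        return [url for url in image_urls if gender(url) == "Male"]

    print(select_males(["img_001.jpg", "img_002.jpg"]))   # varies: workers are simulated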

Human computation and databases research have traditionally been separate. Why cross the streams?

  • Databases eat, speak, and breathe adaptive optimization. The parameters (money, accuracy, latency) and models are different, but databases can integrate these new models into traditional workflows.
  • Common operators, such as filters, joins, and sorts, give us common optimization goals. Databases speak a limited number of common and useful operations. Once we cast popular human computation tasks into this common language, the community can iteratively improve operator implementations.
  • Best practices can be encoded into a package of user-defined functions. Want to batch or verify HITs? Someone will likely have written a package you can use for it in Qurk.

The challenges in integrating databases and human computation are fourfold. First, we need to identify the signals (e.g., worker agreement rate) through which Qurk should adapt query execution. Second, we must learn how common building blocks (e.g., item comparisons or ratings) of larger algorithms (e.g., joins or sorts) are best implemented with the crowd. Third, we have to identify how new challenges (e.g., extremely high operation latency) change how we implement traditional query execution engines. Finally, we should identify the ideal crowd workflow specification language. Will we build workflows through traditional languages like SQL, visual workflow builders, or something completely different?
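
To make the second challenge concrete, here is a minimal sketch, again not Qurk’s implementation, of a pairwise-comparison building block plugged into an ordinary sort: the algorithm stays in silicon while each comparison would, in a real system, be a small crowd task. The judge() stub simulates a worker.

    import functools
    from collections import Counter

    def crowd_compare(a, b, redundancy=3):
        # Majority vote over several worker judgments of "which item is better?".
        # judge() simulates a worker; a real system would post one HIT per vote.
        def judge(x, y):
            return -1 if x < y else 1
        votes = Counter(judge(a, b) for _ in range(redundancy))
        return votes.most_common(1)[0][0]

    def crowd_order_by(items):
        # A crowd-powered ORDER BY: Python supplies the sort algorithm,
        # the crowd supplies the comparator.
        return sorted(items, key=functools.cmp_to_key(crowd_compare))

    print(crowd_order_by(["photo_c.jpg", "photo_a.jpg", "photo_b.jpg"]))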

We also offer a call to arms to the open source platform-building community. It pains us to see so many human computation platforms being built from scratch, each with its own set of quirks and limitations. Developer time should not be wasted re-implementing common human computation platform kernels. As Hadoop does for distributed computation and WordPress does for publishing, we would like to see a pluggable, white-label platform for human computation. This platform, which we call Djurk, would let developers innovate on questions that matter, such as incentives and interfaces, rather than building yet another job submission framework.

We’re excited to meet everyone at the workshop!

Adam Marcus and Eugene Wu are Ph.D. students at MIT collectively advised by Sam Madden, Rob Miller, and David Karger.

What is a Question? Crowdsourcing Tweet Classification

Micro-task markets have been commonly used to crowdsource tasks such as the categorization of text or the labeling of images. However, there remain various challenges to crowdsourcing such tasks, especially if the task is not easy to define or if it calls for specific skills or expertise. We crowdsourced a task which seemed simple on the surface but turned out to be challenging to design and execute.

We were interested in exploring the types and topics of questions people were asking their friends on Twitter. The first step in this exploration was to identify tweets that were questions. Since manually classifying random tweets was time-consuming and hard to scale, we crowdsourced tweet categorization to Mechanical Turk workers. Specifically, we asked workers to identify whether a given tweet was a question or not. This was non-trivial: questions on Twitter were frequently ill-formed, short (at most 140 characters), and written in a unique language full of shorthand notations.

Defining this task for workers proved more difficult than we had expected. While most humans know intuitively what a question is, it was hard to phrase the task instructions so as not to impose our own definition of ‘question’ on workers. Further, the unique characteristics of tweets made it hard for workers to select question tweets: some questions were rhetorical, some contained little context, and some were too short to understand.

Another challenge was ensuring high-quality data, for which we designed several controls. First, we required workers to have a valid Twitter handle in order to do our task; this ensured their familiarity with the language and norms of Twitter. Next, to eliminate spam responses, we included some ‘control tweets’ in the list of tweets presented to Turkers. Control tweets were obviously ‘question’ or ‘not question’, so workers who did the task sincerely would rate them correctly. We only accepted data from workers who rated all control tweets correctly.
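
A minimal sketch of the control-tweet filter described above, with an invented data layout: workers whose labels miss any control tweet are dropped along with their data.

    # Gold-standard answers for the control tweets mixed into each batch.
    control_answers = {"ctrl_1": "question", "ctrl_2": "not_question"}

    # Each worker's labels, keyed by tweet id (layout invented for the example).
    worker_labels = {
        "worker_a": {"ctrl_1": "question", "ctrl_2": "not_question", "t_9": "question"},
        "worker_b": {"ctrl_1": "not_question", "ctrl_2": "not_question", "t_9": "question"},
    }

    def passes_controls(labels):
        return all(labels.get(tweet_id) == answer
                   for tweet_id, answer in control_answers.items())

    accepted = {worker: labels for worker, labels in worker_labels.items()
                if passes_controls(labels)}
    print(list(accepted))   # only worker_a survives the control check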

We found that this method of selecting questions from Twitter was not very scalable. Only 29% of workers who completed the task provided valid (non-spam) data, and only 32% of the tweets they rated turned out to be questions. Thus, a large number of workers would have to be recruited to obtain a decent sample of question tweets to study, and recruiting workers who were Twitter users was itself hard. The controls did, however, ensure that we received high-quality data, and they underscored the need to include verifiable questions in tasks.
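
A rough back-of-the-envelope reading of those two rates shows why the approach scales poorly; the tweets-per-worker batch size below is an assumption for illustration, not a figure from the study.

    valid_worker_rate = 0.29      # workers who provided non-spam data
    question_rate = 0.32          # valid ratings that turned out to be questions
    tweets_per_worker = 20        # assumed batch size, not reported in the post

    target_questions = 1000
    tweets_needed = target_questions / question_rate           # ~3,125 rated tweets
    valid_workers_needed = tweets_needed / tweets_per_worker   # ~156 valid workers
    workers_to_recruit = valid_workers_needed / valid_worker_rate
    print(round(workers_to_recruit))                           # ~539 workers overall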

We analyzed the questions selected by Turkers by type and topic and found that a large percentage (42%) of these questions were rhetorical. We also found that the most popular topics for questions were entertainment, personal and health, and technology.

Our experience shows that there are still challenges to address in crowdsourcing simple human intelligence tasks such as classification of text. We look forward to sharing our methodology with the workshop participants and discussing ideas for dealing with such challenges.

User, Crowd, AI: The Future of Design

by Michael Bernstein (MIT CSAIL), workshop organizer

For years, the task of user interface design has boiled down to one critical question: agency. How much of the interaction relies on user input, and how much on algorithms and computation? Do we ask the user to sort through a pile of documents himself, or do we rely on an imperfect search engine? Does the user enter her own location in Foursquare, or does the phone instead try to triangulate her location using cell towers and GPS?

This question of agency leads to a design axis. At the ends we have completely user-controlled systems and completely AI-driven systems. Lots of designs sit somewhere in between.

But the future of interaction may well be social, and it may well be crowd-driven. Crowds have begun creeping into our user interfaces: Google Autosuggest uses others’ queries to accelerate yours, crisis maps like Ushahidi rely on crowdsourced contributions, and new systems like Soylent and VizWiz push the crowd directly into the end-user’s hands.

The User-AI axis no longer works. We need to introduce a third element: Crowd. Adding Crowd to the design space gives us a three-axis picture, one I call The Design Simplex, but which you are welcome to call a triangle.

There are a huge number of unexplored areas in this design space. What happens when we use crowds to quickly vet AIs that aren’t yet good enough for primetime on their own? (Pushing the points farther toward ‘AI’ while maintaining a healthy heaping of ‘Crowd’.) We could deliver technologies to users years ahead of their general availability, all the while using the crowd’s work to train better AIs. What kinds of Crowd-User hybrids can we build that are more complex or powerful than Autosuggest?
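
One concrete pattern for using crowds to vet AIs is a simple confidence threshold: accept the model’s answer when it is sure, route the item to workers when it is not, and keep the crowd’s answers as training data. A minimal sketch, with the classifier and crowd call as invented stand-ins:

    def classify(item):
        # Stand-in for an imperfect AI: returns (label, confidence).
        return ("cat", 0.62)

    def ask_crowd(item):
        # Stand-in for posting the item to human workers and aggregating answers.
        return "dog"

    training_data = []   # crowd-checked items, later used to retrain the AI

    def hybrid_label(item, threshold=0.90):
        label, confidence = classify(item)
        if confidence >= threshold:
            return label, "ai"
        crowd_label = ask_crowd(item)
        training_data.append((item, crowd_label))
        return crowd_label, "crowd"

    print(hybrid_label("photo_123.jpg"))   # ('dog', 'crowd') at this confidence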

I’m excited to explore this space with you. We are just scratching the surface of what’s possible.

Michael Bernstein is a PhD student in computer science at the Massachusetts Institute of Technology. His dissertation research is on Crowd-powered Interfaces: user interfaces that directly embed crowd contributions to grant powerful new abilities to the user. He is a co-organizer of the CHI crowdsourcing workshop.

Using Humans to Gain Insight from Data

by Lydia Chilton (University of Washington), workshop organizer

My first degree and research experiences were in empirical microeconomics. One strong impression I took away from my experience working with large numeric data sets like the US Census is that there is a limit to the degree of insight numbers alone can provide. In particular, numbers can often tell you what is true, but not why it is true.

A famous labor economics study by David Card and Alan Krueger is a case in point. Card and Krueger used a natural experiment on the New Jersey/Pennsylvania border to test the well-established theory that raising the minimum wage reduces employment. The authors collected their own data and ran meticulous statistical tests, but astoundingly found the exact opposite of what the theory predicted: after the mandated increase in the minimum wage, employment did not fall and, if anything, rose. To my knowledge, this result has never been explained. The numbers can’t tell us “why?”

I am frustrated by this lack of explanation. The numbers can’t tell us why, but behind the numbers are people – people who made decisions that led to this unexpected result. I constantly wonder: can’t we just ask them why?

The problem with asking “why” is the complexity, diversity, and nuance of the answers we get.  I believe that in order to answer “why” questions well, we need to develop new ways for humans to process the responses. Currently, we rely on numeric data sets because computers can process numbers quickly, and because statistical methods tell us how to draw conclusions from them. But today, with crowdsourcing platforms, we have the potential to use people to process human-generated data in order to gain more insight. For example,

  • We could add questions about individual job market decisions to the US Census. The ability to switch jobs is important to a healthy economy. We could ask people who would like to switch jobs, but haven’t, why they haven’t, and gain real insight about inefficiencies in the job market.
  • We could use existing free-text data such as Facebook status updates to probe questions like “Why are Hudson University students from lower income backgrounds more likely to fail freshman classes?” by detecting trends in inferred mental state and other life conditions revealed by the students.

In order to answer “why” questions effectively, we need 1) human computation algorithms that can use humans in parallel to analyze data and draw conclusions, and 2) a method for expressing our confidence in the results – an analog to the powerful statistical tests that express our confidence in numerical results.
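
As a sketch of what those two pieces might look like together, here is a toy pipeline in which a stand-in human computation step assigns each free-text answer to a reason category, and a bootstrap over the labeled answers gives a rough confidence interval on how prevalent each reason is. All names and the keyword rule are invented for the example.

    import random
    from collections import Counter

    def categorize_with_crowd(response):
        # Stand-in for a human computation step in which workers assign each
        # free-text "why" answer to a reason category (here, a keyword rule).
        return "fear_of_losing_benefits" if "insurance" in response else "other"

    def reason_shares(responses):
        labels = [categorize_with_crowd(r) for r in responses]
        return {reason: count / len(labels)
                for reason, count in Counter(labels).items()}

    def bootstrap_interval(responses, reason, trials=1000, alpha=0.05):
        # Rough (1 - alpha) confidence interval on a reason's share, by resampling.
        shares = sorted(reason_shares([random.choice(responses)
                                       for _ in responses]).get(reason, 0.0)
                        for _ in range(trials))
        return shares[int(alpha / 2 * trials)], shares[int((1 - alpha / 2) * trials) - 1]

    answers = ["I would lose my insurance", "my commute would be longer",
               "I can't risk my insurance", "I like my coworkers"]
    print(reason_shares(answers))
    print(bootstrap_interval(answers, "fear_of_losing_benefits"))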

Human experience and behavior are rich and varied, and number crunching alone can’t make sense of them. But human computation can, and we should explore that opportunity.

Lydia Chilton is a 2nd-year PhD student in computer science at the University of Washington and currently an intern at Microsoft Research in Beijing, China. She is an organizer of the CHI Crowdsourcing workshop and a co-author of TurKit and two labor economics studies of Mechanical Turk workers.