Last summer, I was reading John Markoff’s book What the Dormouse Said at a bed-and-breakfast in Maine. I came across a passage about Seymour Cray, and it occurred to me that we live in a uniquely interesting time for supercomputing.
Today’s supercomputers are quite different from Cray’s refrigerator-sized multiprocessors. They generally consist of thousands of networked commodity machines, à la Google’s data centers. A consequence of this is that more of the design of these supercomputers is being pushed to software. In the extreme, grid computing projects like Folding@Home involve no hardware design at all.
This is a common pattern in computing history. When general-purpose hardware becomes inexpensive and readily available, more can be done in software. (For example, now that we have iPhones, we buy fewer voice recorders and alarm clocks.) And since more people can write software than build hardware, any shift from hardware to software coincides with a burst of innovation.
In these times of shift, the most influential software innovations are the operating systems — MS-DOS in 1982, and the iPhone SDK and Facebook Connect today. For the cloud, the operating systems are the distributed file systems, RPC layers, and programming models such as MapReduce, Dryad, Pig, Sawzall, and BigTable. It’s this software that makes other software possible.
The force behind the rise of human computing is similar to the one behind the rise of cloud computing. Two billion people now have access to the Internet — and many of them are bored. So while computers made of people have existed since the mid-1700s, it is only in the last decade that the “hardware” — people — has become so readily available.
Consequently, we can expect to see a burst of software innovation for crowd computing. And we have. We’ve seen a range of creative applications, quality-control methods, monitoring systems, and cost-optimization algorithms. And as in cloud computing, the most influential software for human supercomputers will be the operating systems.
Humanism and Human Computation
A danger in designing human supercomputers is the tendency to take the metaphor literally. As computer scientists, we’re used to thinking about things like cost reduction, speed, availability, and fault tolerance. But when designing human supercomputers, focusing only on performance can lead to systems where workers are seen as individual, anonymous, and interchangeable, without needs other than those that can be directly measured as costs to the system.
Of course, most people don’t see themselves this way, and this conflict probably affects the quality of work that they do within such systems. But it also affects the quality of applications that can be built for these human supercomputers. A model for people that doesn’t include rich personal identities precludes the ability to use those identities for routing tasks. A model for people that doesn’t include social relationships precludes the ability to define control flows that make use of those relationships for teamwork. Most importantly, a design that treats people like machines will cause them to act like machines. This would be a sad waste of creativity, individual talent, empathy, and spontaneity.
A Hybrid Human Supercomputer
This all suggests something concrete to build: a supercomputer that consists of people and machines, where the machines are treated like machines and the people are treated like people. The supercomputer would be designed primarily in software, whose core would consist of a virtual machine for low-level resource allocation and a high-level programming language that can easily specify complex workflows.
The design would focus on the human needs of the workers who comprise the system. Not just their need for money (and at the beginning, we will not address that need at all), but also their need for independent thought, for self-determination, for learning, for community, for co-creation, for being a part of something larger than themselves.
As a natural byproduct, we hope that more powerful programming models for social computing will emerge, bottom-up models that harness a more inclusive range of things that people can do better than machines. Jabberwocky, as it stands, is the very beginning of our attempt to do this.
The Jabberwocky software stack consists of three layers. The first layer, Dormouse, is our virtual machine. It consists primarily of functions for resource allocation and routing of tasks to people and machines. Importantly, it allows developers and workers to create their own communities, and to add profile properties and social structure as they see fit. Another important component of Dormouse is a template library that allows developers to reuse task templates created and optimized by others and approved by workers.
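To make the routing idea concrete, here is a minimal sketch of Dormouse-style resource allocation in Python. All names (`Worker`, `route`, the profile keys) are illustrative assumptions, not the actual Dormouse API; the point is that developers define arbitrary profile properties and tasks are routed to the people or machines whose profiles match.

```python
# Hypothetical sketch of profile-based task routing, in the spirit of
# Dormouse. Worker profiles are open-ended key/value properties that
# developers and workers can extend as they see fit.

class Worker:
    def __init__(self, name, **profile):
        self.name = name
        self.profile = profile  # arbitrary developer-defined properties


def route(requirements, pool):
    """Return the workers whose profiles satisfy every requirement."""
    return [w for w in pool
            if all(w.profile.get(k) == v for k, v in requirements.items())]


pool = [
    Worker("ada",  speaks="en", is_machine=False),
    Worker("bao",  speaks="zh", is_machine=False),
    Worker("gpu1", speaks=None, is_machine=True),
]

# Route a translation-review task to English-speaking humans.
matches = route({"speaks": "en", "is_machine": False}, pool)
print([w.name for w in matches])  # ['ada']
```

A real virtual machine would also handle availability, load, and fairness; this sketch only shows the profile-matching step.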
The second layer of the Jabberwocky stack is ManReduce, a programming framework inspired by MapReduce (but closer to Dryad) that interfaces with Dormouse. ManReduce provides convenient programming abstractions for both human and machine computation. It automatically parses input files, routes intermediate data, produces output files, and transparently handles parallelism and serialization. After we built ManReduce, we found out that Nikki, Boris, Shusheel, and Bob at CMU had similar lines of thinking in their excellent and independent CrowdForge work, which will also be presented at UIST. While the spirit of the two frameworks is the same, there are some key differences that we discuss in the paper.
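The shape of a ManReduce-style program can be sketched as an ordinary map/reduce pipeline in which some steps are performed by people. The sketch below is a toy sequential driver under assumed names (`human_map`, `machine_reduce`, `man_reduce` are not the real ManReduce interface): in a real system the human step would post a task to workers via the layer below and wait for responses, and the framework would handle parallelism and serialization.

```python
# Toy sketch of a map/reduce pipeline with a human map step, in the
# spirit of ManReduce. The human step is stubbed with a heuristic.

def human_map(record):
    """In a real system, this would route the record to a person
    (e.g. 'Is this review positive or negative?') and block on the
    answer; here we stub the judgment with a keyword check."""
    label = "positive" if "good" in record else "negative"
    return (label, 1)


def machine_reduce(key, values):
    """An ordinary machine reduce: sum the counts for each label."""
    return (key, sum(values))


def man_reduce(records, mapper, reducer):
    """Sequential driver; the real framework would parallelize map
    tasks and route intermediate data between human and machine steps."""
    intermediate = {}
    for record in records:
        key, value = mapper(record)
        intermediate.setdefault(key, []).append(value)
    return dict(reducer(k, vs) for k, vs in intermediate.items())


counts = man_reduce(
    ["good movie", "bad plot", "good acting"],
    human_map,
    machine_reduce,
)
print(counts)  # {'positive': 2, 'negative': 1}
```

The design point is that a human step plugs into the same dataflow slot as a machine step, so workflows can freely interleave the two.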
And the highest level of the Jabberwocky stack is Dog, a high-level scripting language built on ManReduce and inspired by Pig and Sawzall. A key consideration in the design of Dog was that it should be human-readable even by non-programmers. Jabberwocky will be an open-source environment, like a web browser, where workers can see the source code of the Dog programs in which they will take part. The hope is that being able to read the source code will both give workers a sense of context for the tasks that they are doing, and help them to make informed decisions about whether they want to participate. Another key consideration is that Dog should be simple enough that the workers who comprise the system can write Dog scripts themselves. In the case of human computation, we prefer to have computers that can program themselves.
Jabberwocky is joint work with Salman Ahmad, Alexis Battle, and Zahan Malkani. Please check out the full paper here. We plan to make the framework available starting in early 2012. If you are interested in using or contributing to Jabberwocky, please drop us a note.