Do you trust your crowd?
Crowd-powered systems are being used for increasingly complex tasks, but little is known about the potential risks posed by the workers themselves.
Types of Threats
Threats come in all shapes and sizes. A single worker can effectively extract information from a single task, but may have a hard time targeting vulnerable tasks in systems that only periodically include sensitive information. Individual workers are also usually ineffective at disrupting the final output of systems that combine input from multiple workers. Groups, however, can attack even these systems, and can more successfully extract sparse pieces of sensitive information or reconstruct content that was divided up to help preserve privacy.
Raising an Army
But can attackers realistically be expected to gather large groups of people to attack these systems? Could they use the crowd itself to boost their numbers? Would crowd workers help a requester do something like steal a user’s credit card for an extra $0.05?
To find out, we ran two sets of experiments using workers from Mechanical Turk. For both, we pretended to be two different requesters: one [potentially] malicious requester (who posted an “Attack” task), and one requester being attacked (who posted a “Target” task). Workers started at the Attack task, were shown a set of instructions, and were then asked to continue on to the Target task.
One way the crowd can attack a system is by collecting private information from a task. This is of particular concern because systems that leverage the crowd, such as assistive technologies that caption a user’s conversation [2] or answer visual questions [1,3], can expose personal information (e.g., a credit card number accidentally captured in an image). To simulate this, our Target task asked workers to copy all of the text out of an image they were shown (Fig. 1).
As a baseline, the Attack task asked workers to complete the Target task without returning any information to us. We then showed workers an image of a drawn credit card that clearly contained no real personal information (the “Innocent” condition), and contrasted the response rates we saw with those for a Target task containing an image of a realistic-looking credit card (the “Malicious” condition). Despite both images containing the same amount of information, we saw a significant drop in response rate when the task looked potentially harmful (Fig. 2).
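Comparing response rates between two conditions like these is commonly done with a two-proportion z-test. The sketch below uses purely hypothetical counts (the actual sample sizes and statistical tests are in the full paper):

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two response rates.

    x1/n1: completions and total workers in condition 1,
    x2/n2: the same for condition 2.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: 85/100 workers completed the Innocent task,
# 60/100 completed the Malicious one (illustrative numbers only).
z, p = two_proportion_z_test(85, 100, 60, 100)
```

With these made-up counts the drop is large enough that the test rejects the null hypothesis of equal response rates.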
Another way workers can attack a system is by manipulating the answer that is returned to the user. We again recruited workers to attack a system, but this time the Attack task provided workers with an answer to submit to the Target task (Fig. 3). Our Target task asked workers to transcribe handwritten text shown in an image.
As a baseline, we asked workers to complete the Target task with no special instructions. We then asked workers to provide a specific but plausible answer for the image (the “Innocent” condition), and compared the answers we received with those we got when workers were asked to give a clearly wrong answer (the “Malicious” condition). We again saw a significant drop in the number of workers willing to complete the Attack task as instructed (Fig. 4).
Now the question is: how do we avoid these attacks? Future approaches can leverage the fact that hired groups of workers appear to contain some people who recognize when a task involves potentially harmful information, and use them to protect against the workers who do not notice the risk or who will complete the task regardless – an alarming ~30% of workers.
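One way to operationalize this idea – a minimal sketch, not the paper’s mechanism – is to route each task to several workers, allow any of them to flag it as suspicious, and escalate the task for review when the flag rate crosses a threshold. The threshold and function below are hypothetical:

```python
# Hypothetical cutoff: escalate when more than 30% of workers flag a task,
# chosen to stay below the ~30% of workers who complete tasks regardless.
FLAG_THRESHOLD = 0.3

def review_task(worker_flags):
    """Decide whether a task needs review.

    worker_flags: list of booleans, True if that worker flagged
    the task as potentially harmful.
    Returns "escalate" when enough workers raised concerns, else "accept".
    """
    if not worker_flags:
        return "accept"
    flag_rate = sum(worker_flags) / len(worker_flags)
    return "escalate" if flag_rate > FLAG_THRESHOLD else "accept"

# Two of five workers flagged the task (40% > 30%), so it is escalated.
decision = review_task([True, True, False, False, False])
```

The design choice here is to trust a minority of vigilant workers: a single flag among many workers is not enough, but a consistent pattern of flags pulls the task out of the normal pipeline.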
References
[1] J.P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R.C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, T. Yeh. VizWiz: Nearly Real-time Answers to Visual Questions. UIST 2010.
[2] W.S. Lasecki, C.D. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. Real-time Captioning by Groups of Non-Experts. UIST 2012.
[3] W.S. Lasecki, P. Thiha, Y. Zhong, E. Brady, J.P. Bigham. Answering Visual Questions with Conversational Crowd Assistants. ASSETS 2013.
Full paper: Information Extraction and Manipulation Threats in Crowd-Powered Systems.
Walter S. Lasecki, University of Rochester / Carnegie Mellon University
Jaime Teevan, Microsoft Research
Ece Kamar, Microsoft Research