Threats to the validity of results obtained through laboratory human-subjects experiments are well known: tasks may be artificial compared to their real-world counterparts, for example, or subjects may behave differently based on their perceptions of the researcher’s expectations. Yet experiments conducted using crowdsourced labor pools present additional challenges unique to these environments, even in comparison to other online experiments.
As an example, my own MTurk studies of graph perception and comprehension have indicated that quick, heuristic-based thinking styles can predominate in workers’ performance on HITs, despite the use of best practices like qualification tasks and questions with verifiable answers. Recently, I observed that identical graph inference HITs received significantly different levels of worker attention, as measured by time-to-completion and error rates, depending on whether they were launched as single HITs or as a sequence of multiple HITs. A series of very similar HITs decreases the cognitive load associated with task switching, which may be why such sequences are in high demand on MTurk. Yet the sequenced HITs were completed more quickly and had much higher error rates.
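The kind of comparison described above, contrasting completion times and error rates between single and sequenced HITs, can be sketched in a few lines. This is purely a hypothetical illustration, not the actual study code: the per-worker records are invented, and the choice of a two-proportion z-test for comparing error rates is my assumption.

```python
import math

# Hypothetical per-worker records: (completion_time_seconds, made_error)
single_hits = [(95, False), (110, True), (88, False), (102, False), (97, False)]
sequenced_hits = [(41, True), (38, True), (55, False), (47, True), (50, False)]

def summarize(records):
    """Return (mean completion time, error rate) for one condition."""
    times = [t for t, _ in records]
    n_errors = sum(1 for _, err in records if err)
    return sum(times) / len(records), n_errors / len(records)

def two_proportion_z(e1, n1, e2, n2):
    """z-statistic comparing two error proportions (pooled standard error)."""
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

mean_t_single, err_single = summarize(single_hits)
mean_t_seq, err_seq = summarize(sequenced_hits)
z = two_proportion_z(
    sum(1 for _, e in single_hits if e), len(single_hits),
    sum(1 for _, e in sequenced_hits if e), len(sequenced_hits),
)
```

With the toy numbers above, the sequenced condition shows a lower mean completion time and a higher error rate, mirroring the pattern reported in the post; a real analysis would of course use the actual worker logs and an appropriately powered sample.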
What does it mean for crowdsourced experimentation if an increased cognitive load is actually desirable for the worker’s understanding of the task, and consequently for the quality of responses? Dual-process accounts of cognition suggest that people can process incoming information intuitively and automatically, with little felt effort, or can apply more deliberative, systematic, and analytical thinking to make a decision. In many cases, a central challenge for crowdsourced experimentation is how to activate the sort of systematic thinking that will lead to better responses, and possibly more skilled workers in the long run, while still creating HITs that workers want to do! (See Bjoern Hartmann’s post for more on the latter point.)
In my position paper, I propose that designing MTurk HITs to specifically induce active, systematic cognitive processing of presented task information, such as images or text, rather than passive response states, can improve the quality and validity of experimental results.
One such technique is integrating more difficult-to-parse stimuli into key portions of a task. Harder-to-read fonts, for example, have been shown to increase recall and comprehension of textual information [1], as has using more cognitively “costly” legends rather than direct labels on graphs [2].
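As a concrete sketch of how such a manipulation might be applied to a HIT, the snippet below renders question HTML with the key stimulus text set in a harder-to-read font while the instructions stay in the browser default. The font choice and the HTML template are illustrative assumptions on my part, not taken from the cited studies or from any MTurk API.

```python
from html import escape

def render_hit_question(instructions, stimulus, font_family="Monotype Corsiva"):
    """Return HIT question HTML with the stimulus in a disfluent font.

    font_family is a hypothetical example of a harder-to-read typeface;
    the specific fonts used in the cited studies may differ.
    """
    return (
        "<div>"
        f"<p>{escape(instructions)}</p>"
        f'<p style="font-family: {font_family}; font-style: italic;">'
        f"{escape(stimulus)}</p>"
        "</div>"
    )

html = render_hit_question(
    "Read the passage below, then answer the questions.",
    "The graph shows the trend reversing after the third quarter.",
)
```

The point of keeping the disfluent styling confined to the stimulus is that only the information the worker must process systematically is made effortful; instructions and answer fields remain fluent so the added difficulty targets comprehension, not navigation.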
Motivation, or self-directed psychological activity, can both add to and balance the use of “desirable difficulties” techniques like those above. Active, engaged processing of information can be induced by increasing a reasoner’s motivation to attend to the content. What if researchers devoted more attention to creating aesthetically pleasing or personalized HITs? Doing so could balance increasing a worker’s motivation and enjoyment of the task with introducing cognitive difficulties that decrease the likelihood of erroneous automatic reasoning.
I for one look forward to the further exchange of ideas between psychological studies of cognition and crowdsourcing experimentation!
1. Alter, A.L., Oppenheimer, D.M., Epley, N., and Eyre, R.N. Overcoming intuition: Metacognitive difficulty activates analytic reasoning. Journal of Experimental Psychology: General 136, 4 (2007), 569–576.
2. Shah, P., Miyake, A., and Freedman, E. Are Labels Really Better Than Legends? The Effects of Display Characteristics and Topic Familiarity on the Comprehension of Multivariate Line Graphs. (2011, in preparation).
Jessica Hullman is a second-year PhD student at the University of Michigan School of Information. Her research focuses on the design and interpretation of information visualizations, in particular in online social visualization environments where non-experts engage in collaborative visual analytics.