Whoever figures out how to solve the Turing test for social media content moderation stands to make a lot of money. Thus far, however, algorithmic solutions based on machine learning and artificial intelligence are not sufficient to automatically filter all of the content that users post on the internet. As a result, social media companies have resorted to hiring human reviewers or outsourcing these tasks to online labor markets such as crowdsourcing platforms or to third-party companies.
Platforms worry about hate speech, graphic images, and other content that violates their community standards, both to avoid upsetting users and to provide a safe environment for advertisers. Recently, however, there has also been growing interest in the (unseen) labor that content moderators perform in order to uphold these standards.
Recent conferences and workshops at UCLA, Santa Clara University, the University of Southern California, and the Alexander von Humboldt Institute for Internet and Society in Berlin have brought together researchers, industry practitioners, and moderators themselves to discuss the ethical and professional ramifications of a job that is still taking shape. An increasing number of dissertations, news articles, books, and documentary films have also begun to focus on this issue.
In a case filed in the King County Superior Court in Seattle, Washington, the plaintiffs Soto and Blauert are suing a major corporation, alleging that they developed post-traumatic stress disorder (PTSD) from exposure to materials, such as child pornography, that they encountered as part of their work as content moderators. Current research often invokes the precarious labor conditions of content moderators but rarely provides empirical underpinnings. Ethnographic research is one attempt to get at the core of moderators' labor conditions; experimental research, such as our study, is another.
Workers in commercial content moderation settings include direct employees of social media companies, crowd workers from online labor markets such as Amazon Mechanical Turk and Figure Eight (formerly known as CrowdFlower), and contractors from specialized third-party companies. In our study, we investigate how the burden on these workers can be alleviated while preserving the precision of human judgment.
The question that guides our research in this paper is: How can we reveal the minimum amount of information to a human reviewer such that an objectionable image can still be correctly identified?
We do this by developing a special moderation interface for crowdsourced image moderation on Amazon Mechanical Turk. We collect a set of images, both “safe for work” and “not safe for work” (i.e., graphic content), via Google Images and task crowd workers with moderating these images at varying degrees of blur. Blurring eliminates low-level pixel details while keeping images sufficiently recognizable to be moderated accurately. In case an image is too heavily obfuscated, we additionally provide tools for workers to partially reveal blurred regions, helping them complete their task while still shielding them from the majority of the image contents. Beyond merely reducing exposure, putting finer-grained tools in the hands of the workers gives them a higher degree of control: they can decide how much they see, when they see it, and for how long. In conducting this study, we aim to (1) gauge at what degrees of obfuscation moderators can still sufficiently discern content, and (2) identify whether obfuscation can protect the emotional well-being of moderators.
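To give a concrete sense of the kind of obfuscation involved, here is a minimal sketch of Gaussian blurring at several radii together with a worker-controlled “reveal” of one region, written in Python with Pillow. This is an illustration only, not the actual study interface (which runs in the browser on Mechanical Turk); the file names, blur radii, and helper functions are hypothetical.

```python
# Minimal sketch (not the study's actual interface code): pre-compute blurred
# variants of an image at several radii, plus a helper that "reveals" a
# rectangular region by pasting the unblurred pixels back onto the blurred copy.
from PIL import Image, ImageFilter

BLUR_RADII = [0, 4, 8, 16]  # illustrative blur levels, in pixels

def blurred_variants(path):
    """Return one blurred copy of the image per blur radius."""
    original = Image.open(path).convert("RGB")
    return {r: original.filter(ImageFilter.GaussianBlur(radius=r)) for r in BLUR_RADII}

def reveal_region(original, blurred, box):
    """Paste the unblurred pixels inside `box` (left, upper, right, lower)
    onto the blurred image, simulating a worker-controlled partial reveal."""
    revealed = blurred.copy()
    revealed.paste(original.crop(box), box[:2])
    return revealed

if __name__ == "__main__":
    original = Image.open("example.jpg").convert("RGB")
    variants = blurred_variants("example.jpg")
    # Reveal a 100x100 patch in the most heavily blurred variant.
    preview = reveal_region(original, variants[16], (50, 50, 150, 150))
    preview.save("example_revealed.png")
```

The point of the sketch is simply that the blur level and the size and timing of any reveal are parameters the worker (or the interface) can control, rather than the image being shown in full by default.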
We have carefully considered this issue and recognize that our use of graphic images is a controversial decision. However, we cannot meaningfully assess the effects of obfuscation in this domain without using pictures that may be considered unsafe for work. Unlike in actual content moderation settings, where moderators continuously sift through content, participants in this study are asked to moderate 10 images in a controlled setting. In most cases, these images will be obfuscated, but in some cases participants will still be able to see images depicting pornography or violence. To provide additional safeguards in case of emotional trauma or disturbance, we provide a list of national mental health resources at the end of the experiment. Furthermore, the consent form carries descriptive and explicit warnings regarding the disturbing nature of some of the images and discourages people who might not want to be exposed to such content from taking part in the study. The design of this study was heavily influenced by valuable feedback from the Institutional Review Board and the Office of the Vice President for Legal Affairs at the University of Texas at Austin, for which we are grateful.
The criteria for human judgment that we provide to content moderators are based on leaked moderation rules used by Facebook on the crowdsourcing platform oDesk (now known as Upwork). First, participants will be asked to moderate a set of images that we obfuscate at different degrees of blur (each participant is assigned to one particular level of blur, so different participants see more or less heavily blurred images). Subsequently, participants will be asked to respond to a Qualtrics survey covering demographics, positive and negative experiences and feelings, positive and negative affect, emotional exhaustion, as well as perceived ease of use and usefulness of the moderation interface.
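One way to read this design is as a between-subjects assignment of workers to blur levels. The snippet below is a hypothetical sketch of such an assignment; the condition names and hashing scheme are our own illustration, not the study's actual implementation.

```python
# Illustrative only: deterministically assign each crowd worker to a single
# blur condition, so a worker who returns to the task sees the same condition.
import hashlib

BLUR_CONDITIONS = ["none", "low", "medium", "high"]  # hypothetical condition names

def assign_blur_condition(worker_id: str) -> str:
    """Hash the worker ID into one of the blur conditions."""
    digest = hashlib.sha256(worker_id.encode("utf-8")).hexdigest()
    return BLUR_CONDITIONS[int(digest, 16) % len(BLUR_CONDITIONS)]

# Example: assign_blur_condition("A1B2C3EXAMPLE") returns one of the four conditions.
```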
As this project is a Work-In-Progress, we cannot report data at this point, but we have recently obtained final approval from the Institutional Review Board at the University of Texas at Austin, which allows us to proceed with the experiment. With this paper, we hope to establish a case for more empirical work that puts the labor conditions of moderators first, and we invite other researchers to also explore methods for improving the working conditions of content moderators. However, we believe that this is not a task that should be left to scientists at research institutions alone. First and foremost, this is a responsibility that social media platforms need to be held to, and we hope that our research also supplements future arguments for holding social media companies more accountable for the menial forms of labor they create.
For more details, please read our Work-In-Progress paper/extended abstract, But Who Protects the Moderators? The Case of Crowdsourced Image Moderation, which will be presented at both the 6th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2018) and the 6th ACM Collective Intelligence Conference (CI 2018).
Brandon Dang, University of Texas at Austin
Martin J. Riedl, University of Texas at Austin & Alexander von Humboldt Institute for Internet and Society
Matthew Lease, University of Texas at Austin
The first two authors contributed equally.