It’s supposed to be cheap, fast, good AND a hobby for “Turkers”. A dream come true, especially for people developing high-cost language resources in our field of research, Natural Language Processing.
But what if the gold mine was more of a coal mine?
At a time when it’s getting more and more difficult to get funding to create language resources (lexicons, translations, annotations, etc.), some pressured researchers might want to believe the dream-come-true story and use MTurk without knowing exactly what goes on behind the scenes. Our goal is therefore to give some insight into the reality of the system, in particular the people working on it, and the issues behind it.
MTurk is an online crowdsourcing and microworking system that lets a huge number of people (typically called “Turkers”) perform elementary tasks (Human Intelligence Tasks, or HITs).
Who are the Turkers and why do they use MTurk?
Although 500k people are registered as Turkers in the MTurk system, the real number of working Turkers is between 15,059 and 42,912. 80% of the HITs are performed by the 20% most active Turkers, who spend more than 15 hours per week in the MTurk system.
The observed mean hourly wage for performing jobs in the MTurk system is below US$2. However, money is an important motivation for a majority of the Turkers (20% use MTurk as their primary source of income, and 50% as a secondary source of income), while leisure is important for only a minority (30%).
So much for the hobby.
What about ethics?
Besides the very low wages, Turkers have no guarantee of payment for work properly performed, no governmental or private benefits whatsoever, and no recourse to any channel for redress of employer wrongdoing (apart from trying to avoid bad employers by using a tool like Turkopticon or checking the Turker Nation Hall of Shame).
Also, keep in mind that most tasks performed on MTurk used to be done by conventional employees, including those of agencies like LDC or ELRA, with traditional contracts, real wages and protection against employer wrongdoing under federal labour laws.
Some researchers may not care, but we think (hope) the majority does.
The future of language resources development?
MTurk costs may soon become the standard costs, and it may become very difficult to obtain funding for a project proposing fair wages.
Therefore, our community’s use of MTurk not only supports a workplace model that is unfair and open to abuses of a variety of sorts, but also creates a de facto standard for the development of linguistic resources that may have long-term funding consequences.
What can we do?
In France, the learned society ATALA and the professional association APIL are working on a Charter for ethically produced resources, so that researchers and funding agencies will be able to easily identify such resources and promote them. We think that’s a good start.
“People do have choices, but some have more choices than others.” (Turkopticon blog)
We have the choice to use more ethical systems, so let’s do it:
let’s play GWAP (Games With A Purpose), let’s use ethical platforms, let’s volunteer, let’s Max Havelaar the creation of language resources!
For more, in particular on the quality that you can expect from this system, see our full paper, “Amazon Mechanical Turk (MTurk): Gold Mine or Coal Mine?”.