IS THE UBIQUITOUS FIVE-STAR RATING SYSTEM IMPROVABLE?
We compare three popular techniques for rating content: five-star rating, pairwise comparison, and magnitude estimation.
We collected 39 000 ratings on a popular crowdsourcing platform, which allows us to release a dataset that should be useful for related studies on user rating techniques.
The dataset is publicly available.
We ask each worker to rate 10 popular paintings using 3 rating methods:
- Magnitude: Assigning any positive number (zero excluded).
- Star: Choosing from 1 to 5 stars.
- Pairwise: Choosing the preferred image in each pair, with no ties allowed.
We run 6 different experiments (one for each ordering of the three methods) with 100 participants each. This lets us analyze the bias introduced by the order in which the rating methods are used, and obtain results free of order bias by aggregating the data across all experiments.
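As a consistency check on this design, the sketch below enumerates the six method orderings, verifies that pooling them balances position effects, and recomputes the total number of ratings. It assumes (our inference, not stated explicitly above) that the pairwise method covers all C(10, 2) = 45 pairs of paintings, so each worker gives 10 + 10 + 45 = 65 ratings; under that assumption, 600 workers yield exactly the 39 000 ratings reported.

```python
from itertools import permutations
from math import comb

METHODS = ("magnitude", "star", "pairwise")
N_PAINTINGS = 10
WORKERS_PER_EXPERIMENT = 100

# One experiment per ordering of the three methods: 3! = 6.
orderings = list(permutations(METHODS))

# Each method occupies each position (first, second, last) in exactly two of
# the six orderings, so pooling all experiments balances out order effects.
for position in range(len(METHODS)):
    counts = {m: sum(o[position] == m for o in orderings) for m in METHODS}
    assert all(c == 2 for c in counts.values())

# Ratings per worker, ASSUMING every pair of paintings is compared once.
ratings_per_method = {
    "magnitude": N_PAINTINGS,          # one positive number per painting
    "star": N_PAINTINGS,               # one 1-5 star score per painting
    "pairwise": comb(N_PAINTINGS, 2),  # 45 comparisons (assumed: all pairs)
}
ratings_per_worker = sum(ratings_per_method.values())
total_ratings = len(orderings) * WORKERS_PER_EXPERIMENT * ratings_per_worker

print(len(orderings), ratings_per_worker, total_ratings)  # 6 65 39000
```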
At the end of the task, we dynamically build the three rankings of the paintings induced by the participant's choices and ask which of the three best reflects their preference. The comparison is blind: there is no indication of how each ranking was obtained, and the rankings are shown in random order.
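One way to build and present the three rankings is sketched below. It is an illustration under stated assumptions, not the study's actual code: star and magnitude scores are sorted in descending order with ties broken by painting name (an arbitrary choice), and the pairwise ranking orders paintings by the number of comparisons they won, a simple win-count rule we assume here. The painting names and responses are hypothetical.

```python
import random
from collections import Counter


def ranking_from_scores(scores):
    """Rank paintings by descending score (stars or magnitudes).

    Ties, which are common with only five star levels, are broken by
    painting name: an arbitrary choice made for this sketch.
    """
    return sorted(scores, key=lambda p: (-scores[p], p))


def ranking_from_pairwise(paintings, winners):
    """Rank paintings by number of comparisons won, ties broken by name.

    `winners` holds the preferred painting of each comparison. This
    win-count rule is an assumed aggregation, not necessarily the paper's.
    """
    wins = Counter(winners)
    return sorted(paintings, key=lambda p: (-wins[p], p))


def blind_options(rankings, rng):
    """Shuffle (method, ranking) pairs. Only the rankings are shown to the
    worker; the hidden method label records which one they picked."""
    options = list(rankings.items())
    rng.shuffle(options)
    return options


# Hypothetical responses from one worker for four paintings A-D.
paintings = ["A", "B", "C", "D"]
stars = {"A": 5, "B": 3, "C": 5, "D": 1}
magnitudes = {"A": 120.0, "B": 90.0, "C": 80.0, "D": 0.5}
# Winners of the pairs AB, AC, AD, BC, BD, CD, in that order.
winners = ["A", "A", "A", "C", "B", "C"]

rankings = {
    "star": ranking_from_scores(stars),                     # A, C, B, D
    "magnitude": ranking_from_scores(magnitudes),           # A, B, C, D
    "pairwise": ranking_from_pairwise(paintings, winners),  # A, C, B, D
}
for i, (_method, ranking) in enumerate(blind_options(rankings, random.Random(0)), 1):
    print(f"Ranking {i}: {' > '.join(ranking)}")
```

Keeping the method label next to each shuffled ranking, but never displaying it, is what makes the choice blind while still letting the experimenter record which technique the worker preferred.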
WHAT’S THE PREFERRED TECHNIQUE?
Participants clearly prefer the ranking obtained from their pairwise comparisons. We also observe a memory bias: the ranking produced by the last technique used is more likely to be chosen as the most accurate description of the participant's real preferences. Despite this, the pairwise comparison technique received the most preferences in all cases.
Although pairwise comparison clearly takes more time than the other techniques, its duration could become comparable to theirs with a dynamic test system that chooses which pairs to present, so that the number of comparisons grows on the order of N log N for N paintings.
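The paper's dynamic test system is not described here, so the sketch below swaps in one plausible option: a merge sort whose comparison step is a question to the worker. The `prefers` callback and the simulated worker with hidden scores are hypothetical stand-ins; the point is only to show how a comparison sort needs on the order of N log N questions instead of every pair.

```python
import random
from math import comb


def merge_sort_by_preference(items, prefers):
    """Order items from most to least preferred using merge sort.

    `prefers(a, b)` stands in for asking the worker "do you prefer a to b?".
    Merge sort asks O(N log N) such questions instead of all N(N-1)/2 pairs.
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_by_preference(items[:mid], prefers)
    right = merge_sort_by_preference(items[mid:], prefers)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if prefers(left[i], right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def simulate(n, seed=0):
    """Simulate a worker with hidden, consistent preferences over n paintings."""
    rng = random.Random(seed)
    hidden = {f"painting_{k}": rng.random() for k in range(n)}
    questions = 0

    def prefers(a, b):
        nonlocal questions
        questions += 1  # each call is one question shown to the worker
        return hidden[a] > hidden[b]

    ranking = merge_sort_by_preference(list(hidden), prefers)
    return ranking, hidden, questions


ranking, hidden, questions = simulate(10)
print(f"{questions} adaptive questions vs {comb(10, 2)} exhaustive pairs")
```

For 10 paintings this top-down merge sort asks at most 25 questions, compared with 45 for exhaustive comparison, and the gap widens as N grows. Real workers may answer intransitively; a sort-based scheme still returns some ordering, so a production system might need consistency checks.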
WHAT DID WE LEARN?
- Star rating is confirmed to be the most familiar way for users to rate content.
- Magnitude estimation is unintuitive and offers no added benefit.
- Pairwise comparison, while requiring a larger number of low-effort ratings, best reflects intrinsic user preferences and seems to be a promising alternative.
For more details, see our full paper, “Pairwise, Magnitude, or Stars: What’s the Best Way for Crowds to Rate?”