Spare5’s Tips for Sourcing Better Training Data

Mere minutes after our awesome advisor, Dan Weld, mentioned the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), we were all in. It’s rare to scroll through an event program and realize that every session will be relevant and useful to your work, but that’s exactly how we felt about this event’s agenda. And it did not disappoint!

We returned to Seattle from Austin newly energized to help folks earn spare change in their spare time in a fun, engaging way, while providing practitioners with custom, high-quality, accurate training data for machine learning and AI.

Our decision to sponsor HCOMP required very little human computation, and we were thrilled to give a keynote talk on our tips for sourcing better training data. We’ve created an online version of our presentation deck for your reference; we hope it’s helpful.

As a brief review, we recommend:

  • great UI & UX for annotators
  • interactive workflow design on mobile & web
  • known, trained, qualified annotators
  • real-time QA & annotator management
  • algorithmic task distribution & quality scoring

Details in the deck.

If you’d like to learn more about these ideas or have something to add, please give us a shout. We’re also particularly interested in the topic of bias in training data, so if this is a concern of yours as well, get in touch and let’s study it together (we’ll bring the data!).

Finally, as we noted in our talk, we’re hiring! We’re growing our data science team and looking for computer vision experts specifically. Check out our openings if you’re looking for your next great opportunity.

A big thanks to everyone at HCOMP. We had a great time and look forward to continuing the many discussions we started there.

Until next year!

— Spare5

About the author

The Spare5 Team

Cassie Sanchez is Content Marketing Manager for Spare5.
