By Eric Holloway and Robert J. Marks II
Since the 1970s, we’ve known humans can find approximate solutions to NP-Complete (really hard) problems more efficiently than the best algorithms (Krolak 1971). The best algorithms scale quadratically, while humans scale linearly (Dry 2006). We also know many of the most widely used and successful machine learning algorithms are NP-Hard to train optimally (Diettrich 2000). This suggests a human/machine hybrid can produce better models than machine learning alone.
However, there are problems with human interaction. The hardest problem is visualization. It is hard visualizing data with more than 3 dimensions. So, we perform dimension reduction by projecting data onto two dimensions. We then collect many weak models (humans draw boxes) from multiple projections to build a strong model; known as boosting in machine learning.
Combining crowdsourcing and boosting, we use Amazon’s Mechanical Turk to collect the models. The data is transformed by the models into a feature space. Then, we use linear regression to classify new data.
We test the human/machine hybrid on one artificial dataset and four real world datasets, all with ten or more dimensions. This hybrid is competitive with machine only linear regression on the untransformed data.
You can read more about our work in the HCOMP 2016 paper High Dimensional Human Guided Machine Learning.
A demo is available for a limited time.
Krolak, P., Felts, W., & Marble, G. (1971). A man-machine approach toward solving the traveling salesman problem. Communications of the ACM, 14(5), 327-334.
Dry, M., Lee, M. D., Vickers, D., & Hughes, P. (2006). Human performance on visually presented traveling salesperson problems with varying numbers of nodes. The Journal of Problem Solving, 1(1), 4.
Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1-15). Springer Berlin Heidelberg.