Machine Learning for Dummies

By Eric Holloway and Robert J. Marks II

Dummies Commentary
Humans are slow and sloppy. Why would we want human-guided machine learning?

Since the 1970s, we’ve known that humans can find approximate solutions to NP-complete (really hard) problems, such as the traveling salesman problem, more efficiently than the best algorithms (Krolak 1971). Solution time for the best algorithms grows quadratically with problem size, while human solution time grows linearly (Dry 2006). We also know that many of the most widely used and successful machine learning algorithms are NP-hard to train optimally (Dietterich 2000). This suggests that a human/machine hybrid can produce better models than machine learning alone.

Solution times
Polynomial regression of human solution times against problem sizes (Dry 2006).

However, human interaction brings its own problems. The hardest is visualization: data with more than three dimensions is hard to visualize. So we reduce the dimension by projecting the data onto two dimensions. We then collect many weak models (boxes drawn by humans) from multiple projections and combine them into a strong model, a technique known as boosting in machine learning.

User Interface
User interface for Amazon Mechanical Turk HIT.
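
To make the idea of a box as a weak model concrete, here is a minimal sketch. It assumes each human-drawn box is recorded as the two plotted dimensions plus the box bounds; that representation, the function names, and the toy data are all hypothetical and are not taken from the paper. It covers only the projection and box-membership step, not how the weak models are weighted or combined.

```python
import numpy as np

# Hypothetical record of one human-drawn box: the two data dimensions
# that were plotted, followed by the box bounds on each axis.
def box_feature(X, dim_x, dim_y, x_min, x_max, y_min, y_max):
    """Return 1.0 for rows of X whose 2-D projection lies inside the box, else 0.0."""
    px, py = X[:, dim_x], X[:, dim_y]  # project onto the two chosen dimensions
    inside = (px >= x_min) & (px <= x_max) & (py >= y_min) & (py <= y_max)
    return inside.astype(float)

def to_feature_space(X, boxes):
    """Stack one box-membership column per box (one weak model each)."""
    return np.column_stack([box_feature(X, *b) for b in boxes])

# Toy 10-dimensional points (illustrative only, not the paper's data).
X = np.zeros((3, 10))
X[0, [2, 7]] = [0.5, 0.5]
X[1, [2, 7]] = [2.0, 0.5]
X[2, [0, 1]] = [0.1, 0.9]

boxes = [
    (2, 7, 0.25, 1.0, 0.25, 1.0),  # box drawn on the (dim 2, dim 7) projection
    (0, 1, 0.0, 0.5, 0.5, 1.0),    # box drawn on the (dim 0, dim 1) projection
]
features = to_feature_space(X, boxes)  # shape: (3 points, 2 boxes)
```

Each column of `features` answers one yes/no question about a point, which is all a single weak model needs to do. Boxes from different projections can look at different pairs of dimensions, so together they can cover structure that no single 2-D view shows.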

Combining crowdsourcing and boosting, we use Amazon’s Mechanical Turk to collect the models. The data is transformed by the models into a feature space. Then, we use linear regression to classify new data.

Linear Regression
Linear regression classification of human produced features.
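
The classification step can be sketched as an ordinary least-squares fit on the box features. This is an illustration under assumptions, not the paper's implementation: the 0/1 label coding, the 0.5 decision threshold, the intercept column, and the toy feature matrix are all choices made here for the example.

```python
import numpy as np

def fit_linear(F, y):
    """Least-squares fit of y ~ [F, 1] @ w; returns the weight vector w."""
    A = np.column_stack([F, np.ones(len(F))])  # append an intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(F, w, threshold=0.5):
    """Classify by thresholding the regression output (threshold is an assumption)."""
    A = np.column_stack([F, np.ones(len(F))])
    return (A @ w >= threshold).astype(int)

# Hypothetical box-membership features: six training points, two boxes.
F_train = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0]], dtype=float)
y_train = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # assumed 0/1 class labels

w = fit_linear(F_train, y_train)

# New points, already transformed into the same feature space.
F_new = np.array([[1, 0], [0, 0]], dtype=float)
labels = predict(F_new, w)
```

In this toy data the label depends only on the first box, so the fit puts its weight on that column. With real crowdsourced boxes the weights would indicate how much each human-drawn box contributes to the final decision.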

We test the human/machine hybrid on one artificial dataset and four real-world datasets, all with ten or more dimensions. The hybrid is competitive with machine-only linear regression on the untransformed data.

Results
Results of linear regression classification using the machine alone and using the human/machine hybrid.

You can read more about our work in the HCOMP 2016 paper “High Dimensional Human Guided Machine Learning.”

A demo is available for a limited time.

References

Krolak, P., Felts, W., & Marble, G. (1971). A man-machine approach toward solving the traveling salesman problem. Communications of the ACM, 14(5), 327-334.

Dry, M., Lee, M. D., Vickers, D., & Hughes, P. (2006). Human performance on visually presented traveling salesperson problems with varying numbers of nodes. The Journal of Problem Solving, 1(1), 4.

Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1-15). Springer Berlin Heidelberg.

 
