Anonymous Crowd vs. Managed Team: A Study on Quality Data Processing at Scale

Abstract: Whether you’re training machine learning algorithms or using traditional analysis techniques, the quality of your data determines performance. Data-science tech developer Hivemind enlisted CloudFactory’s managed workforce and a leading crowdsourced workforce to complete a series of tasks, ranging from basic to more complex, to determine which team delivered the higher-quality structured datasets. In this ODSC session, we’ll dig into the study results, share lessons learned, and offer key insights that will help you strategically deploy people and technology to improve the quality of your datasets while freeing up your highest-value team members to focus on innovation.
Join this session to hear from data and workforce experts and learn:
1) The difference in accuracy you might expect from using an anonymous crowdsourced team versus a managed team of data workers
2) The behavioral impact of paying workers by the hour rather than by the task, and how it can affect quality
3) Factors that can help you strategically deploy an anonymous crowdsourced team or a managed team

Bio: Mark Roulston is Co-Founder and Senior Data Scientist at Hivemind, a software company that helps organizations build, clean, and enrich datasets from messy or unstructured information. Mark holds a bachelor's degree in natural sciences and physics from the University of Cambridge. He completed his Ph.D. in planetary science at Caltech.