How Competing in Data Mining Competitions Makes You a Great Data Scientist!

Abstract: In recent years, data mining competitions have attracted considerable interest and attention. This may be partly due to the seemingly lucrative prize money ($1 million for the Netflix Prize and $3 million for the Heritage Health Prize) along with a good amount of stardom. In addition, companies increasingly use Kaggle as a hunting ground for talent, extending job interviews to the authors of the highest-ranking submissions. Despite the apparent upside, performing well in these competitions against seasoned ‘competition professionals’ is incredibly difficult and can be exceedingly frustrating for a recent graduate trying to land his or her first data science job. However, even if you fail to perform up to your own expectations, participating in competitions is an incredibly valuable learning experience and a rite of passage towards becoming a good or even great data scientist. While it is now called a ‘science’, the truth is that handling data resembles a craft much more than a science. It is experience, rather than understanding and memorizing ‘laws of nature’, that will shape you into a good data scientist. What matters is not the fancy algorithms you know, but the oddities in the data you can spot. The more data sets of greater variety you have touched and perhaps failed to model, the better a data scientist you will be!

Bio: Claudia Perlich started her career in Data Science at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She tends to be domain agnostic, having worked on almost everything from Twitter, DNA, server logs, CRM data, and web usage to breast cancer and movie ratings, among many others. More recently, she served as the Chief Scientist at Dstillery, where she designed, developed, analyzed, and optimized the machine learning that drives digital advertising to prospective customers of brands. Claudia continues to be an active public speaker and has authored over 50 scientific publications as well as a few patents in the area of machine learning. She has won many data mining competitions and awards at Knowledge Discovery and Data Mining (KDD) conferences, and served as the conference’s General Chair in 2014. Claudia is a past winner of the Advertising Research Foundation’s (ARF) Grand Innovation Award and has been selected for Crain’s New York’s 40 Under 40 list, Wired Magazine’s Smart List, and Fast Company’s 100 Most Creative People. She received her PhD in Information Systems from the NYU Stern School of Business, where she still teaches as an adjunct professor.