Making Data Science A Single Line of Code on Jupyter Notebook

Abstract: Accelerating data science to shorten time-to-market and democratizing it for participants of different skill levels are critical to scaling data science practice in large organizations. Yet enterprise data science remains a highly complex, manual, and skill-dependent process that requires a considerable amount of database/dataframe operations to build data ETL and feature engineering pipelines, as well as programming effort to implement machine learning algorithms. This presentation discusses and demonstrates our core innovation in data science automation: A.I.-powered feature engineering and automated machine learning. Because the technology is fully integrated with PySpark on Jupyter Notebook, the end-to-end data science process is reduced to a single line of code, significantly accelerating and democratizing enterprise data science.

Bio: Aaron is currently the Vice President of Data Science and Solutions at dotData.

A data science practitioner with 14 years of research and industry experience, he has held various leadership positions spearheading new product development in data science and business intelligence. At dotData, Aaron leads the data science team, which is responsible for product development and works directly with clients to solve their most challenging problems. Prior to joining dotData, he was a Data Science Principal Manager with Accenture Digital, responsible for architecting data science solutions and delivering business value for the tech industry on the West Coast. He was instrumental in the strategic expansion of Accenture Digital’s footprint in the North American data science market.

Aaron received his Ph.D. in Applied Physics from Northwestern University.