Data Loading: The Next Frontier in Scale-out Deep Learning

Abstract: In this talk you will learn how to create efficient input pipelines that are tailored to your training data. As the number of projects, the number of GPUs, and the size of datasets increase, no one-size-fits-all input pipeline can keep GPUs fed with data.

We will examine the relationship between training throughput and image representation. We'll provide guidance on the tradeoffs between pre-processing datasets and in-line data processing, and we'll review results from a distributed training environment with multiple NVIDIA DGX-1s and a Pure Storage FlashBlade to highlight the performance impact at scale. Learn how to minimize time to accuracy and, ultimately, time to shipping models.

Bio: Seth Jamison is a Principal Solutions Architect for FlashBlade at Pure Storage, where he is a senior technical advisor to engineering, product management, sales, marketing, and presales leadership in the field.

Since 2013, Seth has held various individual and presales leadership roles within Pure and has been a trusted advisor to his customers since moving into the data storage industry in 2000. Prior to Pure Storage, Seth spent eight years with Dell Inc., most recently as a Field CTO / Enterprise Technologist in the Office of the CTO.