Putting the “Data” In Data Scientist

Abstract: One of the byproducts of our digitally transformed world is the accumulation of large quantities of data. For data scientists, the challenge is how to effectively manage, clone, prepare, and extract value from huge datasets for deep learning training.
In this workshop, we will examine how to handle large amounts of data at scale with the help of deep learning ops, or DeepOps as we call it. DeepOps is a combination of methodologies, tools, and culture in which data engineers and scientists collaborate to build a faster and more reliable deep learning pipeline.
Together, we’ll look at a publicly available dataset, ChestXray14, for reference and learn how to:
Organize one of the largest publicly available chest x-ray datasets.
Correctly correlate and tag medical images to corresponding metadata.
Weigh strategies for storing this data.
Prepare and stream data for deep learning training.
View and version data with MissingLink.ai’s query tool.
Afterward, you’ll walk away knowing how to automate data management, exploration, and versioning in your deep learning projects. You’ll also get access to the training material presented during the talk so you can continue experimenting with the ChestXray14 data on your own.

Bio: As MissingLink’s Chief Evangelist, Jesse Freeman focuses on teaching DeepOps techniques that speed up AI-first companies using computer vision and deep learning. One of the ways Jesse does this is by approaching deep learning from an engineering standpoint. With more than 20 years of enterprise development experience at companies like Amazon, Microsoft, MLB, HBO, New York Jets, Bloomberg, and more, Jesse is an expert in his field. In addition to his development background, Jesse has a master’s in interactive computer art from the School of Visual Arts. He can be found on Twitter at @jessefreeman.