Scaling AI Applications with Ray

Abstract: The next generation of AI applications will continuously interact with their environment and learn from these interactions. To develop these applications, data scientists and engineers will need to scale their work seamlessly from interactive, single-machine experiments to production clusters. In this talk we introduce Ray, a high-performance distributed execution engine, and its libraries for AI workloads. We cover each Ray library in turn, and also show how the Ray API allows these traditionally separate workflows to be composed and run together as one distributed application.
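The composition idea can be illustrated with a small sketch. In Ray itself, a function decorated with `@ray.remote` is launched with `.remote(...)`, which returns a future (an object reference), and `ray.get(...)` retrieves the result. To keep this sketch dependency-free, it substitutes Python's standard `concurrent.futures` thread pool, so it shows only the shape of the pattern on one machine, not Ray's distributed scheduler. The `preprocess` and `train` stages and their toy arithmetic are hypothetical placeholders for a data-processing step and a hyperparameter sweep.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(shard):
    """Hypothetical data-processing stage: normalize one shard so it sums to 1."""
    total = sum(shard)
    return [x / total for x in shard] if total else list(shard)


def train(shard, lr):
    """Hypothetical training stage: a toy 'loss' for one shard and learning rate."""
    return sum((x - lr) ** 2 for x in shard)


def pipeline(shards, learning_rates, max_workers=4):
    """Run preprocessing and a small hyperparameter sweep as one parallel program.

    A thread pool stands in for Ray here: pool.submit(...) plays the role of
    f.remote(...), and future.result() plays the role of ray.get(...).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 1: preprocess every shard in parallel.
        clean = [f.result() for f in [pool.submit(preprocess, s) for s in shards]]
        # Stage 2: fan out one training task per (learning rate, shard) pair.
        futures = {lr: [pool.submit(train, s, lr) for s in clean] for lr in learning_rates}
        losses = {lr: sum(f.result() for f in fs) for lr, fs in futures.items()}
    # Select the learning rate with the lowest total loss across shards.
    best_lr = min(losses, key=losses.get)
    return best_lr, losses
```

Both stages live in one program and share one pool of workers, which is the property the abstract attributes to the Ray API, here reduced to a single process.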

Ray is an open-source project being developed at the RISELab at UC Berkeley for scalable hyperparameter optimization, distributed deep learning, and reinforcement learning. We focus on the following libraries in this tutorial:

TUNE: Tune is a scalable hyperparameter optimization framework for reinforcement learning and deep learning. Go from running one experiment on a single machine to running on a large cluster with efficient search algorithms, without changing your code. Unlike existing hyperparameter search frameworks, Tune targets long-running, compute-intensive training jobs that may take many hours or days to complete, and includes many resource-efficient algorithms designed for this setting.
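One family of resource-efficient methods for long-running jobs stops unpromising trials early. Tune ships schedulers in this family (for example HyperBand and ASHA); the sketch below is not Tune's API but a plain-Python illustration of synchronous successive halving, the building block of HyperBand. It assumes a hypothetical `evaluate(config, budget)` callback that trains a config for `budget` units (for example, epochs) and returns a loss. Unlike a real scheduler, it re-evaluates each survivor from scratch instead of resuming a checkpointed trial.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Synchronous successive halving (illustrative, not Tune's implementation).

    Each round scores all surviving configs at the current budget, keeps the
    best 1/eta of them, and multiplies the budget by eta, until one remains.
    `evaluate(config, budget)` is a hypothetical callback returning a loss.
    """
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configs evaluated at that budget)
    while len(survivors) > 1:
        # Lower loss is better; sorted() calls the key once per config.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        history.append((budget, len(survivors)))
        # Keep the top 1/eta fraction, but always at least one config.
        keep = max(1, len(survivors) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0], history
```

With `eta=3`, nine configs are narrowed to three at budget 1 and then to one at budget 3, so most of the compute goes to the configs that looked best early.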

RLLIB: RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones. In this tutorial we discuss using RLlib to tackle both classic benchmarks and applied problems, RLlib's primitives for scalable RL, and how RL workflows can be integrated with data processing and hyperparameter optimization.
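A common scalable-RL primitive is to collect experience with several parallel workers and aggregate it in a central learner. The sketch below shows that pattern on a toy three-armed Bernoulli bandit with an epsilon-greedy policy. It is not RLlib's API: the bandit, `rollout`, and `train_bandit` are hypothetical, and a standard-library thread pool stands in for distributed workers (threads do not give true CPU parallelism for this pure-Python loop).

```python
import random
from concurrent.futures import ThreadPoolExecutor

ARM_MEANS = [0.2, 0.5, 0.8]  # hypothetical bandit: success probability per arm


def rollout(q_values, epsilon, steps, seed):
    """Worker: sample (arm, reward) pairs with an epsilon-greedy policy."""
    rng = random.Random(seed)
    batch = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(q_values))  # explore
        else:
            arm = max(range(len(q_values)), key=q_values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < ARM_MEANS[arm] else 0.0
        batch.append((arm, reward))
    return batch


def train_bandit(num_workers=4, iterations=20, steps=50, epsilon=0.2):
    """Learner: broadcast estimates, gather worker batches, update estimates."""
    q = [0.0] * len(ARM_MEANS)
    counts = [0] * len(ARM_MEANS)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for it in range(iterations):
            # Each worker gets a snapshot of q and its own deterministic seed.
            futures = [
                pool.submit(rollout, list(q), epsilon, steps, it * num_workers + w)
                for w in range(num_workers)
            ]
            for f in futures:
                for arm, reward in f.result():
                    counts[arm] += 1
                    q[arm] += (reward - q[arm]) / counts[arm]  # incremental mean
    return q, counts
```

Workers only read a copy of the current estimates and the learner alone writes them, which keeps the update logic simple while sampling is spread across workers.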

Bio: Eric Liang is a PhD student at UC Berkeley working with Ion Stoica on distributed systems and applications of reinforcement learning. He is currently leading the RLlib project (rllib.io). Before grad school, he spent 4 years working in industry on storage infrastructure at Google and Apache Spark at Databricks.