Shanghai, China
June 24–26, 2019

Simultaneous translation will be provided for all keynote and breakout sessions.

Tuesday, June 25 • 18:15 - 18:50
Multi-Cloud Machine Learning Data and Workflow with Kubernetes - Lei Xue, Momenta & Fei Xue, Google

Autonomous vehicles require hardware-accelerated machine learning for critical problems such as tracking and classification. Momenta trains ML models in on-prem regions and public clouds, each of which comes with different GPUs and network interfaces (InfiniBand, RoCE).

In this talk, we discuss how we use Kubernetes to build a multi-cloud ML platform: in particular, how we manage training data across different environments, how we address multi-user and gang scheduling, and how we support heterogeneous hardware.

Speakers

Fei Xue

Product Manager, Ant Financial
Fei Xue is currently a product manager at Ant Financial working on ML and data platforms. Fei was an early member of the team at Google behind Kubeflow, an open source effort to help developers and enterprises develop and deploy cloud-native machine learning everywhere. Fei comes from a distributed...

Lei Xue

Infrastructure Tech Lead, Momenta
Lei Xue currently works as an AI infrastructure tech lead at Momenta. He leads a development team that focuses on GPU cluster management for Kubernetes and Docker. Previously, Lei was a member of the Kata Containers/Hyper team and a software engineer at Oracle/Sun Microsystems. He is also...



Tuesday June 25, 2019 18:15 - 18:50 CST
620