Shanghai, China
June 24–26, 2019

Simultaneous translation will be provided for all keynote and breakout sessions.

Tuesday, June 25 • 17:30 - 18:05
Managing Large-Scale Kubernetes Clusters Effectively and Reliably - Yong Zhang & Zhixian Lin, Ant Financial

As our business grows, we need to deploy Kubernetes in data centers around the world, and a single data center can hold more than ten thousand Nodes. The critical challenge we face is how to manage several large-scale Kubernetes clusters across data centers efficiently and reliably.

In this talk, we will share our experience and practices in automating large-scale cluster management. First, we will introduce fully automated Node lifecycle management and show how to automatically detect and recover from Node failures using NPD (node-problem-detector), autoscalers, and a customized Operator. Then we will share our experience with, and solutions for, deploying and upgrading Kubernetes clusters. Finally, we will present our risk prevention and control system, built on Prometheus and an Operator, which is the cornerstone of reliability and provides automatic fault detection and isolation.


Yong Zhang

Senior Software Engineer, Ant Financial
A Senior Software Engineer at Ant Financial.

Zhixian Lin

Senior Software Engineer, Ant Financial
A Senior Software Engineer at Ant Financial.

Tuesday June 25, 2019 17:30 - 18:05 CST