Loading…
Shanghai, China
June 24–26, 2019
Click here for more information and registration

Simultaneous translation will be provided for all keynote and breakout sessions.
我们将为所有主题演讲和分组会议提供同声传译服务。

To view the Chinese version of this schedule please go here.
请点击此处查看中文版本。

Venue + Sponsor Showcase Map
场馆 + 赞助商展示区地图
Tuesday, June 25 • 17:30 - 18:05
Managing Large-Scale Kubernetes Clusters Effectively and Reliably - Yong Zhang & Zhixian Lin, Ant Financial

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Feedback form is now closed.
As the business grows, we need to deploy Kubernetets into several data centers all around the world. There are more than ten thousands of Nodes in a single data center. The critical challenge we are facing is how to manage several large-scale Kubernetes clusters across data centers with efficiency and reliability.

In this talk, we will share the experince and practices of automating large-scale cluster management. At first, we will introduce fully automated Node lifecycle management, and how to automatically discover and recover Node failures based on NPD, Autoscalers and customized Operator. Then we will share the experience and solutions of Kubernetes cluster deployment and upgrading. Finally, we will share the risk prevention and control system based on Prometheus and Operator, which is the cornerstone of reliability with the ability of automatic faults detection and isolation.

Speakers
YZ

Yong Zhang

Senior Software Engineer, Ant Financial
A Senior Software Engineer of Ant Financial.
ZL

Zhixian Lin

Senior Software Engineer, Ant Financial
A Senior Software Engineer of Ant Financial.



Tuesday June 25, 2019 17:30 - 18:05 CST
619