Shanghai, China
June 24–26, 2019

Simultaneous translation will be provided for all keynote and breakout sessions.

Tuesday, June 25 • 14:20 - 14:55
1-5-10: How to Fast Recover Container Failure at Large Scale - XiongHuan, Alibaba


In the cloud era, container-based applications are growing rapidly in enterprises, and at that scale manual operations, hardware failures, and similar causes make container failures far more likely. Guaranteeing the reliability of containers at scale without increasing resource investment is therefore a major challenge for cloud platforms.

Alibaba has run millions of containers and has put forward the 1-5-10 theory for recovering from container-related failures: MTTD (mean time to detect) is 1 min, MTTI (mean time to identify) is 5 min, and MTTR (mean time to resolve) is 10 min.
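To make the three targets concrete, here is a minimal sketch of how MTTD, MTTI, and MTTR could be computed from incident timestamps and checked against the 1-5-10 budget. The `Incident` record and its fields are hypothetical, and the abstract does not say whether the 5 and 10 minutes are counted from the failure's onset or from the end of the previous phase; this sketch assumes each phase is timed from the end of the previous one.

```python
from dataclasses import dataclass
from statistics import mean

# 1-5-10 targets, in minutes.
TARGETS = {"MTTD": 1.0, "MTTI": 5.0, "MTTR": 10.0}


@dataclass
class Incident:
    # Hypothetical record: minutes elapsed since the failure began.
    detected_at: float
    identified_at: float
    resolved_at: float


def phase_means(incidents):
    """Mean duration of each phase, assuming each phase starts
    when the previous one ends (an assumption, not from the talk)."""
    return {
        "MTTD": mean(i.detected_at for i in incidents),
        "MTTI": mean(i.identified_at - i.detected_at for i in incidents),
        "MTTR": mean(i.resolved_at - i.identified_at for i in incidents),
    }


def meets_1_5_10(incidents):
    """True for each metric whose mean is within its 1-5-10 target."""
    means = phase_means(incidents)
    return {name: means[name] <= TARGETS[name] for name in TARGETS}


incidents = [Incident(0.5, 4.0, 12.0), Incident(2.0, 6.0, 20.0)]
print(phase_means(incidents))   # {'MTTD': 1.25, 'MTTI': 3.75, 'MTTR': 11.0}
print(meets_1_5_10(incidents))  # {'MTTD': False, 'MTTI': True, 'MTTR': False}
```

Averaging each phase separately shows which stage (detection, identification, or resolution) is blowing the budget, which is the point of stating the goal as three numbers rather than one.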

In this session we'll discuss how to increase the reliability of containers at large scale using 1-5-10:
1. How to build an efficient local agent that detects problems within 1 min;
2. How to diagnose container problems intelligently using an expert knowledge base;
3. How to recover from container problems automatically in a failure-driven way.

Speakers

Huan Xiong

Senior Engineer, Alibaba
A senior software engineer at Alibaba who focuses on the reliability of hosts, containers, and clusters.



515