Shanghai, China
June 24–26, 2019

Simultaneous translation will be provided for all keynote and breakout sessions.

KC+CNC - Operations
Tuesday, June 25


Dynamic Pod Resource Boundary Adjustment in Web Scale Clusters - Cheng Wang & Xiaoyu Zhang, Alibaba
Have you ever been confused about how to set the right resource limits for a Pod? How do you balance resource efficiency with an application's SLO?

In this talk, we will share practices and lessons learned at Alibaba Group from co-locating Pods with different QoS classes on the same node in Web-scale clusters and adjusting their resource limits dynamically, especially during resource contention.
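
As a rough illustration only (the abstract does not describe Alibaba's actual implementation), the Python sketch below models one simple way to adjust limits under contention: low-priority batch Pods sharing a node get whatever CPU the latency-sensitive Pods are not currently using, minus a safety reserve. The names `NodeSample` and `batch_cpu_limit`, the 10% reserve, and the 100-millicore floor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One observation of a node; all values are CPU millicores (hypothetical model)."""
    allocatable: int  # CPU the node can hand out to Pods
    ls_usage: int     # current usage of latency-sensitive Pods


def batch_cpu_limit(sample: NodeSample, reserve_pct: int = 10, floor: int = 100) -> int:
    """Hypothetical policy: batch Pods get the CPU that latency-sensitive Pods
    are not using, minus a safety reserve, but never less than `floor`."""
    reserve = sample.allocatable * reserve_pct // 100  # integer math avoids float rounding
    headroom = sample.allocatable - sample.ls_usage - reserve
    return max(floor, headroom)
```

With 16000 millicores allocatable, the batch limit shrinks from 8400 to 2400 millicores as latency-sensitive usage grows from 6000 to 12000, and drops to the 100-millicore floor at 15500. A production controller would presumably also smooth its usage samples before recomputing limits.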

After applying this approach in production clusters, we improved cluster resource usage by 14% to 30%, tail latency (95th percentile) by 76% to 87%, and TPS (transactions per second) by 107% to 163%.

Attendees will benefit from our experience improving resource utilization and application performance with Kubernetes-native approaches, which they can apply to their own clusters.

Xiaoyu Zhang

Software engineer, Alibaba
Xiaoyu Zhang is a software engineer at Alibaba Group and a member of the Kubernetes organization. He works mainly on the Kubernetes project, focusing on Docs, kubectl, controller-manager, storage, and runtime. He spoke at the Cloud Native End User Conference 2018. This is...
Cheng Wang

Software engineer, Alibaba
Cheng Wang is a software engineer at Alibaba Group, helping enhance cluster management and resource scheduling with data-driven intelligence for Alibaba's Web-scale clusters. Prior to joining Alibaba, he worked at VMware with a focus on Docker, Kubernetes and edge computing...

Tuesday June 25, 2019 16:00 - 16:35


Kubernetes Housekeeping - Damini Satya Kammakomati & Mitesh Jain, Salesforce
One of the big challenges of running large-scale distributed systems like Kubernetes is managing resources. The efficiency and long-term operational readiness of such systems depend on how well resource utilization is monitored and managed. Kubernetes provides a plethora of options and mechanisms to track and handle resources. However, as with any system, the best way to tune it is to know these options and mechanisms and, more importantly, to understand them.

This session will explain the various mechanisms available in Kubernetes for managing resources. We will dive deep into concepts such as the Garbage Collection Controller, Kube Controller Manager, Eviction, and Kubelet Garbage Collection, covering how they work, how to configure them, and what the recommended settings are.
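
To make the kubelet garbage collection part concrete, here is a simplified Python model of the kubelet's image garbage collection policy: once disk usage reaches the high threshold (kubelet flag `--image-gc-high-threshold`, default 85%), unused images are deleted, least recently used first, until usage would drop to the low threshold (`--image-gc-low-threshold`, default 80%). This is a sketch, not the kubelet's code; it omits details such as the minimum image age (`--minimum-image-ttl-duration`), and the `Image` type and `images_to_collect` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Image:
    name: str
    size_bytes: int
    last_used: float  # timestamp of last use by a container
    in_use: bool      # referenced by a running container; never collected


def images_to_collect(images: List[Image], capacity: int, used: int,
                      high_pct: int = 85, low_pct: int = 80) -> List[str]:
    """Return names of images to delete, least recently used first (simplified model)."""
    if used * 100 < high_pct * capacity:
        return []  # below the high threshold: no collection
    to_free = used - capacity * low_pct // 100  # bytes needed to reach the low threshold
    victims = []
    for img in sorted((i for i in images if not i.in_use), key=lambda i: i.last_used):
        if to_free <= 0:
            break
        victims.append(img.name)
        to_free -= img.size_bytes
    return victims
```

For example, with 100 units of capacity and 90 used, 10 units must be freed, so the oldest unused images are selected until that amount is covered; an image held by a running container is never a candidate.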

Damini Satya Kammakomati

Software Engineer, Salesforce
Damini Satya is a Software Engineer at Salesforce building internal tools for infrastructure automation. She is an active open source contributor, part of various open source communities, and a tech speaker at many well-known conferences such as ReactConf, Grace...
Mitesh Jain

Lead Systems Engineer, Salesforce
Mitesh Jain is a Lead Systems Engineer at Salesforce building trusted platforms for distributed applications at cloud scale. He has over 13 years of experience building and managing open source deployments in public and private clouds at enterprises such as Red Hat, GE, Wipro Technologie...

Tuesday June 25, 2019 16:45 - 17:20


Managing Large-Scale Kubernetes Clusters Effectively and Reliably - Yong Zhang & Zhixian Lin, Ant Financial
As our business grows, we need to deploy Kubernetes into several data centers around the world, with more than ten thousand Nodes in a single data center. The critical challenge we face is how to manage several large-scale Kubernetes clusters across data centers efficiently and reliably.

In this talk, we will share our experience and practices in automating large-scale cluster management. First, we will introduce fully automated Node lifecycle management, and how to automatically discover and recover from Node failures using NPD (Node Problem Detector), autoscalers, and a customized Operator. Then we will share our experience and solutions for deploying and upgrading Kubernetes clusters. Finally, we will share our risk prevention and control system, built on Prometheus and an Operator, which is the cornerstone of our reliability and can automatically detect and isolate faults.
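
A minimal sketch of the detect-and-recover step, assuming Node Problem Detector reports problems as node conditions such as `KernelDeadlock` and `ReadonlyFilesystem`. Which conditions trigger remediation, and the cap on how many nodes may be out of service at once, are hypothetical policy choices not described in the abstract; `plan_remediation` is an illustrative name, not Ant Financial's code.

```python
from typing import Dict, List

# Conditions that trigger remediation in this sketch (an assumption).
PROBLEM_CONDITIONS = {"KernelDeadlock", "ReadonlyFilesystem"}


def plan_remediation(nodes: Dict[str, Dict[str, str]], already_cordoned: int,
                     max_unavailable: int) -> List[str]:
    """Return the nodes to cordon and repair in this pass.

    `nodes` maps node name -> {condition type: status}, e.g. {"KernelDeadlock": "True"}.
    At most `max_unavailable` nodes may be out of service at once, so a
    cluster-wide incident cannot drain every node in one pass.
    """
    budget = max(0, max_unavailable - already_cordoned)
    unhealthy = sorted(
        name for name, conds in nodes.items()
        if any(conds.get(c) == "True" for c in PROBLEM_CONDITIONS)
    )
    return unhealthy[:budget]  # remaining nodes wait for the next reconcile loop
```

The budget check reflects a common design choice in automated repair: acting on too many nodes at once can cause more damage than the original failure.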


Yong Zhang

Senior Software Engineer, Ant Financial
A Senior Software Engineer at Ant Financial.

Zhixian Lin

Senior Software Engineer, Ant Financial
A Senior Software Engineer at Ant Financial.

Tuesday June 25, 2019 17:30 - 18:05


Managing Kubernetes in Air Gap/Offline Environments - Rong Zhang, Suning.com
Most of the available software and tools for managing Kubernetes clusters assume an internet connection. In practice, this is not always possible, which leaves end users on their own when getting started with Kubernetes.

This session will share different strategies for easily installing, upgrading, and managing Kubernetes in an offline environment.

Rong Zhang will talk about his experience with bare-metal environments and how his team uses Kubespray and Harbor to manage their offline infrastructure.
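
To illustrate one piece of offline management: every image a cluster pulls has to be mirrored into a local registry such as Harbor and referenced from there. The Python sketch below rewrites upstream image references to point at a mirror. The registry host `harbor.example.local` and the layout that keeps the upstream registry as a path prefix are hypothetical, and the sketch does not reproduce Kubespray's own configuration options.

```python
def normalize(ref: str) -> str:
    """Expand a short image reference to registry/repository[:tag|@digest]."""
    first, sep, _ = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return ref  # already names an explicit registry host
    if not sep:
        return "docker.io/library/" + ref  # e.g. "nginx:1.17"
    return "docker.io/" + ref  # e.g. "calico/node:v3.7.3"


def mirror(ref: str, registry: str = "harbor.example.local") -> str:
    """Point an upstream image reference at the offline mirror (hypothetical layout)."""
    return f"{registry}/{normalize(ref)}"
```

For example, `mirror("k8s.gcr.io/kube-apiserver:v1.14.3")` yields `harbor.example.local/k8s.gcr.io/kube-apiserver:v1.14.3`. Keeping the upstream registry in the path avoids name clashes between images with the same repository name on different registries.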

Rong Zhang

Software Engineer, Suning.com
Rong is a software engineer at Suning, developing platform services on top of Kubernetes and providing containerized infrastructure. Rong has been involved in the Kubernetes community for three years and is one of the maintainers of the Kubespray project.

Tuesday June 25, 2019 18:15 - 18:50
Wednesday, June 26


Plan to Fail: A Good Captain Doesn’t Sail Without Life Rafts - Steven Wong & Carlisia Campos, VMware
Historically, formal disaster recovery (DR) plans were only feasible for large enterprises, which could afford the time, resources, and cost of duplicating datacenter infrastructure.

With the popularity of public cloud and cloud native technologies, the cost and complexity of DR planning have been significantly reduced. This means every company, large or small, can engage in business continuity planning. Why is this important? Here are some of the reasons:

- Machines and software fail
- People make mistakes
- Hackers prey on the vulnerable
- Weather, fire, terrorism, more...
- You lose customers when there are outages and data loss
- Legal standards often require data retention

This talk will focus on:
- Items that need to be backed up and why (some might surprise you)
- Why you need selective restore capability
- Existing tooling to simplify and automate a DR strategy

Carlisia Campos

Senior Member of Technical Staff, VMware
Carlisia works as a Senior Member of Technical Staff at VMware. She's a maintainer of the open source project Velero, a cloud native disaster recovery and data migration tool for Kubernetes workloads. She currently runs the San Diego Kubernetes meetup. Carlisia holds an MS in Computer...
Steven Wong

Open Source Engineer, VMware
Steve Wong has been active in the Kubernetes and Apache Mesos communities since 2015. He is chair of the VMware SIG, and a co-organizer of the IoT and Edge Working Group on the Kubernetes project. He is a past speaker at KubeCon, MesosCon, Open Source Summit, SCALE, and meetups in...

Wednesday June 26, 2019 11:20 - 11:55


Storage Version Migrator: Never Worry About Stale API Objects Again - Chao Xu, Google
Have you ever had zombie Kubernetes API objects that your API server refuses to get, update, or even delete? This is probably because the objects persisted in etcd were encoded in an obsolete version. This talk presents the Kubernetes storage version migrator, which solves this problem once and for all. After you enable this alpha feature, the migrator automatically ensures that all your API objects stored in etcd are always encoded in the proper version. In the talk, Chao, a main contributor to the design and implementation of the storage version migrator, will share how the migrator manages not only built-in Kubernetes resources such as Pods but also your custom resources. You will also learn about the caveats of using the migrator, e.g., if you are managing an HA cluster. Chao will also share the roadmap for graduating the storage version migrator.
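
The abstract does not spell out the mechanism; as we understand it, the core idea is to read each stored object and write it back unchanged, so the API server re-encodes it in the current storage version before support for the old encoding is removed. The toy in-memory model below illustrates only that idea; the versions `v1beta1` and `v1`, the `size` to `replicas` field rename, and the function names are hypothetical.

```python
from typing import Dict, Tuple

STORAGE_VERSION = "v1"  # hypothetical version the API server now writes

Store = Dict[str, Tuple[str, dict]]  # key -> (encoded version, object); stands in for etcd


def decode(version: str, obj: dict) -> dict:
    """Convert a stored object to the current version (hypothetical schema)."""
    if version == STORAGE_VERSION:
        return dict(obj)
    if version == "v1beta1":
        out = dict(obj)
        out["replicas"] = out.pop("size", 1)  # hypothetical field rename
        return out
    # No decoder left for this encoding: the object has become a "zombie".
    raise ValueError(f"cannot decode version {version!r}")


def migrate(store: Store) -> int:
    """Rewrite every object unchanged so it is re-encoded in STORAGE_VERSION.

    Returns how many objects actually changed encoding.
    """
    changed = 0
    for key, (version, obj) in list(store.items()):
        store[key] = (STORAGE_VERSION, decode(version, obj))
        if version != STORAGE_VERSION:
            changed += 1
    return changed
```

Running the migration while the old decoder still exists is what prevents zombies: once every object is stored as `v1`, dropping `v1beta1` support no longer strands any data, and a second pass changes nothing.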

Chao Xu

Software engineer, Google
Chao Xu has been a member of Kubernetes SIG API Machinery for more than 4 years. He is one of its top contributors, owning the garbage collector, admission webhooks, and more. Recently, Chao has been focusing on safe Kubernetes upgrades and downgrades. In his free time, Chao is a good table...

Wednesday June 26, 2019 12:05 - 12:40