Shanghai, China
June 24–26, 2019

Simultaneous translation will be provided for all keynote and breakout sessions.

KC+CNC - Case Studies
Tuesday, June 25


Right-Sizing and Auto-Scaling of MySQL Containers in Kubernetes - Yuan Chen & Min Li, JD.com
JD.com runs large-scale MySQL databases with Vitess on its Kubernetes platform in support of its internet-scale e-commerce services. Right-sizing and scaling container resources is critical but difficult because workloads are highly uncertain and variable.

This talk will present how JD develops optimized sizing and scaling techniques to improve the performance and resource efficiency of MySQL clusters in Kubernetes. It will describe a system that combines statistical analysis, forecasting, and optimization algorithms to dynamically adjust containers' resource request and limit values and reschedule containers through the Kubernetes and Vitess APIs, minimizing resource usage while meeting QoS requirements. This system enables JD to manage its MySQL cluster resources much more flexibly and efficiently, and helps JD dramatically reduce the operational and hardware costs of running MySQL in Kubernetes.
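As a rough illustration of the statistical side of right-sizing, the sketch below derives a CPU request and limit from observed usage percentiles. It is not JD's system: the function name `recommend_resources`, the percentile choices, and the `headroom` factor are all assumptions made for this example.

```python
import math
from statistics import quantiles


def recommend_resources(cpu_samples, request_pct=90, limit_pct=99, headroom=1.15):
    """Suggest (request, limit) in CPU cores from observed usage samples.

    Hypothetical policy for illustration: request tracks a high percentile
    of usage, limit tracks a more extreme one, both padded by `headroom`.
    """
    if len(cpu_samples) < 2:
        raise ValueError("need at least two usage samples")
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    cuts = quantiles(cpu_samples, n=100, method="inclusive")
    request = cuts[request_pct - 1] * headroom
    # The limit must never fall below the request.
    limit = max(cuts[limit_pct - 1] * headroom, request)
    # Round up to millicore granularity.
    return (math.ceil(request * 1000) / 1000, math.ceil(limit * 1000) / 1000)


if __name__ == "__main__":
    usage = [0.5] * 50 + [1.0] * 45 + [4.0] * 5  # cores, one sample per interval
    print(recommend_resources(usage))
```

A production system would also need forecasting and a way to apply the values, which this sketch leaves out.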

Yuan Chen

Principal Architect, JD.com
Yuan Chen is a Principal Architect at JD Silicon Valley R&D Center. He has 15+ years of research and industrial experience in the areas of large scale distributed systems, cloud computing and cluster management. His current work focuses on efficient resource management for cloud native...

Min Li

Staff Software Engineer, JD.com
Min Li is a staff software engineer at JD.com. Her main focus is applying big data analytics, machine learning and AI algorithms to optimize and automate resource management and operations in large-scale distributed systems, especially in Kubernetes clusters. Before joining JD, she...

Tuesday June 25, 2019 11:00 - 11:35


Building and Managing Kubernetes with Kubernetes - Xin Ma, eBay
Kubernetes, as a declarative and portable system, can be used to do many things in different ways. At eBay we built a fleet management system based on k8s. Everything (server, subnet, OS, package and state) is declarative and can be modeled as CRDs in k8s, or referred to from the objects by a commit id in git. By running various controllers on top of these CRD objects, we use k8s to manage k8s and the entire eBay data center.
- Our system provisions hosts the same way k8s creates and manages pods.
- We build k8s clusters with Salt. Each host has a set of states defined in its Salt CRD object. Controllers pull states from git, based on commit ids, and apply them.
- We build both schedulers and deployment transactions to manage the k8s clusters for both config deployments and upgrades.
This declarative, highly scalable, auto-healing, cloud native system is what we think can unify eBay’s fleet.
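The declarative pattern described here can be sketched as a reconcile step that compares desired host state with observed state and emits the actions needed to converge. This is a minimal sketch, not eBay's implementation: `HostSpec`, `reconcile`, and the action names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical desired state of one host."""
    os_version: str
    salt_commit: str  # git commit id of the Salt states to apply


def reconcile(desired, actual):
    """Return the (action, host) steps that move `actual` toward `desired`.

    Both arguments map host name -> HostSpec. Neither is modified.
    """
    actions = []
    for host, spec in sorted(desired.items()):
        if host not in actual:
            actions.append(("provision", host))  # host is declared but missing
        elif actual[host] != spec:
            actions.append(("apply", host))  # host has drifted from its spec
    for host in sorted(actual.keys() - desired.keys()):
        actions.append(("decommission", host))  # host is no longer declared
    return actions


if __name__ == "__main__":
    desired = {"a": HostSpec("7.6", "c1"), "b": HostSpec("7.6", "c2")}
    actual = {"b": HostSpec("7.6", "c1"), "c": HostSpec("7.6", "c1")}
    print(reconcile(desired, actual))
```

A real controller would run this comparison continuously and act on the result; the sketch only computes the difference once.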

Xin Ma

Principal Cloud Engineer, eBay
Lead DevOps engineer focusing on OS and kernel, container runtime, and Kubernetes deployment and operations. Currently a member of the eBay Kubernetes team focusing on building and automating the eBay fleet with Kubernetes. Before that Xin was in the eBay cloud team working on compute...

Tuesday June 25, 2019 11:45 - 12:20


Co-Location of CPU and GPU Workloads with High Resource Efficiency - Penghao Cen, Ant Financial & Jian He, Alibaba
Users run various workloads in Kubernetes, including long-running services and AI batch jobs. Normally, GPU machines are dedicated to AI training, and their resource utilization is low at times.

Have you ever thought about co-locating different kinds of workloads on the same node so you can save machines, aka money?

In this talk we will share our experience and practices in using co-location mechanisms in Kubernetes clusters.

In detail:
- Why and how we created a new QoS class from BestEffort
- Why and how we created a node-level cgroup for batch jobs
- How we use a CRD named PodGroup to achieve gang scheduling
- How we evaluate utilization

In the past months, we built a co-location cluster that has more than 100 GPU (NVIDIA Tesla P100) nodes and more than 500 CPU nodes. We co-deployed both long-running services and AI batch jobs and achieved a utilization increase of 10%.
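Gang scheduling, mentioned in the list above, means a group of pods is placed all together or not at all. The sketch below shows that all-or-nothing check using a simple first-fit-decreasing heuristic. It does not reflect the PodGroup CRD or the speakers' scheduler: the function `gang_schedule` and its inputs are hypothetical, and the heuristic may reject a group that a smarter packing could place.

```python
def gang_schedule(pod_gpu_requests, free_gpus_per_node):
    """All-or-nothing placement of a pod group.

    pod_gpu_requests: list of GPU counts, one per pod in the group.
    free_gpus_per_node: dict of node name -> free GPUs (not modified).
    Returns {pod_index: node_name} if every pod fits, otherwise None.
    """
    free = dict(free_gpus_per_node)  # work on a copy so failure has no side effects
    placement = {}
    # Place the largest requests first (first-fit decreasing).
    order = sorted(range(len(pod_gpu_requests)), key=lambda i: -pod_gpu_requests[i])
    for idx in order:
        need = pod_gpu_requests[idx]
        node = next((n for n in sorted(free) if free[n] >= need), None)
        if node is None:
            return None  # one pod cannot fit, so the whole group is rejected
        free[node] -= need
        placement[idx] = node
    return placement


if __name__ == "__main__":
    nodes = {"n1": 4, "n2": 1}
    print(gang_schedule([2, 2, 1], nodes))
    print(gang_schedule([2, 2, 2], nodes))
```

Working on a copy of the free-capacity map is the key design choice: a rejected group leaves no partially reserved resources behind.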


Jian He

Staff Engineer, Alibaba
Jian He is a Staff Engineer at Alibaba, where he works on container infrastructure to support Alibaba's massive workloads globally. Prior to that, he worked on the Hortonworks Hadoop team and primarily contributed to the Hadoop YARN open source community, where he has led many major features...

Penghao Cen

Senior Engineer, Ant Financial
Penghao Cen is a Senior Engineer at Ant Financial (originated from Alipay). He is currently an active contributor/member in the Kubernetes and Kubeflow communities, focusing on resource management and scheduling. He primarily contributes to the kubeflow/tf-operator project (Tools for MachineLearning/Tensorflow...

Tuesday June 25, 2019 13:35 - 14:10


Adapt to Unified and Pluggable Cluster Management Platform at LinkedIn - Tengfei Mu & Abin Shahab, LinkedIn
RAIN is a cluster resource management system developed at LinkedIn. It manages resources for tens of thousands of hosts per cluster in multiple datacenters, including Azure, to support scheduling of both long-running and batch jobs. It is integrated with the existing LinkedIn cluster management ecosystem.

The goal for our next generation cluster management system is to support heterogeneous compute workloads quickly to improve developer productivity and server utilization. After evaluation, we decided to adopt K8s' declarative API and extensible architecture. The adoption process poses quite a few challenges in integrating with the existing ecosystem at LinkedIn scale.

We first give an overview of the LinkedIn cluster management ecosystem. Then we talk about our evaluation process and adoption challenges. Finally, we share the lessons we learned during the production and integration process.


Abin Shahab

Staff Software Engineer, LinkedIn
Abin Shahab is a Staff Engineer at LinkedIn who has worked with data and search for more than a decade. Since 2014 he has been working on containers and containerizing big data workloads. He’s a contributor to Docker, runc, lxc, cadvisor (part of Kubelet), YARN’s container runtime, and...

Tengfei Mu

Engineering Manager, LinkedIn
Tengfei Mu is a Staff Engineering Manager on the Foundation team at LinkedIn, where he is responsible for leading and architecting the next generation cluster management system. He is passionate about incrementally adopting the k8s ecosystem at LinkedIn. Before joining LinkedIn, he was Tech Lead...

Tuesday June 25, 2019 14:20 - 14:55


TiKV Best Practices - James Zhang, PingCAP
TiKV is an open source distributed transactional key-value database and a sandbox project of the Cloud Native Computing Foundation (CNCF). Built in Rust and powered by Raft, TiKV provides high availability, strong consistency, ACID compliance, and horizontal scalability. TiKV supports externally-consistent distributed transactions and also implements a coprocessor framework to support distributed computing.
In this talk, we will introduce some TiKV best practices, such as how to control data distribution and the recommended deployment for cross-DC scenarios. We will cover data balancing topics such as scaling out and scaling in, and how to control the speed of balancing. This talk will also show you how to identify a hotspot issue and how to address it. Last but not least, we will explain how to fine-tune performance under different workloads.

James Zhang

TiKV Core Development Engineer, PingCAP
James Zhang, TiKV Core Development Engineer, Distributed Storage Expert, Author of _MariaDB Principles and Implementation_. He is mainly engaged in designing and developing large-scale distributed storage systems, with rich experience in the database industry and system tuning.

Tuesday June 25, 2019 15:05 - 15:40


Cost-Effective Scheduling of a Massive Number of Containers in Kubernetes - Yuan Chen, JD.com
JD runs one of the largest Kubernetes clusters in production in the world, supporting a wide range of workloads from e-commerce services to big data and machine learning jobs. The massive scale and complexity require efficient scheduling to address the scalability and cost-effectiveness challenges.

JD’s Chief Architect, Haifeng Liu, will present how JD overcomes hurdles to improve its Kubernetes clusters’ resource utilization and cost efficiency through advanced scheduling, including fine-grained monetization and monitoring of resource usage, machine learning-driven resource allocation, co-scheduling of mixed workloads and millisecond-level elastic scaling. Specifically, Haifeng will describe Archimedes, JD's Kubernetes scheduling system, and how it handled extreme demand, with $24.7 billion of transactions on JD's Kubernetes platform, during JD's June 18 anniversary sale event.

Yuan Chen

Principal Architect, JD.com
Yuan Chen is a Principal Architect at JD Silicon Valley R&D Center. He has 15+ years of research and industrial experience in the areas of large scale distributed systems, cloud computing and cluster management. His current work focuses on efficient resource management for cloud native...

Tuesday June 25, 2019 16:00 - 16:35


Three Approaches to Speed up Image Distribution in Cloud Native Era - Jiang Yong, Alibaba
Have you ever been troubled by image distribution issues as your cluster scale grows?

In this talk, we will share practices and lessons learned from improving image distribution efficiency at web scale at Alibaba. We use different image distribution methods for different scenarios. P2P-based distribution with CNCF/Dragonfly is the most straightforward way to ease the registry's bandwidth load and decrease distribution time. In addition, the remote filesystem snapshotter in CNCF/containerd stores images remotely and lets the container engine read image content over the network, so distribution takes hardly any time. The second approach depends most heavily on network stability, so as a tradeoff, how about dynamically loading image content from remote to local storage as read/write requests arrive? Finally, we will summarize how to choose the approach that best fits your image distribution needs.


Jiang Yong

Senior Software Engineer, Alibaba
Jiang Yong, Senior Software Engineer on the Alibaba Cloud Container Platform, maintains millions of containers at Alibaba. He is passionate about open source and enjoys sharing technology.

Tuesday June 25, 2019 16:45 - 17:20


Panel Discussion: Leverage Cloud Native to Transform Your Enterprise – The China Region - Cheryl Hung, CNCF; Kevin Wang, Huawei; Xiang Li, Alibaba Cloud; Vivian Zhang, JD.com; & Cheng Yu, Qihoo360
Cloud Native is experiencing dramatic growth and achieving widespread support as the de facto standard platform across a variety of industries. K8s, containers and related cloud native technologies and tools have the potential to transform the enterprise. From enabling enterprises to modernize legacy apps, to automated DevOps, to automated failure recovery and improved testing, the list of innovative development and operational practices emerging from cloud native is amazing and a tremendous opportunity for enterprises.

The panel will bring together stakeholders from enterprise IT and open source vendors to discuss how the various facets of cloud native can dramatically transform the enterprise. Panelists will discuss the key innovations emerging from cloud native that drive more efficient development and improved, standardized operational practices to accelerate the digital transformation and modernization of the enterprise.

Cheryl Hung

Director of Ecosystem, Cloud Native Computing Foundation
Cheryl Hung is the Director of Ecosystem at the CNCF. Her mission is to increase the adoption of Kubernetes and cloud native by growing the community and advocating for end users. She founded and runs the Cloud Native London meetup. Previously Cheryl spent five years as a C++ engineer...

Zefeng(Kevin) Wang

Principal Engineer, Huawei
Zefeng(Kevin) Wang is a Principal Engineer of the Cloud Native Team at Huawei. He currently works on Kubernetes, KubeEdge and Huawei Cloud container products. He is the lead of Huawei's Kubernetes & Cloud Native open source team and a co-founder of the KubeEdge project.

Vivian Zhang

Product Manager, JD.com
Liying (Vivian) Zhang is a product manager at JD.com. She works on various software systems and platforms for JD's online retail service, which serves over 300 million consumers. As a passionate proponent of open source and JD's liaison to the CNCF community, Liying endeavors to drive...

Xiang Li

Senior Staff Engineer, Alibaba
Xiang is a Senior Staff Engineer of Alibaba. He works on Alibaba’s cluster management system and helps with Kubernetes adoption for the entire Alibaba group. Prior to Alibaba, Xiang led the Kubernetes upstream team at CoreOS. He is also the creator of etcd and Kubernetes operator...

Tuesday June 25, 2019 18:15 - 18:50
Wednesday, June 26


Container Technology Drives Windows Application Transformation - Huajun Gu, DaoCloud & Jason Huang, Microsoft
Windows containers are not new to Kubernetes. A lot of effort has gone into ensuring that a hybrid Kubernetes cluster can be deployed. The session will be divided into three parts.
1) Lifting and shifting -- Experience migrating legacy applications into Windows containers.
2) A hybrid Kubernetes cluster -- Legacy applications can still be governed by the latest technologies.
3) Future challenges -- Potential workarounds for solving future challenges.


Huajun Gu

Software Architect, DaoCloud
Greg is a veteran who focuses on managing servers in an efficient way. He tries to be "lazy", because billg chooses lazy people to do hard jobs: a lazy person will find easy ways to do them. After developing skills in Microsoft products and PowerShell for seven years, he...

Jason Huang

Technical Solution Professional, Microsoft
Jason focuses on cloud solutions and helps enterprises with their digital transformation. He has four years of Windows Server debugging experience, focusing on blue screen analysis, rootkit analysis, and performance tuning. After that he focused on hybrid cloud, Office 365, cybersecurity...

Wednesday June 26, 2019 11:20 - 11:55


Performing Infrastructure Migrations at Scale - Melanie Cebula, Airbnb
Everyone is excited to adopt the latest technology trends, but how do you migrate from one technology to another when you have high availability requirements? How do you separate hype from a technology that actually works at your scale? This talk aims to solve both:

1. the business problems of justifying and resourcing infrastructure migrations and managing the migration lifecycle

2. the technical problems of initial prototyping, identifying and solving blockers and gotchas, giving best practices for rollout, and automating migrations with infrastructure as code.

This talk will be backed by our case study migrating hundreds of legacy services with strict latency and uptime requirements to Kubernetes at Airbnb.

Melanie Cebula

Software Engineer, Airbnb
Melanie Cebula is a software engineer on the service orchestration team at Airbnb, where she empowers thousands of engineers to create and operate hundreds of production Kubernetes services. She's previously spoken about Airbnb's journey to microservices and developing Kubernetes...

Wednesday June 26, 2019 11:20 - 11:55


CITIC Bank's Containerized Exploration Road - Jia Xing, Alauda
This presentation will introduce the experience of CITIC Bank, one of China's largest commercial banks, in building a container platform.
CITIC Bank and Alauda have completed a Cloud Native container PaaS platform based on DevOps, established an independent operating environment for the bank, fully automated the application development and testing process, and doubled the speed of product iteration. In this presentation, we will elaborate on how K8s can help CITIC Bank achieve the following goals:
- Unified portal, unified user management, support for multi-tenancy and operational scenarios
- Flexible scaling, grayscale publishing, enabling rapid iteration of applications
- Fully integrated tool chain
- Real-time monitoring of service and operational status, intelligent failure analysis
- Middleware application market construction


Jia Xing

Senior Solution Architect, Alauda
Senior Solution Architect at Alauda with a long-term focus on enterprise-level application scenarios, continuous delivery (DevOps), cloud computing, virtualization, enterprise application cloud migration, automated deployment and testing, operations automation, continuous integration, agile...

Wednesday June 26, 2019 12:05 - 12:40