Shanghai, China
June 24–26, 2019

Simultaneous translation will be provided for all keynote and breakout sessions.

KC+CNC - Storage
Tuesday, June 25


Two Years with Vitess: How JD.com Runs the World's Largest Vitess - Xuhaihua & Jin Ke Xie, JD.com
JD.com serves 99% of China's consumers, so we have always had to innovate on infrastructure to meet the demands of scale and speed.
In 2017, we adopted Vitess to help us scale MySQL. Two years on, JD.com now operates the world's largest Vitess on Kubernetes deployment. In this presentation, I will introduce how we use Vitess at JD.com. I will also share my personal understanding of Vitess in the Chinese context. Finally, I will present a demo to show how we fully exploit the value of Vitess.
My presentation will demonstrate:
1. The value that Vitess brings to JD.com: how we use Vitess to reduce costs on machines and development
2. Frequently encountered problems: as one of the world's largest Vitess users, we want to share the problems we have encountered with Vitess and how to solve them
3. Practical examples in my demo: how to quickly build our projects with reusable Vitess modules

Xuhaihua

Senior Research and Development Engineer, JD.com

Tuesday June 25, 2019 11:00 - 11:35


Embracing Big Data Workload in Cloud-Native Environment with Data Locality - Sammi Chen, Tencent & Xiaoyu Yao, Cloudera
Kubernetes supports scheduling workloads based on CPU and memory resources, with node affinity, pod affinity, and anti-affinity. This works very well for stateless workloads. For stateful workloads, especially big data workloads, scheduling compute close to the data source can greatly boost performance, reliability, and availability. However, in many cloud-based storage systems, data locality information is either unavailable or not exposed to the container orchestrator.

In this talk, we will first compare the data locality support of mainstream container attached storage for Kubernetes. Then we will introduce network topology support in Apache Hadoop Ozone and show how to use it as locality-aware container attached storage via the Ozone CSI plugin for better workload scheduling. Finally, we will use Spark on K8s to demo the benefits of data-locality-aware scheduling with Apache Hadoop Ozone.

Sammi Chen

Software Engineer, Tencent
Sammi Chen is a software engineer at Tencent Cloud, working on the Apache Hadoop HDFS and Ozone projects. She is a committer and PMC member of the Apache Hadoop project.

Xiaoyu Yao

Principal Software Engineer, Cloudera
Xiaoyu Yao is a principal software engineer at Cloudera Inc., working on the Apache Hadoop HDFS and Ozone projects. He is a committer and PMC member of the Apache Hadoop and Ratis projects, with 12 years of experience developing and supporting distributed storage and file systems.

Tuesday June 25, 2019 11:45 - 12:20


Exploring High Availability in Kubernetes with Vitess - Jiten Vaidya, PlanetScale
As companies grow their infrastructure on the cloud in Kubernetes, questions of high availability arise. To be truly cloud native, you must be able to handle failure at any point in your stack.

Vitess has many features that can help in failure identification and handling, such as vtgate-as-proxy and data recovery through duplication (instead of backup).

Vitess also works with many third-party tools, such as Prometheus, for easy monitoring and observability.

In this talk, PlanetScale cofounder and CEO Jiten Vaidya will present the case for planning for failure. Using a live demo, Jiten will show how designing for high availability using Vitess allows you to prepare for risk, avert disasters, and recover from catastrophic failure.

Jiten Vaidya

CEO, PlanetScale, Inc.
Jiten Vaidya is co-founder and CEO at PlanetScale (https://planetscale.com), a company that supports Vitess (https://vitess.io). For most of his career, he worked as a backend infrastructure engineer and manager at companies such as Dropbox, YouTube and Google. It was at YouTube...

Tuesday June 25, 2019 13:35 - 14:10


Rook Deployed Scalable NFS Clusters Exporting CephFS - Patrick Donnelly, Red Hat
Rook was developed as a storage provider for Kubernetes to automatically deploy and attach storage to pods. Significant effort within Rook has been devoted to integrating the open-source storage platform Ceph with Kubernetes. Ceph is a distributed storage system in broad use today that presents unified file, block, and object interfaces to applications.

This talk will present completed work in the Ceph Nautilus release to dynamically create highly available and scalable NFS server clusters that export the Ceph file system (CephFS) for use within Kubernetes or as a standalone appliance. CephFS provides applications with a friendly programmatic interface for creating shareable volumes. For each volume, Ceph and Rook cooperatively manage the details of dynamically deploying a cluster of NFS-Ganesha pods with minimal operator or user involvement.


Patrick Donnelly

Senior Software Engineer, Red Hat, Inc.
Patrick Donnelly is a senior software engineer at Red Hat, Inc., currently leading the global development team working on the open-source Ceph distributed file system. Patrick has been a speaker at several events presenting recent work on Ceph, including Cephalocon APAC, various OpenStack...

Tuesday June 25, 2019 14:20 - 14:55


HDFS CSI Plugin: Speed Up Kubernetes in On-Premises Big Data Cluster - Yi Chen & Junping Du, Tencent
Kubernetes has not only become predominant in the public cloud, but is also an emerging trend in on-premises big data clusters as an alternative to Hadoop YARN, the resource scheduling component. In on-premises big data clusters, the majority of data is stored in HDFS. How to consume big data in HDFS from Kubernetes is a new challenge for users.
In this talk, we will first introduce the design and architecture of our CSI-compatible HDFS plugin. Then we will share our best practices and knowledge about how Spark, as a big data workload, uses the HDFS CSI plugin to access HDFS data when running on K8s. Finally, we will use the TPC-DS benchmark suite to compare the performance of Spark on K8s with HDFS against Spark on YARN with HDFS.


Junping Du

Architect, Tencent
Junping Du is chief architect for the Tencent Cloud Big Data Department and is responsible for the cloud data warehouse engineering team. As a committer and PMC member, he serves as release manager of Hadoop 2.6.x and 2.8.x for the Apache Hadoop community. Junping has more than 10 years of industry experience...

Yi Chen

Senior Software Engineer, Tencent
Yi Chen is a senior software engineer at Tencent Cloud, responsible for cloud data warehouse development. As a Hadoop committer and PMC member, she focuses on big data storage and also leads the Hadoop 2.9.1 release for the Apache Hadoop community. Before joining Tencent, she was the...

Tuesday June 25, 2019 15:05 - 15:40