Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory (NVM)

Thursday, May 23
2:50 PM - 3:30 PM
Marquis Salon 14

Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and Omni-Path. However, with the introduction of Non-Volatile Memory (NVM) and NVM Express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be re-assessed to discover opportunities for further performance gains. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.

Speakers

Dhabaleswar K (DK) Panda
Professor and University Distinguished Scholar
The Ohio State University
Dr. Dhabaleswar K. (DK) Panda is a Professor and University Distinguished Scholar of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high-performance computing, communication protocols, big data, deep learning, file systems, network-based computing, and Quality of Service. He has published over 450 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, Omni-Path, High-Speed Ethernet, and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with National Laboratories and leading InfiniBand and Ethernet/iWARP companies on designing various subsystems of next-generation high-end systems. The MVAPICH2 (High-Performance MPI over InfiniBand, iWARP, and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,925 organizations worldwide (in 86 countries). This software has enabled several InfiniBand clusters (including the 1st one) to get into the latest TOP500 ranking. The software is also available with the Open Fabrics stack for network vendors (InfiniBand and iWARP), server vendors, and Linux distributors. The new RDMA-enabled Apache Hadoop and Memcached packages, consisting of acceleration for HDFS, MapReduce, RPC, and Memcached, are publicly available from http://hibd.cse.ohio-state.edu. Dr. Panda's research is supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA, and NetApp. He is an IEEE Fellow and a member of ACM. More details about Dr. Panda, including a comprehensive CV and publications, are available at http://web.cse.ohio-state.edu/~panda.2/.
Xiaoyi Lu
Research Assistant Professor
The Ohio State University
Dr. Xiaoyi Lu is a Research Assistant Professor in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high-performance interconnects and protocols, Big Data, the Hadoop/Spark/Memcached ecosystem, parallel computing models (MPI/PGAS), virtualization, cloud computing, and deep learning. He has published over 100 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities (PC Co-Chair, PC Member, and Reviewer) for academic journals and conferences. Dr. Lu currently leads the research and development of RDMA-based accelerations for Apache Hadoop, Spark, HBase, and Memcached, as well as the OSU HiBD micro-benchmarks, which are publicly available from http://hibd.cse.ohio-state.edu. These libraries are currently being used by more than 290 organizations from 34 countries, and they have been downloaded more than 27,700 times from the project site. He is a core member of the MVAPICH2 (High-Performance MPI over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE) project, and he leads the research and development of MVAPICH2-Virt (high-performance and scalable MPI for hypervisor- and container-based HPC clouds). He is a member of IEEE and ACM. More details about Dr. Lu are available at http://web.cse.ohio-state.edu/~lu.932/.