In this video from the UK HPC Conference, DK Panda from Ohio State University presents: Designing Scalable HPC, Deep Learning, Big Data, and Cloud Middleware for Exascale Systems.
This talk will focus on challenges in designing HPC, Deep Learning, Big Data, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X (PGAS – OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models, taking into account support for multi-core systems (Xeon, ARM, and OpenPower), high-performance networks, and GPGPUs (including GPUDirect RDMA). Features, sample performance numbers, and best practices for using the MVAPICH2 libraries will be presented. For the Deep Learning domain, we will focus on extracting performance and scalability from popular Deep Learning frameworks (Caffe, CNTK, and TensorFlow) with the MVAPICH2-GDR MPI library. For the Big Data domain, we will focus on high-performance and scalable designs of Spark and Hadoop (including HDFS, MapReduce, RPC, and HBase) and the associated Deep Learning frameworks using native RDMA support for InfiniBand and RoCE. Finally, we will outline the challenges in moving this middleware to the Azure and AWS cloud environments.