Title: Alex Volkov - Scaling GPU workloads on datacenter clusters (Full Talk), at the ORNL CentOS Dojo
Duration: 53:27
Viewed: 398
Published: 14-05-2019
Source: Youtube

Talk Overview: Graphics Processing Units (GPUs) are critical to modern HPC (high-performance computing) and ML/DL (machine learning/deep learning) workloads. The requirements of engineers and scientists can easily reach petaflops, whereas a single state-of-the-art GPU delivers performance in the teraflops range. In the tradition of cluster computing, GPUs are scaled to petaflops performance using established technologies such as MPI (Message Passing Interface), high-performance interconnects (such as InfiniBand and RoCE), and RDMA (remote direct memory access). The presentation explores the challenges of multi-node scaling and how containerization helps manage the software complexity of running workloads on clusters. It gives an overview of how to orchestrate multi-node workflows on GPU hardware with MPI using containers. The container technologies covered are Docker and Singularity, together with HPC resource schedulers such as SLURM and PBS and container orchestration platforms such as Kubernetes. From the CentOS Dojo at ORNL: https://wiki.centos.org/Events/Dojo/ORNL2019
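
The gap between teraflops per GPU and petaflops per workload can be made concrete with a little arithmetic. The sketch below is not taken from the talk; the function name `gpus_needed`, the per-GPU throughput, and the scaling-efficiency figure are all illustrative assumptions, chosen only to show why multi-node scaling is needed.

```python
import math


def gpus_needed(target_pflops: float, per_gpu_tflops: float, efficiency: float = 1.0) -> int:
    """Estimate how many GPUs are needed to reach a target aggregate throughput.

    All inputs are hypothetical planning numbers, not measurements:
      target_pflops   - desired aggregate performance in petaflops
      per_gpu_tflops  - assumed sustained performance of one GPU in teraflops
      efficiency      - assumed fraction of ideal scaling retained (0 < e <= 1)
    """
    if target_pflops <= 0 or per_gpu_tflops <= 0:
        raise ValueError("throughput values must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    target_tflops = target_pflops * 1000.0  # 1 petaflop = 1000 teraflops
    return math.ceil(target_tflops / (per_gpu_tflops * efficiency))


if __name__ == "__main__":
    # Assumed example: 10 teraflops per GPU, 1 petaflop target.
    print(gpus_needed(1.0, 10.0))       # ideal scaling: 100
    print(gpus_needed(1.0, 10.0, 0.5))  # half of ideal scaling: 200
```

Even under ideal scaling, the assumed numbers call for far more GPUs than fit in one node, which is where MPI, fast interconnects, and RDMA come in. Lower scaling efficiency raises the count further.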