User-Level I/O Accelerations for High-Performance Deep Learning Applications

Release : 2021
Genre : Computer science
Kind : eBook

Book Synopsis User-Level I/O Accelerations for High-Performance Deep Learning Applications by : Yue Zhu

Download or read book User-Level I/O Accelerations for High-Performance Deep Learning Applications written by Yue Zhu. This book was released in 2021. Available in PDF, EPUB and Kindle. Book excerpt: With the popularity of microprocessors and scale-out system architectures, many large-scale high-performance computing (HPC) systems are built from a collection of compute servers with an identical set of resources such as CPU, memory, and storage. A variety of applications leverage the tremendous computation capacity of these large-scale HPC systems. Scientific applications and deep learning (DL) training are two of the popular workloads on HPC systems. However, with the rapid growth of computation power, it has become increasingly important to close the gap between computation and I/O performance for these applications and workloads. In recent years, many research efforts have explored user-level file systems on HPC systems for various workloads, owing to the flexibility of implementing and maintaining them in user space. In particular, scientific applications, which have two typical I/O patterns (checkpoint/restart and multi-dimensional I/O), have been able to use different specialized user-level file systems within a single job. However, this approach can introduce non-trivial overheads, which must be reviewed carefully to mitigate the resulting performance degradation. In addition, existing methods of using user-level file systems are not sufficient to meet the fundamental I/O needs of large-scale DL training on HPC systems. First, in DL training, random samples are organized into batches to update model parameters in iterations. This avoids biasing the model with the noise of any particular input sequence, which speeds convergence and reduces memory consumption during training. It also results in massive random reads for data shuffling across the entire dataset on the storage system.
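The shuffled, batched access pattern described here can be illustrated with a short sketch. The sample count, batch size, and seed below are hypothetical; real frameworks (e.g. PyTorch's DataLoader) implement the same idea internally:

```python
import random

def shuffled_batches(num_samples, batch_size, seed=0):
    """Yield batches of sample indices in a fresh random order.

    Reading samples in this order produces many small, random reads
    scattered across the whole dataset -- the I/O pattern described above.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)           # reshuffle each epoch
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

# One epoch over a (hypothetical) 10-sample dataset with batch size 4:
epoch = list(shuffled_batches(10, 4))
print(epoch)  # three batches of sizes 4, 4, and 2, in shuffled order
```

Every index appears exactly once per epoch, but in a different order each time, which is why sequential-read optimizations tuned for scientific workloads help little here.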
Such a random-read I/O pattern is significantly different from that of traditional scientific workloads. Moreover, leadership HPC systems are often equipped with a large pool of burst buffers in the form of flash or non-volatile memory (NVM) devices. DL applications running on these systems face a resource underutilization problem: the low latency and high bandwidth of NVM devices can be severely underutilized under heavy CPU and memory workloads. In this environment, the flash or NVMe storage devices are capable of low-latency, high-bandwidth I/O service, but the complex software stack significantly hampers these capabilities through in-kernel I/O processing. Also, due to DL training accuracy and performance concerns, the storage capacity and performance of the on-node storage devices allocated to a training job are insufficient to store an entire dataset and to match the training speed, respectively. This dissertation focuses on applying user-level file systems on HPC systems. Our overarching goal is to accelerate I/O support on HPC systems through specialized user-level file systems for popular workloads. Specifically, we want to bring in lightweight user-level file systems as efficient intermediaries to reduce the performance overhead and ease the use of multiple FUSE file systems in a single job, to orchestrate data movement between storage tiers and DL applications, and to improve storage resource utilization for a pool of NVMe SSDs in DL training. Based on these design goals, we investigate the issues and challenges of applying existing user-level file systems to these popular workloads, then propose three strategies to meet our goals.
First, we study the problem of excessive cost in crossing the user/kernel boundary when using multiple traditional user-level file systems, and we design Direct-FUSE to support multiple FUSE file systems, as well as other custom user-level file systems, in user space without crossing the user/kernel boundary into the FUSE kernel module. All layers of Direct-FUSE reside in user space, and applications can directly use pre-defined unified file system calls to interact with different user-defined file systems. Our performance results show that Direct-FUSE can outperform some native FUSE file systems and does not add significant overhead over the backend file systems. Second, we examine the I/O patterns of deep neural networks and study the performance overheads of loading samples in some popular DL applications. We then introduce an entropy-aware I/O framework called DeepIO for large-scale deep learning on HPC systems. It coordinates the use of memory, communication, and I/O resources for efficient training. DeepIO features an I/O pipeline with several novel optimizations: RDMA (Remote Direct Memory Access)-assisted in-situ shuffling, input pipelining, and entropy-aware opportunistic ordering. It outperforms state-of-the-art persistent-memory-based distributed file systems for efficient sample loading during DL training. Third, besides examining the I/O patterns of deep neural networks, we also reveal a critical need for randomly loading many small samples and the problem of storage resource underutilization in successful training. Based on these understandings, we design a specialized Deep Learning File System (DLFS) with an in-memory tree-based sample directory for metadata management and user-level storage disaggregation through the SPDK protocol.
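The synopsis does not spell out Direct-FUSE's API, but its core idea, routing each file-system call to the right user-space backend so that no call ever enters the FUSE kernel module, can be sketched with a hypothetical prefix-based dispatch table. All class and method names below are illustrative, not the dissertation's actual interface:

```python
class MemBackend:
    """A toy in-memory backend standing in for a real user-level file system."""
    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)

    def read(self, path):
        return self.files[path]


class Dispatcher:
    """Routes unified file-system calls to per-mount backends, entirely in user space."""
    def __init__(self):
        self.mounts = {}                      # mount prefix -> backend object

    def mount(self, prefix, backend):
        self.mounts[prefix] = backend

    def _resolve(self, path):
        # Longest-prefix match picks the backend responsible for this path.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path.startswith(prefix):
                return self.mounts[prefix], path[len(prefix):]
        raise FileNotFoundError(path)

    def write(self, path, data):
        backend, rel = self._resolve(path)
        return backend.write(rel, data)

    def read(self, path):
        backend, rel = self._resolve(path)
        return backend.read(rel)


fs = Dispatcher()
fs.mount("/ckpt/", MemBackend())   # e.g. a checkpoint/restart-oriented file system
fs.mount("/data/", MemBackend())   # e.g. a multi-dimensional I/O file system
fs.write("/ckpt/step1", b"state")
print(fs.read("/ckpt/step1"))      # b'state'
```

The point of the sketch is the dispatch step: the application sees one unified call interface, while each mount prefix is served by a different user-level file system, with no user/kernel crossing on the call path.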
Our experimental results show that DLFS can dramatically improve training throughput for deep neural networks compared with the kernel-based local Ext4 file system. Furthermore, DLFS achieves efficient user-level storage disaggregation with very little CPU utilization. In conclusion, the first branch of this work concentrates on enriching the functionality and enhancing the performance of the Direct-FUSE framework, while the second and third branches focus on wisely storing and prefetching datasets with the coordination of hierarchical storage tiers and fast interconnects, respectively. By exploring these three branches, we can further accelerate specialized user-level file systems for popular workloads on HPC systems.
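DLFS's in-memory tree-based sample directory is not detailed in the synopsis; a minimal sketch of the general idea, keeping per-sample metadata in a path-keyed tree so lookups stay in user space, might look like the following. The node structure and the (offset, length) metadata fields are assumptions for illustration:

```python
class DirNode:
    """One directory in an in-memory metadata tree for training samples."""
    def __init__(self):
        self.children = {}   # subdirectory name -> DirNode
        self.samples = {}    # sample name -> (offset, length) on the backing device

    def insert(self, path, offset, length):
        """Record where a sample's bytes live on storage."""
        parts = path.strip("/").split("/")
        node = self
        for part in parts[:-1]:
            node = node.children.setdefault(part, DirNode())
        node.samples[parts[-1]] = (offset, length)

    def lookup(self, path):
        """Resolve a sample path to its on-device location without any kernel call."""
        parts = path.strip("/").split("/")
        node = self
        for part in parts[:-1]:
            node = node.children[part]
        return node.samples[parts[-1]]


root = DirNode()
root.insert("train/cat/img0001.jpg", offset=0, length=4096)
root.insert("train/dog/img0002.jpg", offset=4096, length=8192)
print(root.lookup("train/dog/img0002.jpg"))  # (4096, 8192)
```

Because the whole directory lives in memory, resolving one of the many small random sample reads costs only a few dictionary lookups; the returned (offset, length) can then be handed to a user-space NVMe driver such as SPDK.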

High-Performance Big Data Computing

Release : 2022-08-02
Genre : Computers
Kind : eBook

Book Synopsis High-Performance Big Data Computing by : Dhabaleswar K. Panda

Download or read book High-Performance Big Data Computing written by Dhabaleswar K. Panda. This book was released on 2022-08-02. Available in PDF, EPUB and Kindle. Book excerpt: An in-depth overview of an emerging field that brings together high-performance computing, big data processing, and deep learning. Over the last decade, the exponential explosion of data known as big data has changed the way we understand and harness the power of data. The emerging field of high-performance big data computing, which brings together high-performance computing (HPC), big data processing, and deep learning, aims to meet the challenges posed by large-scale data processing. This book offers an in-depth overview of high-performance big data computing and the associated technical issues, approaches, and solutions. The book covers basic concepts and necessary background knowledge, including data processing frameworks, storage systems, and hardware capabilities; offers a detailed discussion of technical issues in accelerating big data computing in terms of computation, communication, memory and storage, codesign, workload characterization and benchmarking, and system deployment and management; and surveys benchmarks and workloads for evaluating big data middleware systems. It presents a detailed discussion of big data computing systems and applications with high-performance networking, computing, and storage technologies, including state-of-the-art designs for data processing and storage systems. Finally, the book considers some advanced research topics in high-performance big data computing, including designing high-performance deep learning over big data (DLoBD) stacks and HPC cloud technologies.

High Performance Computing

Release : 2023-09-25
Genre : Computers
Kind : eBook

Book Synopsis High Performance Computing by : Amanda Bienz

Download or read book High Performance Computing written by Amanda Bienz. This book was released on 2023-09-25. Available in PDF, EPUB and Kindle. Book excerpt: This volume constitutes the papers of several workshops which were held in conjunction with the 38th International Conference on High Performance Computing, ISC High Performance 2023, held in Hamburg, Germany, during May 21–25, 2023. The 49 revised full papers presented in this book were carefully reviewed and selected from 70 submissions. ISC High Performance 2023 presents the following workshops:

2nd International Workshop on Malleability Techniques Applications in High-Performance Computing (HPCMALL)
18th Workshop on Virtualization in High-Performance Cloud Computing (VHPC 23)
HPC I/O in the Data Center (HPC IODC)
Workshop on Converged Computing of Cloud, HPC, and Edge (WOCC'23)
7th International Workshop on In Situ Visualization (WOIV'23)
Workshop on Monitoring and Operational Data Analytics (MODA23)
2nd Workshop on Communication, I/O, and Storage at Scale on Next-Generation Platforms: Scalable Infrastructures
First International Workshop on RISC-V for HPC
Second Combined Workshop on Interactive and Urgent Supercomputing (CWIUS)
HPC on Heterogeneous Hardware (H3)

Applied Machine Learning and High-Performance Computing on AWS

Release : 2022-12-30
Genre : Computers
Kind : eBook

Book Synopsis Applied Machine Learning and High-Performance Computing on AWS by : Mani Khanuja

Download or read book Applied Machine Learning and High-Performance Computing on AWS written by Mani Khanuja. This book was released on 2022-12-30. Available in PDF, EPUB and Kindle. Book excerpt: Build, train, and deploy large machine learning models at scale in various domains such as computational fluid dynamics, genomics, autonomous vehicles, and numerical optimization using Amazon SageMaker.

Key Features:
Understand the need for high-performance computing (HPC)
Build, train, and deploy large ML models with billions of parameters using Amazon SageMaker
Learn best practices and architectures for implementing ML at scale using HPC

Book Description: Machine learning (ML) and high-performance computing (HPC) on AWS run compute-intensive workloads across industries and emerging applications. Their use cases span verticals such as computational fluid dynamics (CFD), genomics, and autonomous vehicles. This book provides end-to-end guidance, starting with HPC concepts for storage and networking. It then progresses to working examples of how to process large datasets using SageMaker Studio and EMR. Next, you'll learn how to build, train, and deploy large models using distributed training. Later chapters also guide you through deploying models to edge devices using SageMaker and IoT Greengrass, and through performance optimization of ML models for low-latency use cases. By the end of this book, you'll be able to build, train, and deploy your own large-scale ML application using HPC on AWS, following industry best practices and addressing the key pain points encountered in the application life cycle.

What you will learn:
Explore data management, storage, and fast networking for HPC applications
Focus on the analysis and visualization of a large volume of data using Spark
Train visual transformer models using SageMaker distributed training
Deploy and manage ML models at scale on the cloud and at the edge
Get to grips with performance optimization of ML models for low-latency workloads
Apply HPC to industry domains such as CFD, genomics, AV, and optimization

Who this book is for: The book begins with HPC concepts; however, it expects you to have prior machine learning knowledge. This book is for ML engineers and data scientists interested in learning advanced topics on using large datasets for training large models with distributed training concepts on AWS, deploying models at scale, and performance optimization for low-latency use cases. Practitioners in fields such as numerical optimization, computational fluid dynamics, autonomous vehicles, and genomics, who require HPC for applying ML models to applications at scale, will also find the book useful.

Deep Learning with JAX

Release : 2024-10-29
Genre : Computers
Kind : eBook

Book Synopsis Deep Learning with JAX by : Grigory Sapunov

Download or read book Deep Learning with JAX written by Grigory Sapunov. This book was released on 2024-10-29. Available in PDF, EPUB and Kindle. Book excerpt: Accelerate deep learning and other number-intensive tasks with JAX, Google's awesome high-performance numerical computing library. The JAX numerical computing library tackles the core performance challenges at the heart of deep learning and other scientific computing tasks. By combining Google's Accelerated Linear Algebra platform (XLA) with a hyper-optimized version of NumPy and a variety of other high-performance features, JAX delivers a huge performance boost in low-level computations and transformations.

In Deep Learning with JAX you will learn how to:
• Use JAX for numerical calculations
• Build differentiable models with JAX primitives
• Run distributed and parallelized computations with JAX
• Use high-level neural network libraries such as Flax
• Leverage libraries and modules from the JAX ecosystem

Deep Learning with JAX is a hands-on guide to using JAX for deep learning and other mathematically intensive applications. Google Developer Expert Grigory Sapunov steadily builds your understanding of JAX's concepts. The engaging examples introduce the fundamental concepts on which JAX relies and then show you how to apply them to real-world tasks. You'll learn how to use JAX's ecosystem of high-level libraries and modules, and also how to combine TensorFlow and PyTorch with JAX for data loading and deployment. Purchase of the print book includes a free eBook in PDF and ePub formats from Manning Publications.

About the technology: Google's JAX offers a fresh vision for deep learning. This powerful library gives you fine control over low-level processes like gradient calculations, delivering fast and efficient model training and inference, especially on large datasets. JAX has transformed how research scientists approach deep learning. Now boasting a robust ecosystem of tools and libraries, JAX makes evolutionary computations, federated learning, and other performance-sensitive tasks approachable for all types of applications.

About the book: Deep Learning with JAX teaches you to build effective neural networks with JAX. In this example-rich book, you'll discover how JAX's unique features help you tackle important deep learning performance challenges, like distributing computations across a cluster of TPUs. You'll put the library into action as you create an image classification tool, an image filter application, and other realistic projects. The nicely annotated code listings demonstrate how JAX's functional programming mindset improves composability and parallelization.

What's inside:
• Use JAX for numerical calculations
• Build differentiable models with JAX primitives
• Run distributed and parallelized computations with JAX
• Use high-level neural network libraries such as Flax

About the reader: For intermediate Python programmers who are familiar with deep learning.

About the author: Grigory Sapunov holds a Ph.D. in artificial intelligence and is a Google Developer Expert in Machine Learning. The technical editor on this book was Nicholas McGreivy.

Table of Contents:
Part 1
1 When and why to use JAX
2 Your first program in JAX
Part 2
3 Working with arrays
4 Calculating gradients
5 Compiling your code
6 Vectorizing your code
7 Parallelizing your computations
8 Using tensor sharding
9 Random numbers in JAX
10 Working with pytrees
Part 3
11 Higher-level neural network libraries
12 Other members of the JAX ecosystem
A Installing JAX
B Using Google Colab
C Using Google Cloud TPUs
D Experimental parallelization
