Parallel I/O characterization and evaluation on large scale HPC systems
Authors
Ather, Hammad
Publisher
University of Oregon
Abstract
Recent technological advances have produced sophisticated parallel computing hardware and complex I/O workloads, including machine learning, deep learning, and other artificial intelligence techniques. These advances have made the existing parallel I/O stack more complex and harder to tune, and a poorly tuned stack can cause massive overheads and performance degradation. Given the ever-increasing complexity of the I/O stack deployed on large-scale systems, users need an in-depth understanding of these systems' I/O behavior and an awareness of the performance modeling and prediction tools available to evaluate and optimize I/O. It is therefore critical to have a comprehensive study that end users can follow as a guide to evaluating and optimizing parallel I/O in their applications. This paper presents such a study by surveying the current landscape of parallel I/O characterization and evaluation on large-scale HPC systems. Taking a deep dive into the different layers of the I/O stack, it shows how access patterns are shaped as an I/O request traverses down the stack and what optimizations can be applied to those patterns. The paper also examines different workload generation methodologies and the profiling and tracing tools that can collect performance statistics for these workloads. It further discusses parallel I/O evaluation techniques such as statistical analysis, machine learning, and replay-based modeling. Finally, it ties this discussion to a currently active area of research in parallel I/O: automatically evaluating, analyzing, and optimizing parallel I/O in applications without an I/O expert in the loop.
Description
23 pages
Keywords
Parallel I/O, Performance Evaluation, HPC Systems, I/O Characterization