Performance Observability and Monitoring of High Performance Computing with Microservices
Date
2022-10-04
Authors
Ramesh, Srinivasan
Publisher
University of Oregon
Abstract
Traditionally, High Performance Computing (HPC) software has been built and deployed as bulk-synchronous, parallel
executables based on the message-passing interface (MPI) programming model.
The rise of data-oriented computing paradigms and an explosion
in the variety of applications that need to be supported on HPC
platforms have forced a re-think of the appropriate programming and execution models to integrate this new functionality.
In situ workflows mark a paradigm shift in
HPC software development methodologies, enabling
a range of new applications,
from user-level data services to machine learning (ML) workflows that run
alongside traditional scientific simulations.
By tracing the evolution of HPC software development over the past 30 years, this dissertation identifies the key elements and trends
responsible for the emergence of coupled, distributed, in situ workflows.
This dissertation's focus is on coupled in situ workflows
involving composable, high-performance microservices. After outlining the motivation
to enable performance observability of these services and why
existing HPC performance tools and techniques cannot be applied in this context, this dissertation
proposes a solution wherein a set of techniques gathers, analyzes, and orients performance data from
different sources to generate observability. By leveraging microservice components initially designed
to build high performance data services,
this dissertation demonstrates their broader applicability for building and deploying performance
monitoring and visualization as services within an in situ workflow.
The results from this dissertation suggest that: (1) integration of
performance data from different sources is vital to understanding the performance
of service components, (2) the in situ (online) analysis of this performance data
is needed to enable the adaptivity of distributed components and manage monitoring data volume, (3) statistical modeling combined
with performance observations can help generate better service configurations, and (4) services are a promising
architecture choice for deploying in situ performance monitoring and visualization functionality.
This dissertation includes previously published and co-authored material and unpublished co-authored material.
Keywords
HPC, Microservices, Monitoring, Observability, Performance, Tools