ONLINE PERFORMANCE OBSERVATION FOR HPC APPLICATIONS
Malony, Allen; Yokelson, Dewi
2024-08-07
https://hdl.handle.net/1794/29769
Electronic Thesis or Dissertation
Language: en-US
Rights: All Rights Reserved.
Subjects: high performance computing; performance monitoring

The exascale computing era is providing faster and more powerful systems for advanced HPC applications. However, it is increasingly challenging for programmers to make full use of the range of hardware resources that make up these platforms. Enabling larger, faster, and more diversified simulations requires performance monitoring tools that integrate seamlessly with applications and operate efficiently in all desired configurations. Beyond critical computational bottlenecks, data movement and I/O performance must also be monitored, because data can quickly grow to terabytes and beyond. Thus, a major challenge in high-performance computing is maximizing the performance of many diverse simulations on expensive, energy-consuming, and heterogeneous hardware. Furthermore, the landscape of scientific simulations is changing to include increasingly diverse and complex systems, such as coupled applications and workflows. This creates additional considerations for performance analysis, where dependencies and task scheduling can play a larger role. This dissertation presents an approach to addressing these issues in which we enable runtime performance observability for different applications and workflows running on heterogeneous architectures. The framework we have created to support this functionality is called Service-based Observability, Monitoring, and Analytics (SOMA). We show how it addresses diverse application and workflow needs across systems while supporting many useful performance monitoring capabilities with reasonable overhead.