Empirical Performance Analysis of HPC Applications with Portable Hardware Counter Metrics

Date

2022-10-04

Authors

Gravelle, Brian

Publisher

University of Oregon

Abstract

In this dissertation, we demonstrate that it is possible to develop methods of empirical, hardware-counter-based performance analysis for scientific applications running on diverse CPUs. Although counters have been used in performance analysis for over 30 years, the methods remain limited to particular vendors or generations of CPUs. Our hypothesis is that counter-based measurements can be developed to provide consistent performance information on diverse CPUs. We prove the hypothesis correct by demonstrating one such set of metrics. We begin with an introduction and background discussing empirical performance analysis on CPUs. The background includes the Roofline Performance Model, which is widely used to visualize the performance of scientific applications relative to the potential system performance. This model uses metrics that are portable to different CPU architectures, making it a useful starting point for efforts to develop portable hardware counter metrics. We contribute to the existing roofline literature by presenting a method that uses counters to measure the required application data on two CPUs and by presenting benchmarks that produce the Roofline Model of the CPU. These contributions are complementary, since the benchmarks can be used to validate the hardware counters used to measure the application data. We then present a set of performance metrics derived from Hardware Performance Monitors that we have been able to replicate on CPUs from two vendors. We developed these metrics to focus on information that can inform developers about the performance of algorithms and data structures in applications. This approach contrasts with other methods, which are aimed at microarchitectural features, and it allows users to understand application performance from the same perspective on multiple CPUs. We use a series of case studies to explore the usefulness of our metrics and to validate that the measured values provide the expected information. The first set of studies examines benchmarks and mini-applications with a variety of performance characteristics. Finally, we study the performance of several versions of a scientific application using the Roofline Model and the new metrics. These case studies show that our performance metrics can provide performance information on two CPUs, proving our hypothesis by example. This dissertation includes previously published co-authored material.
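As a brief illustration of the Roofline Model mentioned in the abstract, the following minimal sketch computes the standard roofline bound: attainable performance is the smaller of the machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte moved). The function names, the machine figures, and the kernel counts are hypothetical values chosen for illustration. They are not taken from the dissertation, and the sketch is not the author's measurement method.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Return floating-point operations per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, peak_bandwidth_gbs):
    """Standard roofline bound: min(compute peak, bandwidth * intensity).

    intensity          -- FLOP per byte
    peak_gflops        -- peak compute rate in GFLOP/s
    peak_bandwidth_gbs -- peak memory bandwidth in GB/s
    """
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    return min(peak_gflops, peak_bandwidth_gbs * intensity)


# Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s peak bandwidth.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

# Hypothetical kernel counts, e.g. as derived from hardware counters:
# 2e9 floating-point operations and 16e9 bytes of memory traffic.
ai = arithmetic_intensity(2e9, 16e9)
bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
print(f"arithmetic intensity = {ai} FLOP/byte, roofline bound = {bound} GFLOP/s")
```

With these assumed numbers the kernel sits well below the ridge point (1000 / 100 = 10 FLOP/byte), so its bound is set by memory bandwidth rather than by peak compute.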
