PMACS - Performance Monitoring and Analysis of Cluster Systems

The workshop will take place in conjunction with the Euro-Par conference in Göttingen, Germany, from August 26 to August 27, 2019.

About this workshop

For a long time, hardware performance monitoring was used on a small scale to measure and analyze data from single application runs in order to detect performance limitations caused by hardware and/or software. Monitoring the whole cluster system to detect hardware failures has been the duty of system administrators, whose focus is on keeping the system operational and tracking changes in system parameters. In recent years, many HPC providers have extended or replaced their monitoring systems to additionally track performance data from hardware monitoring facilities and even from the applications themselves. Analyzing this data provides deeper insight into resource utilization and software quality. In addition, system administrators use performance data to trace the causes of system instabilities back to specific user codes. Due to the diversity of HPC centers, many tailored solutions for data collection, storage, evaluation and visualization exist today. The workshop aims to bring together developers and users of such infrastructure in order to explore opportunities for collaboration and to exchange ideas for further development. The workshop covers multiple topics of interest to the Euro-Par conference, such as support environments, performance evaluation, data management and analytics.

Call for papers

Performance monitoring and analysis are widely used in data center operations. Depending on the particular area of operation, this includes tracking activities at the hardware and/or software level. HPC centers use system monitoring to ensure system stability and to avoid resource contention. In recent years, many centers have enhanced their monitoring infrastructure to gather more detailed data from all entities in the system, including background services and even user applications. The inherent challenges are diverse: hardware and software parameters need to be collected with low overhead on all nodes. Gathered metrics need to be transferred to data storage without causing congestion or interference on the network. All of the data, or selected excerpts of it, need to be visualized for further understanding. Furthermore, results of interest and management actions might be derived through data analytics.

Paper contributions for this workshop should address at least one of the following topics:

The page limit is 12 pages, including all references and appendices, and the formatting must follow the LNCS guidelines. Please note that, according to the rules of the Euro-Par workshop organization, papers with fewer than 10 pages will not be published in the workshop proceedings. At least one author of each accepted paper must present the paper at the workshop (25-minute talk plus 5 minutes for discussion).

Each submitted paper undergoes blind review by at least three members of the programme committee. The review process is organized through EasyChair.

EasyChair website of the workshop

Workshop dates

Workshop agenda

Time Title and authors PDF
14:00 - 14:45 Keynote by Stephane Eranian (Google LLC, USA)
14:45 - 15:15 MPCDF HPC Performance Monitoring System: Enabling Insight via Job-Specific Analysis by Luka Stanisic and Klaus Reuter (Max Planck Computing and Data Facility (MPCDF)) PDF
15:15 - 15:30 Survey with the workshop attendees PDF
15:30 - 16:00 Coffee Break
16:00 - 16:30 Sparse Grid Regression for Performance Prediction Using High-Dimensional Run Time Data by Philipp Neumann (DKRZ, German Climate Computing Center) PDF
16:30 - 17:00 Towards a Predictive Energy Model for HPC Runtime Systems Using Supervised Learning by Gence Ozer, Sarthak Garg, Neda Davoudi, Gabrielle Poerwawinata, Matthias Maiterth, Alessio Netti and Daniele Tafani (TU Munich, LRZ Munich, Intel Corp.) PDF
17:00 - 17:30 Resource Aware Scheduling for EDA Regression Jobs by Saurav Nanda, Ganapathy Parthasarathy, Parivesh Choudhary and Arun Venkatachar (Synopsys Inc.) PDF

Chairs

Thomas Gruber

During his apprenticeship at the Erlangen Regional Computing Center (RRZE), the IT service provider for the Friedrich-Alexander-University Erlangen-Nuernberg (FAU), Thomas Gruber (né Röhl) gained experience with all kinds of clustering approaches. Afterwards, he studied Computer Science at RWTH Aachen University with an emphasis on parallel programming and operating system kernel development. At the same time, he worked as a research assistant for the HPC group of the RWTH IT Center. After receiving his M.Sc. degree, he returned to RRZE to work for the HPC group. Thomas Gruber leads the development of the performance tool suite LIKWID, which comprises easy-to-use tools for hardware performance monitoring, affinity control and micro-benchmarking. He also works on projects involving the monitoring and analysis of hardware performance data.

Anthony Danalis

The research interests of Anthony Danalis span several topics within the broad domain of performance, ranging from performance measurement and evaluation to compiler optimization and novel programming paradigms that enable better utilization of modern hardware. His PhD research focused on compiler optimizations for improving the performance of MPI applications. Later, as a research associate at Oak Ridge National Laboratory (ORNL) and the University of Tennessee, Knoxville (UTK), he was a founding member of the teams behind the GPU benchmark suite SHOC, the CPU benchmark suite BlackjackBench, and the task scheduling runtime PaRSEC. Currently, Anthony Danalis is a research director at the Innovative Computing Laboratory (ICL) at the University of Tennessee, and his research focuses on various extensions of the Performance Application Programming Interface (PAPI), which aim to (a) improve the understanding of hardware counters, and (b) extend the notion of performance events to include not only hardware but also software-based information, all through one consistent interface.

Programme committee

Contact

If you want to contact us, please write to Thomas DOT Gruber AT fau DOT de