Semi-automatic assessment of I/O behavior by inspecting the individual client-node timelines— an explorative study on 10^6 jobs

[thumbnail of main.pdf]
Preview
Text - Accepted Version
· Please see our End User Agreement before downloading.
| Preview

Please see our End User Agreement.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email

Betke, E. and Kunkel, J. (2020) Semi-automatic assessment of I/O behavior by inspecting the individual client-node timelines— an explorative study on 10^6 jobs. In: ISC HPC, 21-25 Jun 2020, Frankfurt, Germany, pp. 166-184. doi: 10.1007/978-3-030-50743-5_9

Abstract/Summary

HPC applications with suboptimal I/O behavior interfere with well-behaving applications and lead to increased application runtime. In some cases, this may even lead to unresponsive systems and unfinished jobs. HPC monitoring systems can aid users and support staff to identify problematic behavior and support optimization of problematic applications. The key issue is how to identify relevant applications? A profile of an application doesn’t allow to identify problematic phases during the execution but tracing of each individual I/O is too invasive. In this work, we split the execution into segments, i.e., windows of fixed size and analyze profiles of them. We develop three I/O metrics to identify three relevant classes of inefficient I/O behaviors, and evaluate them on raw data of 1,000,000 jobs on the supercomputer Mistral. The advantages of our method is that temporal information about I/O activities during job runtime is preserved to some extent and can be used to identify phases of inefficient I/O. The main contribution of this work is the segmentation of time series and computation of metrics (Job-I/O-Utilization, Job-I/O-Problem-Time, and Job-I/O-Balance) that are effective to identify problematic I/O phases and jobs.

Altmetric Badge

Item Type Conference or Workshop Item (Paper)
URI https://reading-clone.eprints-hosting.org/id/eprint/89639
Identification Number/DOI 10.1007/978-3-030-50743-5_9
Refereed Yes
Divisions Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Download/View statistics View download statistics for this item

Downloads

Downloads per month over past year

University Staff: Request a correction | Centaur Editors: Update this record

Search Google Scholar