Using HPX and LibGeoDecomp for Scaling HPC Applications on Heterogeneous Supercomputers

Heller T, Schäfer A, Fey D (2013)


Publication Type: Conference contribution

Publication year: 2013

Publisher: ACM

Edited Volumes: Proc. of ScalA 2013: Workshop on Latest Adv. in Scalable Algorithms for Large-Scale Systems - Held in Conjunction with SC 2013: The Int. Conf. for High Perform. Comput., Networking, Storage and Anal.

City/Town: Denver

Conference Proceedings Title: ScalA'13 proceedings

Event location: Denver

ISBN: 978-1-4503-2508-0

DOI: 10.1145/2530268.2530269

Abstract

With the general availability of PetaFLOP clusters and the advent of heterogeneous machines equipped with special accelerator cards such as the Xeon Phi[2], computer scientists face the difficult task of improving application scalability beyond what is possible with conventional techniques and programming models today. In addition, the need for highly adaptive runtime algorithms and for applications handling highly inhomogeneous data further impedes our ability to efficiently write code which performs and scales well. In this paper we present the advantages of using HPX[19, 3, 29], a general-purpose parallel runtime system for applications of any scale, as a backend for LibGeoDecomp[25] for implementing a three-dimensional N-Body simulation with local interactions. We compare scaling and performance results for this application while using the HPX and MPI backends for LibGeoDecomp. LibGeoDecomp is a Library for Geometric Decomposition codes implementing the idea of a user-supplied simulation model, where the library handles the spatial and temporal loops and the data storage. The presented results are acquired from various homogeneous and heterogeneous runs including up to 1024 nodes (16384 conventional cores) combined with up to 16 Xeon Phi accelerators (3856 hardware threads) on TACC's Stampede supercomputer[1]. In the configuration using the HPX backend, more than 0.35 PFLOPS have been achieved, which corresponds to a parallel application efficiency of around 79%. Our measurements demonstrate the advantage of using the intrinsically asynchronous and message-driven programming model exposed by HPX, which enables better latency hiding, fine- to medium-grain parallelism, and constraint-based synchronization. HPX's uniform programming model simplifies writing highly parallel code for heterogeneous resources.


How to cite

APA:

Heller, T., Schäfer, A., & Fey, D. (2013). Using HPX and LibGeoDecomp for Scaling HPC Applications on Heterogeneous Supercomputers. In ScalA'13 proceedings. Denver: ACM.

MLA:

Heller, Thomas, Andreas Schäfer, and Dietmar Fey. "Using HPX and LibGeoDecomp for Scaling HPC Applications on Heterogeneous Supercomputers." Proceedings of the Supercomputing 2013, Denver: ACM, 2013.
