Linux-based parallel computer / HPC cluster Alex (Megware)

Model: GPU-Cluster 2022

Manufacturer: Megware (2022)

URL: https://hpc.fau.de/systems-services/systems-documentation-instructions/clusters/alex-cluster/

Location: Erlangen

Usage: Also available to external users

Organisation(s):

Regionales Rechenzentrum Erlangen (RRZE)
Professur für Höchstleistungsrechnen

Funding Sources:

Bayerisches Staatsministerium für Bildung und Kultus, Wissenschaft und Kunst (ab 10/2013)
NHR-Bund-Länder-Förderung
Bundesministerium für Forschung, Technologie und Raumfahrt (BMFTR)
NHR-Bund-Länder-Förderung
Deutsche Forschungsgemeinschaft (DFG)
DFG - Infrastrukturförderung (INFRA)

Involved Person(s):

Gerhard Wellein
Thomas Zeiser

Publications:

Surface tension and viscosity of carbon dioxide near the critical point using molecular dynamics simulations and surface light scattering (2026). Sanchouli N, Kankanamge CJ, Fröba AP, Koller TM. Journal article, Original article
Decoding pH-Driven Phase Transition of Lipid Nanoparticles (2026). Trollmann M, Böckmann R. Journal article, Original article
AMD-HookNet++: Evolution of AMD-HookNet with Hybrid CNN-Transformer Feature Enhancement for Glacier Calving Front Segmentation (2025). Wu F, Dreier MN, Gourmelon N, Wind S, Zhang J, Seehaus T, Braun M, et al. Journal article, Original article
IceAnatomy: a benchmark dataset and methodology for automatic ice boundary extraction from radio-echo sounding data (2025). Dreier MN, Koch M, Gourmelon N, Blindow N, Steinhage D, Wu F, Seehaus T, et al. Journal article, Online publication
Thermophysical Properties of n-Hexane under the Influence of Dissolved Hydrogen by Experiments and Equilibrium Molecular Dynamics Simulations (2025). Damp P, Sun Y, Hewa Kankanamge CJK, Jander JH, Rausch MH, Klein T, Koller TM, Fröba AP. Journal article, Original article
Fick Diffusion Coefficients of Polystyrene Oligomers with Dissolved Blowing Agents by Dynamic Light Scattering and Molecular Dynamics Simulations (2025). Schmidt P, Hewa Kankanamge CJK, Klose J, Jander JH, Vergadou N, Economou I, Klein T, Fröba AP. Journal article, Original article
Comparative Molecular Dynamics Study of 19 Bovine Antibodies with Ultralong CDR H3 (2025). Denysenko O, Horn A, Sticht H. Journal article
Prediction of Fick Diffusion Coefficients in Binary Electrolyte Mixtures (2025). Hewa Kankanamge CJK, Zosel AI, Klein T, Fröba AP. Journal article, Original article
Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models (2025). Mayr M, Dreier MN, Kordon F, Seuret M, Zöllner J, Wu F, Maier A, Christlein V. Journal article, Letter
Image Generation Diversity Issues and How to Tame Them (2025). Dombrowski MN, Zhang W, Cechnicka S, Reynaud H, Kainz B. Conference contribution, Original article

description_en: <p>FAU’s <strong>Alex cluster</strong> (system integrator: <a href="https://www.megware.com/">Megware</a>) is a high-performance compute resource with Nvidia GPGPU accelerators; part of the system has a high-speed interconnect. It is intended for single- and multi-GPGPU workloads, e.g. from molecular dynamics or machine learning. Alex serves both as part of FAU’s basic Tier3 resources and as an NHR project resource.</p>
<ul>
<li><strong>2 front end nodes</strong>, each with two AMD EPYC 7713 “Milan” processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip, 512 GB of RAM, and a 100 GbE connection to RRZE’s network backbone, but no GPGPUs.</li>
<li><strong>8 GPGPU nodes</strong>, each with two AMD EPYC 7662 “Rome” processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip, 512 GB of DDR4-RAM, <strong>four Nvidia A100 (each 40 GB HBM2 @ 1,555 GB/s; DGX board with NVLink; 9.7 TFlop/s in FP64 or 19.5 TFlop/s in FP32)</strong>, one HDR200 Infiniband HCA, 25 GbE, and 6 TB on local NVMe SSDs. <em>(During 2021 and early 2022, these nodes were part of TinyGPU.)</em></li>
<li><strong>20 GPGPU nodes</strong>, each with two AMD EPYC 7713 “Milan” processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip, 1,024 GB of DDR4-RAM, <strong>eight Nvidia A100 (each 40 GB HBM2 @ 1,555 GB/s; HGX board with NVLink; 9.7 TFlop/s in FP64 or 19.5 TFlop/s in FP32)</strong>, two HDR200 Infiniband HCAs, 25 GbE, and 14 TB on local NVMe SSDs.</li>
<li><strong>12 GPGPU nodes</strong>, each with two AMD EPYC 7713 “Milan” processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip, 2,048 GB of DDR4-RAM, <strong>eight Nvidia A100 (each 80 GB HBM2e @ 2,039 GB/s; HGX board with NVLink; 9.7 TFlop/s in FP64 or 19.5 TFlop/s in FP32)</strong>, two HDR200 Infiniband HCAs, 25 GbE, and 14 TB on local NVMe SSDs.</li>
<li><strong>38 GPGPU nodes</strong>, each with two AMD EPYC 7713 “Milan” processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip, 512 GB of DDR4-RAM, <strong>eight Nvidia A40 (each with 48 GB GDDR6 @ 696 GB/s; 37.42 TFlop/s in FP32)</strong>, 25 GbE, and 7 TB on local NVMe SSDs.</li>
</ul>
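The node list above implies a total GPU inventory that is easy to tally. The following is a minimal sketch: the node counts, GPUs per node, and per-GPU FP32 peak figures are taken from the description, while the names `NODE_GROUPS` and `gpu_totals` are hypothetical helpers introduced only for illustration. The aggregate figures are theoretical peaks, not measured performance.

```python
# Node groups from the description:
# (node count, GPUs per node, GPU model, FP32 peak per GPU in TFlop/s)
NODE_GROUPS = [
    (8, 4, "A100 40GB", 19.5),
    (20, 8, "A100 40GB", 19.5),
    (12, 8, "A100 80GB", 19.5),
    (38, 8, "A40", 37.42),
]


def gpu_totals(groups):
    """Return {model: (GPU count, aggregate FP32 peak in TFlop/s)}."""
    totals = {}
    for nodes, gpus_per_node, model, tflops in groups:
        count, peak = totals.get(model, (0, 0.0))
        n = nodes * gpus_per_node
        totals[model] = (count + n, peak + n * tflops)
    return totals


totals = gpu_totals(NODE_GROUPS)
for model, (count, peak) in totals.items():
    print(f"{model}: {count} GPUs, {peak:.1f} TFlop/s FP32 peak")
```

With the listed node counts this gives 192 A100 (40 GB), 96 A100 (80 GB), and 304 A40 GPUs.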
projects: <QuerySet [<Project: Medical Image Analysis with Normative Machine Learning (ERC-CoG MIA-NORMAL), ERC-CoG MIA-NORMAL, , , <p>As one of the most important aspects of diagnosis, treatment planning, treatment delivery, and follow-up, medical imaging provides an unmatched ability to identify disease with high accuracy. As a result of its success, referrals for imaging examinations have increased significantly. However, medical imaging depends on interpretation by highly specialised clinical experts and is thus rarely available at the front-line-of-care, for patient triage, or for frequent follow-ups. Very often, excluding certain conditions or confirming physiological normality would be essential at many stages of the patient journey, to streamline referrals and relieve pressure on human experts who have limited capacity. Hence, there is a strong need for increased imaging with automated diagnostic support for clinicians, healthcare professionals, and caregivers.<br /><br />Machine learning is expected to be an algorithmic panacea for diagnostic automation. However, despite significant advances such as Deep Learning with notable impact on real-world applications, robust confirmation of normality is still an unsolved problem, which cannot be addressed with established approaches.<br /><br />Like clinical experts, machines should also be able to verify the absence of pathology by contrasting new images with their knowledge about healthy anatomy and expected physiological variability. Thus, the aim of this proposal is to develop normative representation learning as a new machine learning paradigm for medical imaging, providing patient-specific computational tools for robust confirmation of normality, image quality control, health screening, and prevention of disease before onset. 
We will do this by developing novel Deep Learning approaches that can learn without manual labels from healthy patient data only, applicable to cross-sectional, sequential, and multi-modal data. Resulting models will be able to extract clinically useful and actionable information as early and frequent as possible during patient journeys.<br /></p>, , 2023-09-01, 2028-09-30, , 2028-09-30, Third party funded individual grant, True>, <Project: Unraveling the Membrane-Selective Mechanism of Lugdunin: Insights into Specificity for Gram-Positive Bacteria and Antimicrobial Activity, , , , , , 2025-03-01, 2028-02-29, , 2028-02-29, Third party funded individual grant, True>, <Project: Fracture across Scales: Integrating Mechanics, Materials Science, Mathematics, Chemistry, and Physics (FRASCAL) (GRK 2423 FRASCAL), GRK 2423 FRASCAL, https://www.frascal.research.fau.eu/, , <p>The RTG aims to improve understanding of fracture in brittle heterogeneous materials by developing simulation methods able to capture the multiscale nature of failure. With i) its rooting in different scientific disciplines, ii) its focus on the influence of heterogeneities on fracture at different length and time scales as well as iii) its integration of highly specialised approaches into a “holistic” concept, the RTG addresses a truly challenging cross-sectional topic in mechanics of materials. Although various simulation approaches describing fracture exist for particular types of materials and specific time and length scales, an integrated and overarching approach that is able to capture fracture processes in different – and in particular heterogeneous – materials at various length and time resolutions is still lacking. 
Thus, we propose an RTG consisting of interdisciplinary experts from mechanics, materials science, mathematics, chemistry, and physics that will develop the necessary methodology to investigate the mechanisms underlying brittle fracture and how they are influenced by heterogeneities in various materials. The insights obtained together with the methodological framework will allow tailoring and optimising materials against fracture. The RTG will cover a representative spectrum of brittle materials and their composites, together with granular and porous materials. We will study these at length and time scales relevant to science and engineering, ranging from sub-atomic via atomic and molecular over mesoscale to macroscopic dimensions. Our modelling approaches and simulation tools are based on concepts from quantum mechanics, molecular mechanics, mesoscopic approaches, and continuum mechanics. These will be integrated into an overall framework which will represent an important step towards a virtual laboratory eventually complementing and minimising extensive and expensive experimental testing of materials and components. Within the RTG, young researchers under the supervision of experienced PAs will perform cutting-edge research on challenging scientific aspects of fracture. The RTG will foster synergies in research and advanced education and is intended to become a key element in FAU‘s interdisciplinary research areas “New Materials and Processes” and “Modelling–Simulation–Optimisation”.<br /></p>, <p>The RTG (Research Training Group) aims to improve understanding of fracture in brittle heterogeneous materials by developing simulation methods that are  able to capture the multiscale nature of failure. 
With i) its rooting in different scientific disciplines, ii) its focus on the influence of heterogeneities on fracture at different length and time scales as well as iii) its integration of highly specialised approaches into a “holistic” concept, the RTG addresses a truly challenging cross-disciplinary topic in mechanics of materials.</p>, 2019-01-01, 2023-06-30, 2027-12-31, 2027-12-31, Third Party Funds Group - Overall project, True>, <Project: Die Rolle der Mincle-FcγR-Wechselwirkung in der Regulation der autoantikörperabhängigen Entzündung, , , , , , 2024-01-01, 2026-12-31, , 2026-12-31, Third party funded individual grant, True>, <Project: International Doctoral Program: Measuring and Modelling Mountain glaciers and ice caps in a Changing Climate (M³OCCA) (MOCCA), MOCCA, , , <p>Mountain glaciers and ice caps outside the large ice sheets of Greenland and Antarctica contributed about 41% to global sea level rise between 1901 and 2018 (IPCC 2021). While the Arctic ice masses are and will remain the main contributors to sea level rise, glacier ice in other mountain regions can be critical for water supply (e.g. irrigation, energy generation, drinking water, but also river transport during dry periods). Furthermore, retreating glaciers can also cause risks and hazards through floods, landslides and rock falls in recently ice-free areas. As a consequence, the Intergovernmental Panel on Climate Change (IPCC) dedicates special attention to the cryosphere (IPCC 2019; IPCC 2021). WMO and UN have defined Essential Climate Variables (ECV) for assessing the status of the cryosphere and its changes. 
These ECVs should be measured regularly on large scale and are essential to constrain subsequent modelling efforts and predictions.<br />The proposed International Doctorate Program (IDP) “Measuring and Modelling Mountain glaciers and ice caps in a Changing ClimAte (M3OCCA)” will substantially contribute to improving our observation and measurement capabilities by creating a unique inter- and transdisciplinary research platform. We will address main uncertainties of current measurements of the cryosphere by developing new instruments and future analysis techniques as well as by considerably advancing geophysical models in glaciology and natural hazard research. The IDP will have a strong component of evolving techniques in the field of deep learning and artificial intelligence (AI) as the data flow from Earth Observation (EO) into modelling increases exponentially. IDP M3OCCA will become the primary focal point for mountain glacier research in Germany and educate emerging<br />talents with an interdisciplinary vision as well as excellent technical and soft skills. Within the IDP we combine cutting edge technologies with climate research. We will develop future technologies and transfer knowledge from other disciplines into climate and glacier research to place Bavaria at the forefront in the field of mountain cryosphere research. IDP M3OCCA fully fits into FAU strategic goals and it will leverage on Bavaria’s existing long-term commitment via the super test site Vernagtferner in the Ötztal Alps run by Bavarian Academy of Sciences (BAdW). In addition, we cooperate with the University of Innsbruck and its long-term observatory at Hintereisferner. At those super test sites, we will perform joint measurements, equipment tests, flight campaigns and cross-disciplinary trainings and exercises for our doctoral researchers. We leverage on existing<br />instrumentation, measurements and time series. 
Each of the nine doctoral candidates will be guided by interdisciplinary, international teams comprising university professors, senior scientists and emerging talents from the participating universities and external research organisations.<br /></p>, , 2022-06-01, 2026-05-31, , 2026-05-31, Third party funded individual grant, True>, <Project: DatenREduktion für Exascale- Anwendungen in der Fusionsforschung (DaREXA-F), DaREXA-F, , DatenREduktion für Exascale- Anwendungen in der Fusionsforschung, , , 2022-09-01, 2025-08-31, , 2025-08-31, Third Party Funds Group - Sub project, True>, <Project: Der skalierbare Strömungsraum (StroemungsRaum), StroemungsRaum, , Der skalierbare Strömungsraum, <p>Upcoming exascale computer architectures will feature a very large number of heterogeneous hardware components, including special-purpose processors and accelerators. Implementing CFD application software, the core component of today's flow simulations in industrial settings, on such systems requires highly scalable methods, above all for solving the high-dimensional, unsteady (non)linear systems of equations; these methods must also be able to exploit the high peak performance of accelerator hardware algorithmically. In addition, these approaches must be implemented in the application software such that “non-HPC experts” can use them for real applications, in particular the simulation, control, and optimisation of industrially relevant processes, while making resource-efficient use of the high performance of future exascale computers.</p><p>The open-source software FEATFLOW, developed mainly at TU Dortmund, is a powerful CFD tool and a central part of the StrömungsRaum platform, which IANUS Simulation has used successfully in industrial settings for years. 
Within the overall project, FEATFLOW is to be extended methodically and through hardware-oriented parallel implementations, so that highly scalable CFD simulations with FEATFLOW become possible on future exascale architectures.</p><p>In the FAU sub-project, performance engineering methods and processes are applied and further developed in order to improve the hardware efficiency and scaling of FEATFLOW in a targeted way for upcoming classes of HPC systems and foreseeable exascale architectures, and thereby to greatly reduce simulation time. In particular, the sub-project supports the methodological extensions planned in the project through the implementation of efficient libraries. Furthermore, performance models are created for selected core routines, these routines are optimised, and their efficient implementations are published as proxy applications.</p>, , 2022-09-01, 2025-08-31, , 2025-08-31, Third Party Funds Group - Sub project, True>, <Project: Tapping the potential of Earth Observations (TAPE), TAPE, , , , , 2019-04-01, 2021-03-31, 2022-03-31, 2022-03-31, FAU own research funding: EFI / IZKF / EAM ..., True>]>
publications: <QuerySet [<Publication: Surface tension and viscosity of carbon dioxide near the critical point using molecular dynamics simulations and surface light scattering>, <Publication: Decoding pH-Driven Phase Transition of Lipid Nanoparticles>, <Publication: AMD-HookNet++: Evolution of AMD-HookNet with Hybrid CNN-Transformer Feature Enhancement for Glacier Calving Front Segmentation>, <Publication: IceAnatomy: a benchmark dataset and methodology for automatic ice boundary extraction from radio-echo sounding data>, <Publication: Thermophysical Properties of n-Hexane under the Influence of Dissolved Hydrogen by Experiments and Equilibrium Molecular Dynamics Simulations>, <Publication: Fick Diffusion Coefficients of Polystyrene Oligomers with Dissolved Blowing Agents by Dynamic Light Scattering and Molecular Dynamics Simulations>, <Publication: Comparative Molecular Dynamics Study of 19 Bovine Antibodies with Ultralong CDR H3>, <Publication: Prediction of Fick Diffusion Coefficients in Binary Electrolyte Mixtures>, <Publication: Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models>, <Publication: Image Generation Diversity Issues and How to Tame Them>, <Publication: Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis.>, <Publication: VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points>, <Publication: Lightweight cross-attention-based HookNet for historical handwritten document layout analysis>, <Publication: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View>, <Publication: Deep learning-based debris flow hazard detection and recognition system: a case study>, <Publication: Inverse Rendering of Near-Field mmWave MIMO Radar for Material Reconstruction>, <Publication: Diffusion Coefficients in Electrolyte Mixtures─Influence of the Solute Characteristics>, <Publication: Causal reasoning in medical imaging>, 
<Publication: Data-efficient handwritten text recognition of diplomatic historical text>, <Publication: Mitochondria-Catalyzed Activation of Anticancer Prodrugs>, '...(remaining elements truncated)...']>
fobes: <QuerySet [<ResearchArea: Research Area: Title: Performance Engineering | Performance Engineering, Description: <p>Performance Engineering (PE) is a structured, model-based process for the structured optimization and parallelization of basic operations, algorithms and application codes for modern compute architectures. The process is divided into analysis, modeling and optimization phases, which are iterated for each homogeneous code section until an optimal or satisfactory performance is achieved. During the analysis, the first step is to develop a hypothesis about which aspect of the architecture (bottleneck) limits the execution speed of the software. The qualitative identification of typical bottlenecks can be done with so-called application-independent performance patterns. A concrete performance pattern is described by a set of observable runtime characteristics. Using suitable performance models, the interaction of the application with the given hardware architecture is then described analytically and quantitatively. </p><p>The model thus indicates the maximum expected performance and potential runtime improvements through appropriate modifications. If the model predictions cannot be validated by measurements, the underlying model assumptions are revisited and refined or adjusted if necessary. Based on the model, optimizations can be planned and their performance gain be assessed a-priori. The PE approach is not limited to standard microprocessor architectures and can also be used for projections to future computer architectures. The main focus of the group is on the computational node, where analytic performance models such as the Roofline model or the Execution Cache Memory (ECM) model are used.</p> | <p>Performance Engineering (PE) ist ein strukturierter, modellbasierter Prozess zur zielgerichteten Optimierung und Parallelisierung von Basisoperationen, Algorithmen und Anwenderprogrammen für moderne Hardwarearchitekturen. 
Der Prozess gliedert sich in Analyse-, Modellierungs- und Optimierungsphasen welche iterativ für jeden homogenen Codeabschnitt durchlaufen werden bis eine optimale bzw. zufriedenstellende Performance erreicht wird. Während der Analyse wird zunächst eine Hypothese erarbeitet welcher Aspekt der Architektur (Flaschenhals) die Ausführungsgeschwindigkeit der Software beschränkt. Die qualitative Identifikation typischer Flaschenhälse kann mit sogenannten anwendungsunabhängigen Performancemustern geschehen. Ein konkretes Performancemuster wird dabei durch spezielle Laufzeitcharakteristika beschrieben. Mit Hilfe geeigneter Performancemodelle wird anschließend die Wechselwirkung von Anwendung mit der gegebenen Hardwarearchitektur analytisch und quantitativ beschrieben. </p><p>Damit gibt das Modell die maximal zu erwartende Performance und mögliche Laufzeitverbesserungen durch entsprechende Modifikationen an. Können die Modellvorhersagen nicht durch Messungen validiert werden, so werden die zugrunde liegenden Modellannahmen überprüft und gegebenenfalls verfeinert oder angepasst. Auf Basis des Modells können Optimierungen geplant und deren Leistungsgewinn a-priori abgeschätzt werden. Der PE-Ansatz ist nicht auf klassische Mikroprozessorarchitekturen beschränkt und kann darüber hinaus auch für Projektionen für zukünftige Rechnerarchitekturen verwendet werden.  Die Arbeiten konzentrieren sich typischerweise auf den Rechenknoten, wo analytische Performancemodelle wie das Roofline-Model oder das in der Arbeitsgruppe entwickelte Execution-Cache-Memory (ECM) Modell zum Einsatz kommen.</p><p></p><p></p>, Classification: Field of Research | Forschungsbereich >, <ResearchArea: Research Area: Title: Performance Models | Performance Modellierung, Description: <p>Performance models describe the interaction between an application and the hardware, forming the basis for a deep understanding of the runtime behavior of an application. 
The group pursues an analytic approach, the essential components of which are application models and machine models. These components are initially created independently, but their combination and interaction finally provide insights about the bottlenecks and the expected performance. Creating accurate machine models in particular requires a thorough microarchitecture analysis. </p><p>The Execution-Cache-Memory (ECM) model developed by the group allows predictions of single-core performance as well as scaling within a multi-core processor or compute node. In combination with analytic models of electrical power consumption, it can also be used to derive estimates for the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model. </p><p>Beyond the node level, the group investigates the performance of highly parallel MPI and hybrid applications, especially those without frequent synchronizing operations. Applications show highly dynamic behavior due to their interaction with the system's hardware bottlenecks, such as memory and network bandwidth. As a consequence, a simple additive combination of runtime models for the different phases of an application is often inaccurate. We extend existing node-level and communication models to describe effects like desynchronization, resynchronization, and idle wave propagation.</p> | Performancemodelle beschreiben die Interaktion zwischen einer Anwendung und der Hardware und bilden die Grundlage für ein tiefgreifendes Verständnis des Laufzeitverhaltens einer Anwendung. Die Gruppe verfolgt einen analytischen Ansatz, dessen wesentliche Komponenten Anwendungsmodelle und Maschinenmodelle sind. Diese Komponenten werden zunächst unabhängig voneinander erstellt, aber ihre Kombination und Interaktion liefern schließlich Erkenntnisse über die Engpässe und die zu erwartende Leistung. Insbesondere die Erstellung genauer Maschinenmodelle erfordert eine gründliche Analyse der Mikroarchitektur. 
<br /><br />Das von der Gruppe entwickelte Execution-Cache-Memory-Modell (ECM) ermöglicht Vorhersagen zur Single-Core-Leistung sowie zur Skalierung innerhalb eines Multi-Core-Prozessors oder Rechenknotens. In Kombination mit analytischen Modellen der Leistungsaufnahme kann es auch für Schätzungen des Energieverbrauchs einer Anwendung verwendet werden. Das ECM-Modell ist eine Verallgemeinerung des bekannten Roofline-Modells. <br /><br />Über die Knotenebene hinaus untersucht die Gruppe die Performance hochparalleler MPI- und Hybridanwendungen, insbesondere solcher ohne häufige Synchronisationsvorgänge. Anwendungen zeigen aufgrund ihrer Interaktion mit den Hardware-Flaschenhälsen des Systems (wie Speicher- und Netzwerkbandbreite) ein hochdynamisches Verhalten. Infolgedessen ist eine einfache additive Kombination von Laufzeitmodellen für die verschiedenen Phasen einer Anwendung oft ungenau. Wir erweitern bestehende Knoten- und Kommunikationsmodelle, um Effekte wie Desynchronisation, Resynchronisation und die Ausbreitung von Verzögerungswellen zu beschreiben., Classification: Field of Research | Forschungsbereich >, <ResearchArea: Research Area: Title: Performance Tools | Performance Tools, Description: The group develops open-source software in the areas of performance tools, cluster monitoring, and benchmarking.<br />In the area of “performance tools,” the well-known LIKWID tool collection (https://github.com/RRZE-HPC/likwid) is being developed. It contains various tools for the controlled execution of applications on modern computing nodes with complex topology and adaptive runtime parameters. By measuring appropriate hardware metrics, LIKWID enables a detailed analysis of the hardware usage of application programs and is therefore of central importance for the validation of performance models and the identification of performance patterns. 
The output of derived metrics, such as the main memory bandwidth used, requires continuous adaptation and validation of this tool to new computer architectures.<br />The static code analysis tool OSACA (Open Source Architecture Code Analyzer) can analyze assembler code and provides a runtime prediction within the computing core (https://github.com/RRZE-HPC/OSACA).<br />With ClusterCockpit (https://clustercockpit.org/), the group is developing a comprehensive HPC cluster monitoring solution. ClusterCockpit comprises the following components: cc-metric-collector (node agent on the compute nodes), cc-backend (REST API and web server backend including web-based user interface), cc-metric-store (in-memory metric database), cc-energy-manager (job-specific control of power capping settings, global power capping for a cluster), and cc-node-controller (setting system parameters at the node level).   ClusterCockpit offers both job-centric and node-centric views and is accessible to regular HPC users, support staff, and administrators. ClusterCockpit is in productive use at a large number of HPC centers.<br />Benchmark applications are an important tool for understanding performance-limiting factors and exploring new optimization opportunities. They are used to characterize hardware platforms and in research and teaching. The group is developing “The Bandwidth Benchmark” (https://github.com/RRZE-HPC/TheBandwidthBenchmark), an application for measuring the maximum achievable bandwidth on all levels of the memory hierarchy. MD-Bench (https://github.com/RRZE-HPC/MD-Bench) implements state-of-the-art algorithms in the field of molecular dynamics for CPUs and GPUs, including scalable MPI parallelization. SparseBench implements solvers for sparse systems of equations. Different memory formats are supported. SparseBench is also MPI-parallel. 
MachineState (https://github.com/RRZE-HPC/MachineState) collects and stores all performance-related information at the node level, thus making an important contribution to reproducible benchmark results. | Die Gruppe entwickelt Open-Source-Software in den Themenfeldern Performance-Tools, Cluster-Monitoring und Benchmarking.<br />Im Bereich „Performance Tools” wird die bekannte Werkzeugsammlung LIKWID (https://github.com/RRZE-HPC/likwid) entwickelt. Sie enthält verschiedene Werkzeuge zur kontrollierten Ausführung von Applikationen auf modernen Rechenknoten mit komplexer Topologie und adaptiven Laufzeitparametern. Durch die Messung geeigneter Hardwaremetriken ermöglicht LIKWID eine detaillierte Analyse der Hardwarenutzung von Anwendungsprogrammen und ist somit von zentraler Bedeutung für die Validierung von Leistungsmodellen und die Identifizierung von Leistungsmustern. Die Ausgabe abgeleiteter Metriken, wie der genutzten Hauptspeicherbandbreite, erfordert eine kontinuierliche Anpassung und Validierung dieses Werkzeugs an neue Rechnerarchitekturen.<br />Das statische Code-Analysewerkzeug OSACA (Open Source Architecture Code Analyzer) kann Assemblercode analysieren und liefert eine Laufzeitvorhersage innerhalb des Rechenkerns (https://github.com/RRZE-HPC/OSACA).<br />Mit ClusterCockpit (https://clustercockpit.org/) entwickelt die Gruppe eine umfassende HPC-Cluster-Monitoring-Lösung. ClusterCockpit umfasst die folgenden Komponenten: cc-metric-collector (Knotenagent auf den Compute-Knoten), cc-backend (REST-API und Webserver-Backend inklusive webbasierter Benutzeroberfläche), cc-metric-store (In-Memory-Metric-Datenbank) und cc-energy-manager (jobspezifische Kontrolle von Powercapping-Einstellungen, globales Powercapping für einen Cluster) sowie cc-node-controller (Setzen von Systemparametern auf Knotenebene).   ClusterCockpit bietet sowohl jobzentrische als auch knotenzentrische Ansichten und ist für normale HPC-Nutzer, Supportpersonal und Administratoren zugänglich. 
ClusterCockpit ist an einer Vielzahl von HPC-Zentren im produktiven Einsatz.
Benchmark-Applikationen sind ein wichtiges Werkzeug, um leistungsbegrenzende Faktoren zu verstehen und neue Optimierungsmöglichkeiten zu erforschen.<br />Sie werden zur Charakterisierung von Hardwareplattformen sowie in Forschung und Lehre genutzt.<br />Die Gruppe entwickelt mit „The Bandwidth Benchmark” (https://github.com/RRZE-HPC/TheBandwidthBenchmark) eine Anwendung zur Messung der maximal erreichbaren Bandbreite in allen Ebenen der Speicherhierarchie. MD-Bench (https://github.com/RRZE-HPC/MD-Bench) implementiert State-of-the-Art-Algorithmen im Bereich der Molekulardynamik für CPUs und GPUs, inklusive skalierbarer MPI-Parallelisierung. SparseBench implementiert Löser für dünnbesetzte Gleichungssysteme. Dabei werden unterschiedliche Speicherformate unterstützt. SparseBench ist ebenfalls MPI-parallel. MachineState (https://github.com/RRZE-HPC/MachineState) sammelt und speichert alle leistungsrelevanten Informationen auf Knotenebene und leistet somit einen wichtigen Beitrag zu reproduzierbaren Benchmark-Ergebnissen., Classification: Field of Research | Forschungsbereich >]>
orgas: <QuerySet [<Organisation: Regionales Rechenzentrum Erlangen (RRZE), Regionales Rechenzentrum Erlangen (RRZE), Erlangen, 91058, Martensstraße, 2999-12-31, Zentrale Einrichtungen, True>, <Organisation: Professur für Höchstleistungsrechnen, The research activities of the HPC professorship are located at the interface between numerical applications and modern parallel, heterogeneous high-performance computers.<br /><br />The application focus is on the development and implementation of hardware- and energy-efficient numerical methods and application programs. The foundation of all activities is a structured performance engineering (PE) process based on analytic performance models. Such models describe the interaction between software and hardware and are thus able to systematically identify efficient implementation, optimization and parallelization strategies. The PE process is applied to stencil-based schemes as well as basic operations and eigenvalue solvers for large sparse problems.<br /><br />In the computer science-oriented research focus, performance models, PE methods and easy-to-use open source tools are developed that support the process of performance engineering and performance modeling on the compute node level. We focus on the continuous development of the ECM performance model and the LIKWID tool collection.<br /><br />In teaching and training, the working group consistently relies on a model-based approach to teach optimization and parallelization techniques. The courses are integrated into the computer science and computational engineering curriculum at FAU. Furthermore, the group offers an internationally successful tutorial program on performance engineering and hybrid programming.<br /><br />Prof. 
Wellein is director of the Erlangen National Center for High-Performance  Computing (NHR@FAU) and is the spokesman of the Competence Network for Scientific High Performance Computing in Bavaria (KONWIHR)., Erlangen, 91058, Martensstraße, 2999-12-31, Department Informatik, True>]>
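
The Roofline model named in the research area descriptions bounds attainable performance by the smaller of peak compute throughput and memory bandwidth multiplied by arithmetic intensity. Below is a minimal sketch of that bound, assuming the A100 (40 GB) figures from the hardware description (9.7 TFlop/s FP64, 1,555 GB/s). The function name `roofline` and the two example kernel intensities are illustrative assumptions, not measurements on Alex or code from the group's tools.

```python
def roofline(intensity, peak_gflops, bandwidth_gbs):
    """Attainable performance in GFlop/s for a kernel with the given
    arithmetic intensity (flop/byte): min(peak, intensity * bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# A100 40 GB figures quoted in the hardware description
A100_FP64_PEAK = 9700.0   # GFlop/s
A100_BANDWIDTH = 1555.0   # GB/s

# Intensity at which a kernel moves from memory-bound to compute-bound
ridge_point = A100_FP64_PEAK / A100_BANDWIDTH

# Illustrative kernels (assumed intensities):
# a triad-like update a[i] = b[i] + s * c[i] does 2 flops per 24 bytes
# of double-precision traffic; the second value is purely hypothetical.
examples = {
    "triad-like (2 flop / 24 byte)": 2.0 / 24.0,
    "hypothetical compute-heavy kernel": 10.0,
}

print(f"ridge point: {ridge_point:.2f} flop/byte")
for name, intensity in examples.items():
    perf = roofline(intensity, A100_FP64_PEAK, A100_BANDWIDTH)
    bound = "memory-bound" if intensity < ridge_point else "compute-bound"
    print(f"{name}: {perf:.1f} GFlop/s ({bound})")
```

Kernels whose intensity lies below the ridge point are limited by memory bandwidth in this model; those above it are limited by peak compute throughput.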