The system was decommissioned in February 2026 after almost 10 years of reliable service.
description_en: <div>THE SYSTEM WAS DECOMMISSIONED IN FEBRUARY 2026 AFTER ALMOST 10 YEARS OF SUCCESSFUL OPERATION.</div><div><br /></div>The RRZE’s Meggie cluster (manufacturer: Megware) is a high-performance compute resource with a high-speed interconnect. It is intended for distributed-memory (MPI) or hybrid parallel programs with medium to high communication requirements.<br /><ul><li>728 compute nodes, each with two Intel Xeon E5-2630v4 “Broadwell” chips (10 cores per chip) running at 2.2 GHz with 25 MB shared cache per chip and 64 GB of RAM.</li><li>2 front-end nodes with the same CPUs as the compute nodes but 128 GB of RAM.</li><li>Lustre-based parallel file system with a capacity of almost 1 PB and an aggregated parallel I/O bandwidth of > 9000 MB/s.</li><li>Intel Omni-Path interconnect with up to 100 GBit/s bandwidth per link and direction.</li><li>Measured LINPACK performance of ~481 TFlop/s.<br /></li></ul>Meggie is designed for running parallel programs that use significantly more than one node.<br />
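Since the cluster was intended for multi-node MPI jobs, a typical batch job for such a system can be sketched as follows. This is a minimal illustration assuming SLURM as the batch system; the module names and program name are hypothetical placeholders, not documented Meggie settings.

```shell
#!/bin/bash
# Minimal multi-node MPI batch job (sketch; assumes SLURM, names are placeholders)
#SBATCH --nodes=4                 # request four compute nodes
#SBATCH --ntasks-per-node=20      # one MPI rank per core (2 chips x 10 cores)
#SBATCH --time=01:00:00           # wall-clock limit
#SBATCH --job-name=mpi_example

# Load a compiler and MPI environment (module names are illustrative)
module load intel intelmpi

# Launch 80 MPI ranks across the 4 nodes
srun ./my_mpi_program
```

A hybrid MPI+OpenMP job would instead reduce `--ntasks-per-node` and set `--cpus-per-task` accordingly, so that each rank spawns threads on its share of the cores.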
feature_de:
feature_en:
pictures: <QuerySet [<Picture: 234579709>]>
cards: <QuerySet [<Card: Card of Gerhard, Wellein: (True)>, <Card: Card of Thomas, Zeiser: (True)>]>
funding_sources: <QuerySet [<FundingSource: FundingSource: cris_id: 139453943, name: Deutsche Forschungsgemeinschaft (DFG), abbreviation: DFG>, <FundingSource: FundingSource: cris_id: 139457887, name: DFG - Infrastrukturförderung (INFRA), abbreviation: INFRA>]>
projects: <QuerySet [<Project: Deep-Learning-Informed Glacio-Hydrological Threat (DELIGHT Framework), DELIGHT Framework, , , , , 2024-09-01, 2030-08-31, , 2030-08-31, Third party funded individual grant, True>, <Project: Privacy-preserving analysis of distributed medical data, , , , <p>Recent legislative developments, such as the European Health Data Space, expand access to anonymized health data for various entities. While these advances offer opportunities for medical research and innovation, they also increase the risk of compromising individuals' privacy.<br /><br />This project addresses the critical tension between the growing utility of health data and the need to protect individual privacy through organizational, infrastructural, and technical approaches. A key component of the technical solutions is privacy-enhancing technologies (PETs), such as secure multi-party computation and (local) differential privacy, which safeguard individuals' privacy while enabling the statistical analysis of aggregate data.<br /></p>, , 2023-07-01, , , 2026-07-01, Internally funded project, True>, <Project: International Doctoral Program: Measuring and Modelling Mountain glaciers and ice caps in a Changing Climate (M³OCCA) (MOCCA), MOCCA, , , <p>Mountain glaciers and ice caps outside the large ice sheets of Greenland and Antarctica contributed about 41% of the global sea level rise between 1901 and 2018 (IPCC 2021). While the Arctic ice masses are and will remain the main contributors to sea level rise, glacier ice in other mountain regions can be critical for water supply (e.g. irrigation, energy generation, drinking water, but also river transport during dry periods). Furthermore, retreating glaciers can also cause risks and hazards through floods, landslides and rock falls in recently ice-free areas. As a consequence, the Intergovernmental Panel on Climate Change (IPCC) dedicates special attention to the cryosphere (IPCC 2019; IPCC 2021). 
WMO and UN have defined Essential Climate Variables (ECVs) for assessing the status of the cryosphere and its changes. These ECVs should be measured regularly on a large scale and are essential to constrain subsequent modelling efforts and predictions.<br />The proposed International Doctorate Program (IDP) “Measuring and Modelling Mountain glaciers and ice caps in a Changing ClimAte (M3OCCA)” will substantially contribute to improving our observation and measurement capabilities by creating a unique inter- and transdisciplinary research platform. We will address the main uncertainties of current measurements of the cryosphere by developing new instruments and future analysis techniques as well as by considerably advancing geophysical models in glaciology and natural hazard research. The IDP will have a strong component of evolving techniques in the field of deep learning and artificial intelligence (AI), as the data flow from Earth Observation (EO) into modelling increases exponentially. IDP M3OCCA will become the primary focal point for mountain glacier research in Germany and educate emerging talents with an interdisciplinary vision as well as excellent technical and soft skills. Within the IDP we combine cutting-edge technologies with climate research. We will develop future technologies and transfer knowledge from other disciplines into climate and glacier research to place Bavaria at the forefront of mountain cryosphere research. IDP M3OCCA fully fits FAU’s strategic goals, and it will leverage Bavaria’s existing long-term commitment via the super test site Vernagtferner in the Ötztal Alps run by the Bavarian Academy of Sciences (BAdW). In addition, we cooperate with the University of Innsbruck and its long-term observatory at Hintereisferner. At these super test sites, we will perform joint measurements, equipment tests, flight campaigns and cross-disciplinary trainings and exercises for our doctoral researchers. 
We build on existing instrumentation, measurements and time series. Each of the nine doctoral candidates will be guided by interdisciplinary, international teams comprising university professors, senior scientists and emerging talents from the participating universities and external research organisations.<br /></p>, , 2022-06-01, 2026-05-31, , 2026-05-31, Third party funded individual grant, True>, <Project: Fractures across Scales: Integrating Mechanics, Materials Science, Mathematics, Chemistry, and Physics/ Skalenübergreifende Bruchvorgänge: Integration von Mechanik, Materialwissenschaften, Mathematik, Chemie und Physik (FRASCAL), FRASCAL, https://www.frascal.research.fau.eu/, , , , 2019-01-01, 2023-06-30, , 2023-06-30, Third party funded individual grant, True>, <Project: Tapping the potential of Earth Observations (TAPE), TAPE, , , , , 2019-04-01, 2021-03-31, 2022-03-31, 2022-03-31, FAU own research funding: EFI / IZKF / EAM ..., True>, <Project: Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization) (EoCoE-II), EoCoE-II, , Energy Oriented Center of Excellence: toward exascale for energy, , , 2019-01-01, 2021-12-31, , 2021-12-31, Third Party Funds Group - Sub project, True>, <Project: Cooperative Action of SNARE Peptides in Fusion (SFB1027), SFB1027, http://www.sfb1027.uni-saarland.de, SFB1027: Physikalische Modellierung von Nicht-Gleichgewichtsprozessen in biologischen Systemen (Universität des Saarlandes), <p>
<span style="font-family: ArialMT; font-size: 11pt;">SNARE peptides act cooperatively during synaptic vesicle fusion. It has previously been suggested that oligomerization of SNARE complexes, which is required for cooperative action in fusion, is mediated by interactions between their transmembrane domains (TMDs) and further tuned by interactions with the lipid environment. In this project, the oligomerization of SNARE TMD peptides, their interaction with the lipid surrounding, and the peptide-induced membrane curvature and its influence on membrane fusion will be studied using molecular dynamics simulations. </span></p>, <p>
<span style="font-family: ArialMT; font-size: 14.666666984558105px;">SNARE peptides act cooperatively during synaptic vesicle fusion. It has previously been suggested that oligomerization of SNARE complexes, which is required for cooperative action in fusion, is mediated by interactions between their transmembrane domains (TMDs) and further tuned by interactions with the lipid environment. In this project, the oligomerization of SNARE TMD peptides, their interaction with the lipid surrounding, and the peptide-induced membrane curvature and its influence on membrane fusion will be studied using molecular dynamics simulations. </span></p>, 2017-01-01, 2020-12-31, , 2020-12-31, Third Party Funds Group - Sub project, True>, <Project: Detection and Attribution of climate change for the mountain cryosphere: Advancing to the process-level, , , , , , 2017-08-01, 2020-07-31, , 2020-07-31, Third party funded individual grant, True>, <Project: Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (SeASiTe), SeASiTe, , , <p>
The SeASiTe research project takes on the task of systematically investigating self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype of a toolbox that programmers can use to equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation with respect to relevant system and program parameters as well as possible program transformations.<br />
Optimizing program execution for several non-functional objectives (e.g. runtime or energy consumption) will build on performance modelling to narrow the search space of efficient program variants. Application-independent methods and strategies for self-adaptation will be encapsulated in an autotuning navigator.<br /></p>
<p>
The Erlangen subproject initially addresses the model-based understanding of autotuning methods for regular simulation algorithms, using several common stencil classes as examples. With the help of extended performance models, structured guidelines and recommendations for the autotuning process regarding relevant code transformations and the restriction of the search space for optimization parameters will be derived and prepared exemplarily for the autotuning navigator.<br />
The second focus of the work is the extension of existing analytical performance models and software tools to new computer architectures and their integration into the autotuning navigator. In addition, the Erlangen group maintains the demonstrator for stencil codes.<br />
The group also contributes to the design of the AT navigator and the definition of interfaces.<br />
</p>, , 2017-03-01, 2020-02-29, , 2020-02-29, Third party funded individual grant, True>, <Project: Metaprogrammierung für Beschleunigerarchitekturen (MeTacca), MeTacca, , , <p>
In Metacca, the AnyDSL framework is extended into a homogeneous programming environment for heterogeneous single- and multi-node systems. UdS will extend the AnyDSL compiler and type system to enable programmers to program accelerators productively. Building on this, LSS will develop suitable abstractions for distribution and synchronization on single- and multi-node machines in the form of a DSL within AnyDSL. All components are supported by performance models (RRZE).<br />
A runtime environment with built-in performance profiling handles resource management and system configuration. The resulting framework is evaluated with two applications: ray tracing (DFKI) and bioinformatics (JGU).<br />
Target platforms are single nodes and clusters with multiple accelerators (CPUs, GPUs, Xeon Phi).</p>
<p>
The University of Erlangen-Nürnberg is primarily responsible for supporting distributed programming (LSS) and for developing and implementing supporting performance models as well as an integrated profiling component (RRZE). In both areas, a requirements analysis is carried out at the beginning to plan further steps and coordinate them with the partners.<br />
LSS will implement the distribution of data structures in the first year. Subsequently, the work will focus on implementing synchronization mechanisms. In the final year, code transformations will be designed to adapt the distribution and synchronization concepts in AnyDSL to the selected applications. RRZE will first integrate the kerncraft framework into the partial evaluation. kerncraft will be extended to support current accelerator architectures as well as models for distributed-memory parallelization. In two further work packages, resource management and a LIKWID-based profiling component will be implemented.</p>, , 2017-01-01, 2019-12-31, , 2019-12-31, Third party funded individual grant, True>, <Project: Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers (ProPE), ProPE, https://blogs.fau.de/prope/, , <p>
The ProPE project will deploy a prototype HPC user support infrastructure as a distributed cross-site collaborative effort of several tier-2/3 centers with complementing HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with sustainable outcome. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering (PE) process. This PE process defines and drives code optimization and parallelization as a target-oriented, structured process. Application hot spots are identified first and then optimized/parallelized in an iterative cycle: Starting with an analysis of the algorithm, the code, and the target hardware, a hypothesis of the performance-limiting factors is proposed based on performance patterns and models. Performance measurements validate or guide the iterative adaption of the hypothesis. After validation of the hardware bottleneck, appropriate code changes are deployed and the PE cycle restarts. The level of detail of the PE process can be adapted to the complexity of the underlying problem and the experience of the HPC analyst. Currently this process is applied by experts and at the prototype level. ProPE will formalize and document the PE process and apply it to various scenarios (single core/node optimization, distributed parallelization, I/O-intensive problems). Different abstraction levels of the PE process will be implemented and disseminated to HPC analysts and application developers via user support projects, teaching activities, and web documentation. The integration of the PE process into modern IT infrastructure across several centers with different HPC support expertise will be the second project focus. All components of the PE process will be coordinated and standardized across the partnering sites. This way the complete HPC expertise within ProPE can be offered as a coherent service on a nationwide scale. Ongoing support projects can be transferred easily between participating centers. In order to identify low-performing applications, characterize application loads, and quantify benefits of the PE activities at a system level, ProPE will employ a system monitoring infrastructure for HPC clusters. This tool will be tailored to the requirements of the PE process and designed for easy deployment and usage at tier-2/3 centers. The associated ProPE partners will ensure the embedding into the German HPC infrastructure and provide basic PE expertise in terms of algorithmic choices, perfectly complementing the code optimization and parallelization efforts of ProPE.</p>, , 2017-01-01, 2019-12-31, , 2019-12-31, Third party funded individual grant, True>, <Project: TERRA-NEO - Integrated Co-Design of an Exascale Earth Mantle Modeling Framework (TERRA-NEO), TERRA-NEO, http://www.terraneo.fau.de, SPP 1648: Software for Exascale Computing, <p>
Much of what one refers to as geological activity of the Earth is due to the fact that heat is transported from the interior of our planet to the surface in a planet-wide solid-state convection in the Earth’s mantle. For this reason, the study of the dynamics of the mantle is critical to our understanding of how the entire planet works. Processes from earthquakes, plate tectonics and crustal evolution to the geodynamo are governed by convection in the mantle. Without a detailed knowledge of Earth’s internal dynamic processes, we cannot hope to deduce the many interactions between shallow and deep Earth processes that dominate the Earth system. The vast forces associated with mantle convection cells drive horizontal movement of Earth’s surface in the form of plate tectonics, which is well known albeit poorly understood. They also induce substantial vertical motion in the form of dynamically maintained topography that manifests itself prominently in the geologic record through sea level variations and their profound impact on the ocean and climate system. Linking mantle processes to their surface manifestations is seen widely today as one of the most fundamental problems in the Earth sciences, while being at the same time a matter of direct practical relevance through the evolution of sedimentary basins and their paramount economic importance. Simulating Earth mantle dynamics requires a resolution in space and time that makes it one of the grand challenge applications in the computational sciences. With exascale systems of the future it will be possible to advance beyond the deterministic forward problem to a stochastic uncertainty analysis for the inverse problem. 
In fact, fluid dynamic inverse theory is now at hand that will allow us to track mantle motion back into the past, exploiting the rich constraints available from the geologic record, subject to the availability of powerful geodynamical simulation software that can take advantage of these future supercomputers. The new community code TERRA-NEO will be based on a carefully designed multi-scale space-time discretization using hybridized Discontinuous Galerkin elements on an icosahedral mesh with block-wise refinement. This advanced finite element technique promises better stability and higher accuracy for the non-linear transport processes in the Earth mantle while requiring less communication in a massively parallel setting. The resulting algebraic systems with finally more than 10<sup>12</sup> unknowns per time step will be solved by a new class of communication-avoiding, asynchronous multigrid preconditioners that will achieve maximal scalability and resource-optimized computational performance. A non-deterministic control flow and a lazy evaluation strategy will alleviate the traditional over-synchronization of hierarchical iterative methods and will support advanced resiliency techniques on the algorithmic level. The software framework of TERRA-NEO will be developed specifically for the upcoming heterogeneous exascale computers by using an advanced architecture-aware design process. Special white-box performance models will guide the software development, leading to a holistic co-design of the data structures and the algorithms on all levels. With this systematic performance engineering methodology we will also optimize a balanced compromise between minimal energy consumption and shortest run time. This consortium is fully committed to the interdisciplinary collaboration that is necessary for creating TERRA-NEO as a new exascale simulation framework. 
To this end, TERRA-NEO brings top experts together that cover all aspects of CS&E, from modeling via the discretization to solvers and software engineering for exascale architectures.</p>, , 2005-10-12, 2019-06-08, 2019-09-30, 2019-09-30, Third Party Funds Group - Sub project, True>, <Project: ESSEX - Equipping Sparse Solvers for Exascale, , , SPP 1648: Software for Exascale Computing, <p>
The ESSEX project investigates the computational issues arising for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers, which comprise building blocks, algorithms and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault tolerance mechanisms are the driving forces behind all developments. Classic Krylov, Jacobi-Davidson and recent FEAST methods will be enabled for exascale computing and equipped with advanced, scalable preconditioners. New implementations of domain-specific iterative schemes in physics and chemistry, namely the established Chebyshev expansion techniques for the computation of spectral properties and their novel extension to the time evolution of driven quantum systems, complement these algorithms. The software solutions of the ESSEX project will be combined into an Exascale Sparse Solver Repository (“ESSR”), where the specific demands of the quantum physics users are recognized by integration of quantum state encoding techniques at the fundamental level. The relevance of this project can then be demonstrated through application of the ESSR algorithms to graphene-based structures, topological insulators, and quantum Hall effect devices. Such studies require exascale resources together with modern numerical methods to determine many eigenstates at a given point of the spectrum of extremely large matrices or to compute an approximation to their full spectrum. The concepts, methods and software building blocks developed in the ESSEX project serve as general blueprints for other scientific application areas that depend on sparse iterative algorithms. 
The strong vertical interaction between all three project layers ensures that the user can quickly utilize any progress on the lower layers and immediately use the power of exascale machines once they become available.</p>, , 2012-11-01, 2019-06-30, , 2019-06-30, Third Party Funds Group - Sub project, True>, <Project: Equipping Sparse Solvers for Exascale II (ESSEX-II) (SPPEXA), SPPEXA, https://blogs.fau.de/essex/activities, SPP 1648: Software for Exascale Computing, <p>
The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction boundaries separating these layers are broken in ESSEX-II by strongly integrating objectives: scalability, numerical reliability, fault tolerance, and holistic performance and power engineering. Driven by Moore’s Law and power dissipation constraints, computer systems will become more parallel and heterogeneous even on the node level in upcoming years, further increasing overall system parallelism. MPI+X programming models can be adapted in flexible ways to the underlying hardware structure and are widely expected to be able to address the challenges of the massively multi-level parallel heterogeneous architectures of the next decade. Consequently, the parallel building blocks layer supports MPI+X, with X being a combination of node-level programming models able to fully exploit hardware heterogeneity, functional parallelism, and data parallelism. In addition, facilities for fully asynchronous checkpointing, silent data corruption detection and correction, performance assessment, performance model validation, and energy measurements will be provided. The algorithms layer will leverage the components in the building blocks layer to deliver fully heterogeneous, automatically fault-tolerant, and state-of-the-art implementations of Jacobi-Davidson eigensolvers, the Kernel Polynomial Method (KPM), and Chebyshev Time Propagation (ChebTP) that are ready to use for production on modern heterogeneous compute nodes with best performance and numerical accuracy. Chebyshev filter diagonalization (ChebFD) and a Krylov eigensolver complement these implementations, and the recent FEAST method will be investigated and further developed for improved scalability. The applications layer will deliver scalable solutions for conservative (Hermitian) and dissipative (non-Hermitian) quantum systems with strong links to optics and biology and to novel materials such as graphene and topological insulators. Extending its predecessor project, ESSEX-II adopts an additional focus on production-grade software. Although the selection of algorithms is strictly motivated by quantum physics application scenarios, the underlying research directions of algorithmic and hardware efficiency, accuracy, and resilience will radiate into many fields of computational science. Most importantly, all developments will be accompanied by an uncompromising performance engineering process that will rigorously expose any discrepancy between expected and observed resource efficiency.</p>, , 2016-01-01, 2018-12-31, , 2018-12-31, Third Party Funds Group - Sub project, True>, <Project: ExaStencils - Advanced Stencil-Code Engineering (ExaStencils), ExaStencils, http://www.exastencils.org, SPP 1648: Software for Exascale Computing, <p>
Future exascale computing systems with 10<sup>7</sup> processing units and supporting up to 10<sup>18</sup> FLOPS peak performance will require a tight co-design of application-, algorithm-, and architecture-aware program development to sustain this performance for many applications of interest, mainly for two reasons. First, the node structure inside an exascale cluster will become increasingly heterogeneous, always exploiting the most recent available on-chip manycore/GPU/HW-assist technology. Second, the clusters themselves will be composed of heterogeneous subsystems and interconnects. As a result, new software techniques and tools supporting the joint algorithm- and architecture-aware program development will become indispensable not only (a) to ease application and program development, but also (b) for performance analysis and tuning, (c) to ensure short turn-around times, and (d) for reasons of portability.<br />
<br />
Project ExaStencils will investigate and provide a unique, tool-assisted, domain-specific co-design approach for the important class of stencil codes, which play a central role in high-performance simulation on structured or block-structured grids. Stencils are regular access patterns on (usually multidimensional) data grids. Multigrid methods involve a hierarchy of very fine to successively coarser grids. The challenge of exascale is that, for the coarser grids, less processing power is required and communication dominates. From the computational algorithm perspective, domain-specific investigations include the extraction and development of suitable stencils, the analysis of performance-relevant algorithmic tradeoffs (e.g., the number of grid levels) and the analysis and reduction of synchronization requirements guided by a template model of the targeted cluster architecture. Based on this analysis, sophisticated programming and software tool support shall be developed by capturing the relevant data structures and program segments for stencil computations in a domain-specific language and applying a generator-based product-line technology to automatically generate and optimize stencil codes tailored to each application–platform pair. A central distinguishing mark of ExaStencils is that domain knowledge is being pursued in a coordinated manner across all abstraction levels, from the formulation of the application scenario down to the generation of highly optimized stencil code.<br />
<br />
For the developed unique and first-time seamless cross-level design flow, the three objectives of (1) a substantial gain in productivity, (2) high flexibility in the choice of algorithm and execution platform, and (3) the provision of the ExaFLOPS performance for stencil code shall be demonstrated in a detailed, final evaluation phase.</p>, <p>
Future exascale computing systems with 10<sup>7</sup> processing units and supporting up to 10<sup>18</sup> FLOPS peak performance will require a tight co-design of application-, algorithm-, and architecture-aware program development to sustain this performance for many applications of interest, mainly for two reasons. First, the node structure inside an exascale cluster will become increasingly heterogeneous, always exploiting the most recent available on-chip manycore/GPU/HW-assist technology. Second, the clusters themselves will be composed of heterogeneous subsystems and interconnects. As a result, new software techniques and tools supporting the joint algorithm- and architecture-aware program development will become indispensable not only (a) to ease application and program development, but also (b) for performance analysis and tuning, (c) to ensure short turn-around times, and (d) for reasons of portability.<br />
<br />
Project ExaStencils will investigate and provide a unique, tool-assisted, domain-specific co-design approach for the important class of stencil codes, which play a central role in high-performance simulation on structured or block-structured grids. Stencils are regular access patterns on (usually multidimensional) data grids. Multigrid methods involve a hierarchy of very fine to successively coarser grids. The challenge of exascale is that, for the coarser grids, less processing power is required and communication dominates. From the computational algorithm perspective, domain-specific investigations include the extraction and development of suitable stencils, the analysis of performance-relevant algorithmic tradeoffs (e.g., the number of grid levels) and the analysis and reduction of synchronization requirements guided by a template model of the targeted cluster architecture. Based on this analysis, sophisticated programming and software tool support shall be developed by capturing the relevant data structures and program segments for stencil computations in a domain-specific language and applying a generator-based product-line technology to automatically generate and optimize stencil codes tailored to each application–platform pair. A central distinguishing mark of ExaStencils is that domain knowledge is being pursued in a coordinated manner across all abstraction levels, from the formulation of the application scenario down to the generation of highly optimized stencil code.<br />
<br />
For the developed unique and first-time seamless cross-level design flow, the three objectives of (1) a substantial gain in productivity, (2) high flexibility in the choice of algorithm and execution platform, and (3) the provision of the ExaFLOPS performance for stencil code shall be demonstrated in a detailed, final evaluation phase.</p>, 2013-01-01, 2018-12-31, , 2018-12-31, Third Party Funds Group - Sub project, True>, <Project: EXASTEEL II - Bridging Scales for Multiphase Steels (SPPEXA), SPPEXA, http://www.numerik.uni-koeln.de/14079.html, SPP 1648: Software for Exascale Computing, <p>
In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.</p>
<p>
There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous (dual-phase) steels, attempting to predict the macroscopic properties of new materials from those on the microscopic level. The goal is to develop algorithms and software towards a virtual laboratory for predictive material testing in silico. A bottleneck is the computational complexity of the multiscale models needed to describe the new materials, involving sufficiently accurate micromechanically motivated models on the crystalline scale. Therefore, new ultra-scalable nonlinear implicit solvers will be developed and combined with a highly parallel computational scale bridging approach (FE^2), intertwined with consistent and continuous performance engineering, to bring the challenging engineering application of a virtual laboratory for material testing and design to extreme scale computing. We envisage a progressive transition from descriptive to predictive macroscopic simulations and take into account, to the best of our knowledge for the first time within a computational scale bridging approach, the polycrystalline nature of dual phase steels including grain boundary effects at the microscale.</p>
<p>
Our goals could not be reached without building on the algorithm and software infrastructure from EXASTEEL-1. We will complete the paradigm shift, begun in the EXASTEEL-1 project, from Newton-Krylov solvers to nonlinear methods (and their composition) with improved concurrency and reduced communication. By combining nonlinear domain decomposition with multigrid methods we plan to leverage the scalability of both implicit solver approaches for nonlinear methods.</p>
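As a reminder of the baseline that Newton-Krylov and nonlinear domain decomposition methods build on, here is a plain Newton iteration for a scalar equation. This is an illustrative sketch, not project code:

```python
# Plain Newton iteration for a scalar nonlinear equation -- the textbook
# baseline that Newton-Krylov and nonlinear domain decomposition methods
# generalize to large coupled systems. Illustrative only.

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of f starting from x0, given the derivative df."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        # For large systems, this division becomes an (inexact) Krylov
        # solve of the linearized system J(x) dx = -f(x); the global
        # synchronization hidden in that solve is what nonlinear methods
        # with improved concurrency try to reduce.
        x -= fx / df(x)
    return x

# Example: solve x^3 - 2 = 0; the root is 2 ** (1/3).
root = newton(lambda x: x**3 - 2, lambda x: 3 * x**2, 1.0)
```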
<p>
Although our application is specific, the algorithms and optimized software will have an impact well beyond the particular application. Nonlinear implicit solvers are at the heart of many simulation codes, and our software building blocks PETSc, BoomerAMG, PARDISO, and FEAP are all software packages with a large user base. The advancement of these software packages is explicitly planned for in the work packages of this project.</p>
<p>
The project thus addresses computational algorithms (nonlinear implicit solvers and scale bridging), application software, and programming (PE, hybrid programming, accelerators).</p>, , 2016-01-01, 2018-12-31, , 2018-12-31, Third Party Funds Group - Sub project, True>, <Project: GRK 1962: Dynamische Wechselwirkungen an Biologischen Membranen – von Einzelmolekülen zum Gewebe, , http://www.biomembranes.org, , , , 2014-04-01, 2018-09-30, , 2018-09-30, Third Party Funds Group - Overall project, True>, <Project: Dispersion Effects on Reactivity and Chemo-, Regio- and Stereoselectivity in Organocatalysed Domino Reactions: A Joint Experimental and Theoretical Study, , http://www.uni-giessen.de/fbz/fb08/dispersion, SPP 1807: Control of London dispersion interactions in molecular chemistry, <div id="projekttext">
This joint experimental and theoretical project aims at the development of facile and environmentally friendly organocatalytic multi-step domino reactions exploiting dispersion interactions in these novel systems. We plan to conduct a series of multi-component domino reactions involving readily available nitroolefins and aldehydes, as well as CH-acidic malononitrile already known for its broad application and its versatile use as an exceptionally reactive compound. We will mainly focus on the following three unprecedented reactions: (i) three-component two-step domino Knoevenagel/vinylogous Michael reaction; (ii) three-component five-step branched domino Knoevenagel/nitro-Michael/nitroalkane-Michael/intramolecular condensation/isomerization; (iii) two-component six-step domino Knoevenagel/dimerisation/ intermolecular condensation/intramolecular aza-Michael/intramolecular condensation/ isomerization reaction. Detailed mechanistic investigations will be performed using conventional density-functional methods in conjunction with semiempirical van der Waals corrections as well as novel highly accurate density-functional methods to shed light on the intriguing differences in chemoselectivity, regioselectivity and stereoselectivity in these organocatalysed domino transformations, and, in particular, to understand and exploit the influence of dispersion interaction in these transformations. 
Taking the envisioned domino reactions as test cases, computational setups for a density-functional based description of organocatalysis will be developed.</div>, , 2015-01-01, , , 2018-01-01, Third Party Funds Group - Sub project, True>, <Project: Plastic deformation, crack nucleation and fracture in lightweight intermetallic composite materials (EXC315 EAM (A3-7)), EXC315 EAM (A3-7), https://www.eam.fau.eu/, Exzellenz-Cluster Engineering of Advanced Materials, , , 2007-11-01, 2017-10-31, , 2017-10-31, Third Party Funds Group - Sub project, True>, <Project: Quantenchemische Untersuchungen zu Bildung, Struktur, Energie und elektronischen Eigenschaften von Carbinen, Fullerenen und Graphenen (C02) (SFB 953), SFB 953, https://www.chemistry.nat.fau.eu/research/dfg/sfb953/, SFB 953: Synthetic Carbon Allotropes (SFB 953), <p>
In this project, carbon materials (fullerenes, polyynes, graphenes) as well as carbon allotropes not yet synthesized, such as graphynes, are to be investigated with non-empirical electronic structure methods, in particular both established and newly developed density functional methods. With the goal of producing new carbon compounds and materials, their formation, structure, and energetics as well as their spectroscopic and electronic properties are to be analyzed and predicted.</p>, , 2012-01-01, , , 2015-01-01, Third Party Funds Group - Sub project, True>, '...(remaining elements truncated)...']>
publications: <QuerySet [<Publication: Experimental Determination of Atomic Scale Structure and Energy-Level Alignment of C60 on CsPbBr3(001)>, <Publication: Estimating Group Means Under Local Differential Privacy>, <Publication: A high proportion of germline variants in pediatric chronic myeloid leukemia>, <Publication: Margination of artificially stiffened red blood cells>, <Publication: A Drifting and Blowing Snow Scheme in the Weather Research and Forecasting Model>, <Publication: Deleterious ZNRF3 germline variants cause neurodevelopmental disorders with mirror brain phenotypes via domain-specific effects on Wnt/β-catenin signaling>, <Publication: The missing link: ARID1B non-truncating variants causing Coffin-Siris syndrome due to protein aggregation>, <Publication: IDH3γ functions as a redox switch regulating mitochondrial energy metabolism and contractility in the heart>, <Publication: High-Throughput Numerical Investigation of Process Parameter-Melt Pool Relationships in Electron Beam Powder Bed Fusion>, <Publication: Immunoglobulin G-dependent inhibition of inflammatory bone remodeling requires pattern recognition receptor Dectin-1>, <Publication: Structure and Reactivity of the Ionic Liquid [C(1)C(1)Im][Tf2N] on Cu(111)>, <Publication: Mechanistic Insight into Solution-Based Atomic Layer Deposition of CuSCN Provided by In Situ and Ex Situ Methods>, <Publication: Isolated Rh atoms in dehydrogenation catalysis>, <Publication: A comparative assessment of different adaptive spatial refinement strategies in phase-field fracture models for brittle fracture>, <Publication: The genetic landscape and clinical implication of pediatric Moyamoya angiopathy in an international cohort>, <Publication: Lipid Bicelles in the Study of Biomembrane Characteristics>, <Publication: Wilkinson-type catalysts in ionic liquids for hydrogenation of small alkenes: understanding and improving catalyst stability>, <Publication: Challenges for the implementation of next generation 
sequencing-based expanded carrier screening: Lessons learned from the ciliopathies>, <Publication: Atomic Layer Deposition of HfS2 on Oxide Interfaces: A Model Study on the Initial Nucleation Processes>, <Publication: Assessing clinical utility of preconception expanded carrier screening regarding residual risk for neurodevelopmental disorders>, '...(remaining elements truncated)...']>
fobes: <QuerySet [<ResearchArea: Research Area:
Title: A3 Multiscale Modeling and Simulation | A3 Multiscale Modeling and Simulation,
Description: <div><p><b>New methods for multiscale and multiphysical modeling for the optimization of structures, properties, and processes</b>
</p>
<p><b>The research concept connects quantum-mechanical approaches on the
molecular scale to discrete approaches for particle systems and to
methods of continuum mechanics</b>
</p>
<p>The cross-sectional Research Area A3 is concerned with modeling,
simulating and optimizing macroscopic material and structural properties
based on their constituent components such as particles, molecules and
atoms. A guiding principle of A3 is that simulation is used as a new
paradigm in gaining qualitative knowledge and quantitative data
alongside theoretical and experimental facts. </p><ul><li>On
the qualitative side, molecules that have not yet been synthesized can
e.g. be anticipated via modeling and simulation. Similarly, new
materials and in particular meta-materials (or utopia-materials) can be
designed optimally, given their desired functionality. </li><li>On the
quantitative side, data-driven model-based simulation and optimization
in the context of the application areas can be used directly in the
process chain.</li></ul><p>Understanding matter and designing materials,
interfaces, and processes from their nano-structural constitution
necessitates both algorithms that scale almost linearly in order to cope
with the vast number of variables, and hierarchical, multi-scale
modeling, analysis and mathematical optimization in order to bridge the
gap between the scales in space, time, and constitutive models. <br /><br />The
Center for Multiscale Modeling and Simulation (CMMS) works on
multiscale approaches and methods for structure, property, and process
optimization. The research concept connects quantum mechanical
approaches on the molecular scale to discrete approaches for particle
systems and to methods of continuum mechanics. </p></div>,
Classification: Field of Research | Forschungsbereich
>, <ResearchArea: Research Area:
Title: Hardware-efficient building blocks for sparse linear algebra and stencil solvers | Hardware-effiziente Bausteine für dünn besetzte lineare Algebra und Stencil-Löser,
Description: <p>The solution of large, sparsely populated systems of equations and eigenvalue problems is typically done by iterative methods. This research area deals with the efficient implementation, optimization and parallelization of the most important basic building blocks of such iterative solvers. The focus is on the multiplication of a large sparse matrix with one or more vector(s) (SpMV). Both matrix-free representations for regular matrices, such as those occurring in the discretization of partial differential equations ("stencils"), and the generic case of a general SpMV with a stored matrix are considered. Our work on the development and implementation of optimized building blocks for SpMV-based solvers includes hardware-efficient algorithms, data access optimizations (spatial and temporal blocking), and efficient and portable data structures. Our structured performance engineering process is employed in this context.</p> | <p>Die Lösung großer dünn besetzter Gleichungssysteme und Eigenwertprobleme geschieht typischerweise mit Hilfe iterativer Verfahren. Der vorliegende Forschungsbereich beschäftigt sich mit der effizienten Implementierung, Optimierung und Parallelisierung der wichtigsten Grundbausteine solcher iterativen Löser. Im Mittelpunkt steht dabei die Multiplikation einer großen dünn besetzten Matrix mit einem oder mehreren Vektor(en) (SpMV). Betrachtet werden sowohl matrixfreie Darstellungen für reguläre Matrizen wie sie etwa bei Diskretisierungen von partiellen Differenzialgleichungen ("Stencils") auftreten als auch der generische Fall einer allgemeinen SpMV mit abgespeicherter Matrix. Unsere Arbeiten zur Entwicklung und Implementierung optimierter Bausteine für SpMV-basierte Löser umfassen unter anderem hardwareeffiziente Algorithmen, Datenzugriffsoptimierungen (räumliches und zeitliches Blocking) sowie effiziente und portable Datenstrukturen. Dabei kommt unser strukturierter Performance Engineering Prozess zum Einsatz.</p>,
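The general SpMV with a stored matrix described above can be sketched for the common CRS/CSR format as follows. This is an illustrative Python version, not one of the group's optimized building blocks:

```python
# Sparse matrix-vector multiply (SpMV) in CRS/CSR format -- a minimal
# sketch of the basic building block of iterative sparse solvers.

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for A stored in compressed row storage."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        # Only the nonzeros of row i are touched; col_idx induces the
        # (generally irregular) access pattern into x that makes SpMV
        # hard to optimize.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values  = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0])  # -> [5.0, 3.0, 7.0]
```

With two loads and a fused multiply-add per nonzero, this kernel is strongly memory-bound, which is why data-structure and blocking optimizations matter so much here.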
Classification: Field of Research | Forschungsbereich
>, <ResearchArea: Research Area:
Title: Performance Engineering | Performance Engineering,
Description: <p>Performance Engineering (PE) is a structured, model-based process for the targeted optimization and parallelization of basic operations, algorithms and application codes for modern compute architectures. The process is divided into analysis, modeling and optimization phases, which are iterated for each homogeneous code section until an optimal or satisfactory performance is achieved. During the analysis, the first step is to develop a hypothesis about which aspect of the architecture (bottleneck) limits the execution speed of the software. The qualitative identification of typical bottlenecks can be done with so-called application-independent performance patterns. A concrete performance pattern is described by a set of observable runtime characteristics. Using suitable performance models, the interaction of the application with the given hardware architecture is then described analytically and quantitatively. </p><p>The model thus indicates the maximum expected performance and potential runtime improvements through appropriate modifications. If the model predictions cannot be validated by measurements, the underlying model assumptions are revisited and refined or adjusted if necessary. Based on the model, optimizations can be planned and their performance gain assessed a priori. The PE approach is not limited to standard microprocessor architectures and can also be used for projections to future computer architectures. The main focus of the group is on the computational node, where analytic performance models such as the Roofline model or the Execution Cache Memory (ECM) model are used.</p> | <p>Performance Engineering (PE) ist ein strukturierter, modellbasierter Prozess zur zielgerichteten Optimierung und Parallelisierung von Basisoperationen, Algorithmen und Anwenderprogrammen für moderne Hardwarearchitekturen. 
Der Prozess gliedert sich in Analyse-, Modellierungs- und Optimierungsphasen welche iterativ für jeden homogenen Codeabschnitt durchlaufen werden bis eine optimale bzw. zufriedenstellende Performance erreicht wird. Während der Analyse wird zunächst eine Hypothese erarbeitet welcher Aspekt der Architektur (Flaschenhals) die Ausführungsgeschwindigkeit der Software beschränkt. Die qualitative Identifikation typischer Flaschenhälse kann mit sogenannten anwendungsunabhängigen Performancemustern geschehen. Ein konkretes Performancemuster wird dabei durch spezielle Laufzeitcharakteristika beschrieben. Mit Hilfe geeigneter Performancemodelle wird anschließend die Wechselwirkung von Anwendung mit der gegebenen Hardwarearchitektur analytisch und quantitativ beschrieben. </p><p>Damit gibt das Modell die maximal zu erwartende Performance und mögliche Laufzeitverbesserungen durch entsprechende Modifikationen an. Können die Modellvorhersagen nicht durch Messungen validiert werden, so werden die zugrunde liegenden Modellannahmen überprüft und gegebenenfalls verfeinert oder angepasst. Auf Basis des Modells können Optimierungen geplant und deren Leistungsgewinn a-priori abgeschätzt werden. Der PE-Ansatz ist nicht auf klassische Mikroprozessorarchitekturen beschränkt und kann darüber hinaus auch für Projektionen für zukünftige Rechnerarchitekturen verwendet werden. Die Arbeiten konzentrieren sich typischerweise auf den Rechenknoten, wo analytische Performancemodelle wie das Roofline-Model oder das in der Arbeitsgruppe entwickelte Execution-Cache-Memory (ECM) Modell zum Einsatz kommen.</p><p></p><p></p>,
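The Roofline model named above reduces to a one-line bound, P = min(P_peak, I · b_mem). A sketch with made-up machine parameters (the numbers below are illustrative, not measurements of any particular system):

```python
# Naive Roofline estimate: performance is bounded either by the peak
# floating-point rate or by memory bandwidth times computational
# intensity. All machine parameters here are made up for illustration.

def roofline_gflops(intensity_flop_per_byte, peak_gflops, mem_bw_gbytes):
    """Upper performance bound P = min(P_peak, I * b_mem)."""
    return min(peak_gflops, intensity_flop_per_byte * mem_bw_gbytes)

# STREAM-triad-like kernel a[i] = b[i] + s * c[i]: 2 flops per 24 bytes
# of traffic, so I = 1/12 flop/byte. Assumed machine: 1600 Gflop/s peak,
# 120 GB/s memory bandwidth.
p = roofline_gflops(2 / 24, peak_gflops=1600.0, mem_bw_gbytes=120.0)
```

Since 2/24 · 120 ≈ 10 Gflop/s is far below the assumed peak, the model classifies such a kernel as memory-bound, which is the typical starting hypothesis in the analysis phase.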
Classification: Field of Research | Forschungsbereich
>, <ResearchArea: Research Area:
Title: Performance Models | Performance Modellierung,
Description: <p>Performance models describe the interaction between an application and the hardware, forming the basis for a deep understanding of the runtime behavior of an application. The group pursues an analytic approach, the essential components of which are application models and machine models. These components are initially created independently, but their combination and interaction finally provide insights about the bottlenecks and the expected performance. In particular, the creation of accurate machine models requires a profound microarchitecture analysis. </p><p>The Execution Cache Memory (ECM) model developed by the group allows predictions of single-core performance as well as scaling within a multi-core processor or compute node. In combination with analytic models of electrical power consumption, it can also be used to derive estimates for the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model. </p><p>Beyond the node level, the group investigates the performance of highly parallel MPI and hybrid applications, especially those without frequent synchronizing operations. Applications show highly dynamic behavior due to their interaction with the system's hardware bottlenecks, such as memory and network bandwidth. As a consequence, a simple additive combination of runtime models for the different phases of an application is often inaccurate. We extend existing node-level and communication models to describe effects like desynchronization, resynchronization, and idle wave propagation.</p> | Performancemodelle beschreiben die Interaktion zwischen einer Anwendung und der Hardware und bilden die Grundlage für ein tiefgreifendes Verständnis des Laufzeitverhaltens einer Anwendung. Die Gruppe verfolgt einen analytischen Ansatz, dessen wesentliche Komponenten Anwendungsmodelle und Maschinenmodelle sind. 
Diese Komponenten werden zunächst unabhängig voneinander erstellt, aber ihre Kombination und Interaktion liefern schließlich Erkenntnisse über die Engpässe und die zu erwartende Leistung. Insbesondere die Erstellung genauer Maschinenmodelle erfordert eine gründliche Analyse der Mikroarchitektur. <br /><br />Das von der Gruppe entwickelte Execution-Cache-Memory-Modell (ECM) ermöglicht Vorhersagen zur Single-Core-Leistung sowie zur Skalierung innerhalb eines Multi-Core-Prozessors oder Rechenknotens. In Kombination mit analytischen Modellen der Leistungsaufnahme kann es auch für Schätzungen des Energieverbrauchs einer Anwendung verwendet werden. Das ECM-Modell ist eine Verallgemeinerung des bekannten Roofline-Modells. <br /><br />Über die Knotenebene hinaus untersucht die Gruppe die Performance hochparalleler MPI- und Hybridanwendungen, insbesondere solcher ohne häufige Synchronisationsvorgänge. Anwendungen zeigen aufgrund ihrer Interaktion mit den Hardware-Flaschenhälsen des Systems (wie Speicher- und Netzwerkbandbreite) ein hochdynamisches Verhalten. Infolgedessen ist eine einfache additive Kombination von Laufzeitmodellen für die verschiedenen Phasen einer Anwendung oft ungenau. Wir erweitern bestehende Knoten- und Kommunikationsmodelle, um Effekte wie Desynchronisation, Resynchronisation und die Ausbreitung von Verzögerungswellen zu beschreiben.,
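The simplest node-level prediction underlying such analytic models is a data-traffic bound: a memory-bound loop cannot finish faster than its data volume divided by the attainable memory bandwidth. A sketch with illustrative numbers (the bandwidth and traffic figures are assumptions, not measurements):

```python
# Simplest data-traffic runtime bound used in analytic performance
# modeling: runtime >= data volume / attainable memory bandwidth.
# All parameters below are illustrative assumptions.

def min_runtime_s(n_elements, bytes_per_element, bandwidth_gbytes_s):
    """Lower bound on runtime for a streaming, memory-bound kernel."""
    volume_bytes = n_elements * bytes_per_element
    return volume_bytes / (bandwidth_gbytes_s * 1e9)

# Vector update a[i] = a[i] + s * b[i] on 10^8 doubles: 24 bytes of
# traffic per element (load a, load b, store a) at an assumed 100 GB/s.
t = min_runtime_s(10**8, 24, 100.0)  # -> 0.024 s
```

Models like ECM refine this bound by additionally accounting for in-core execution and inter-cache transfers, and the desynchronization effects mentioned above explain why such per-phase bounds do not simply add up across a parallel run.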
Classification: Field of Research | Forschungsbereich
>, <ResearchArea: Research Area:
Title: Performance Tools | Performance Tools,
Description: The group develops open-source software in the areas of performance tools, cluster monitoring, and benchmarking.<br />In the area of “performance tools,” the well-known LIKWID tool collection (https://github.com/RRZE-HPC/likwid) is being developed. It contains various tools for the controlled execution of applications on modern computing nodes with complex topology and adaptive runtime parameters. By measuring appropriate hardware metrics, LIKWID enables a detailed analysis of the hardware usage of application programs and is therefore of central importance for the validation of performance models and the identification of performance patterns. The output of derived metrics, such as the main memory bandwidth used, requires continuous adaptation and validation of this tool to new computer architectures.<br />The static code analysis tool OSACA (Open Source Architecture Code Analyzer) can analyze assembler code and provides a runtime prediction within the computing core (https://github.com/RRZE-HPC/OSACA).<br />With ClusterCockpit (https://clustercockpit.org/), the group is developing a comprehensive HPC cluster monitoring solution. ClusterCockpit comprises the following components: cc-metric-collector (node agent on the compute nodes), cc-backend (REST API and web server backend including web-based user interface), cc-metric-store (in-memory metric database), cc-energy-manager (job-specific control of power capping settings, global power capping for a cluster), and cc-node-controller (setting system parameters at the node level). ClusterCockpit offers both job-centric and node-centric views and is accessible to regular HPC users, support staff, and administrators. ClusterCockpit is in productive use at a large number of HPC centers.<br />Benchmark applications are an important tool for understanding performance-limiting factors and exploring new optimization opportunities. They are used to characterize hardware platforms and in research and teaching. 
The group is developing “The Bandwidth Benchmark” (https://github.com/RRZE-HPC/TheBandwidthBenchmark), an application for measuring the maximum achievable bandwidth on all levels of the memory hierarchy. MD-Bench (https://github.com/RRZE-HPC/MD-Bench) implements state-of-the-art algorithms in the field of molecular dynamics for CPUs and GPUs, including scalable MPI parallelization. SparseBench implements solvers for sparse systems of equations. Different memory formats are supported. SparseBench is also MPI-parallel. MachineState (https://github.com/RRZE-HPC/MachineState) collects and stores all performance-related information at the node level, thus making an important contribution to reproducible benchmark results. | Die Gruppe entwickelt Open-Source-Software in den Themenfeldern Performance-Tools, Cluster-Monitoring und Benchmarking.<br />Im Bereich „Performance Tools” wird die bekannte Werkzeugsammlung LIKWID (https://github.com/RRZE-HPC/likwid) entwickelt. Sie enthält verschiedene Werkzeuge zur kontrollierten Ausführung von Applikationen auf modernen Rechenknoten mit komplexer Topologie und adaptiven Laufzeitparametern. Durch die Messung geeigneter Hardwaremetriken ermöglicht LIKWID eine detaillierte Analyse der Hardwarenutzung von Anwendungsprogrammen und ist somit von zentraler Bedeutung für die Validierung von Leistungsmodellen und die Identifizierung von Leistungsmustern. Die Ausgabe abgeleiteter Metriken, wie der genutzten Hauptspeicherbandbreite, erfordert eine kontinuierliche Anpassung und Validierung dieses Werkzeugs an neue Rechnerarchitekturen.<br />Das statische Code-Analysewerkzeug OSACA (Open Source Architecture Code Analyzer) kann Assemblercode analysieren und liefert eine Laufzeitvorhersage innerhalb des Rechenkerns (https://github.com/RRZE-HPC/OSACA).<br />Mit ClusterCockpit (https://clustercockpit.org/) entwickelt die Gruppe eine umfassende HPC-Cluster-Monitoring-Lösung. 
ClusterCockpit umfasst die folgenden Komponenten: cc-metric-collector (Knotenagent auf den Compute-Knoten), cc-backend (REST-API und Webserver-Backend inklusive webbasierter Benutzeroberfläche), cc-metric-store (In-Memory-Metric-Datenbank) und cc-energy-manager (jobspezifische Kontrolle von Powercapping-Einstellungen, globales Powercapping für einen Cluster) sowie cc-node-controller (Setzen von Systemparametern auf Knotenebene). ClusterCockpit bietet sowohl jobzentrische als auch knotenzentrische Ansichten und ist für normale HPC-Nutzer, Supportpersonal und Administratoren zugänglich. ClusterCockpit ist an einer Vielzahl von HPC-Zentren im produktiven Einsatz. Benchmark-Applikationen sind ein wichtiges Werkzeug, um leistungsbegrenzende Faktoren zu verstehen und neue Optimierungsmöglichkeiten zu erforschen.<br />Sie werden zur Charakterisierung von Hardwareplattformen sowie in Forschung und Lehre genutzt.<br />Die Gruppe entwickelt mit „The Bandwidth Benchmark” (https://github.com/RRZE-HPC/TheBandwidthBenchmark) eine Anwendung zur Messung der maximal erreichbaren Bandbreite in allen Ebenen der Speicherhierarchie. MD-Bench (https://github.com/RRZE-HPC/MD-Bench) implementiert State-of-the-Art-Algorithmen im Bereich der Molekulardynamik für CPUs und GPUs, inklusive skalierbarer MPI-Parallelisierung. SparseBench implementiert Löser für dünnbesetzte Gleichungssysteme. Dabei werden unterschiedliche Speicherformate unterstützt. SparseBench ist ebenfalls MPI-parallel. MachineState (https://github.com/RRZE-HPC/MachineState) sammelt und speichert alle leistungsrelevanten Informationen auf Knotenebene und leistet somit einen wichtigen Beitrag zu reproduzierbaren Benchmark-Ergebnissen.,
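In the spirit of The Bandwidth Benchmark, a toy measurement can time a single large copy and report an effective transfer rate. This Python sketch illustrates only the methodology (timed sweep, traffic counted as read plus write), not the benchmark's actual kernels:

```python
# Toy bandwidth measurement: time one large in-memory copy and report
# effective MB/s (read + write traffic). Illustrative methodology only;
# a Python-level copy says little about the hardware limits that
# "The Bandwidth Benchmark" characterizes.
import time

def copy_bandwidth_mb_s(n):
    """Time one n-byte copy and return effective MB/s."""
    src = bytearray(n)
    t0 = time.perf_counter()
    dst = bytes(src)              # one full n-byte copy
    t1 = time.perf_counter()
    assert len(dst) == n          # sanity check; also keeps dst referenced
    return (2 * n) / (t1 - t0) / 1e6

bw = copy_bandwidth_mb_s(64 * 1024 * 1024)
```

Real benchmarks of this kind repeat the sweep, take the best run, and pin threads to cores so that the result reflects a specific level of the memory hierarchy.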
Classification: Field of Research | Forschungsbereich
>]>
orgas: <QuerySet [<Organisation: Regionales Rechenzentrum Erlangen (RRZE), Regionales Rechenzentrum Erlangen (RRZE), Erlangen, 91058, Martensstraße, 2999-12-31, Zentrale Einrichtungen, True>, <Organisation: Professur für Höchstleistungsrechnen, The research activities of the HPC professorship are located at the interface between numerical applications and modern parallel, heterogeneous high-performance computers.<br /><br />The application focus is on the development and implementation of hardware- and energy-efficient numerical methods and application programs. The foundation of all activities is a structured performance engineering (PE) process based on analytic performance models. Such models describe the interaction between software and hardware and are thus able to systematically identify efficient implementation, optimization and parallelization strategies. The PE process is applied to stencil-based schemes as well as basic operations and eigenvalue solvers for large sparse problems.<br /><br />In the computer science-oriented research focus, performance models, PE methods and easy-to-use open source tools are developed that support the process of performance engineering and performance modeling on the compute node level. We focus on the continuous development of the ECM performance model and the LIKWID tool collection.<br /><br />In teaching and training, the working group consistently relies on a model-based approach to teach optimization and parallelization techniques. The courses are integrated into the computer science and computational engineering curriculum at FAU. Furthermore, the group offers an internationally successful tutorial program on performance engineering and hybrid programming.<br /><br />Prof. 
Wellein is director of the Erlangen National Center for High-Performance Computing (NHR@FAU) and is the spokesman of the Competence Network for Scientific High Performance Computing in Bavaria (KONWIHR)., Erlangen, 91058, Martensstraße, 2999-12-31, Department Informatik, True>]>