We supported the RIFS research group in analyzing and resolving intermittent errors after migrating complex atmospheric chemistry models to a new high-performance computing cluster. In doing so, we bridged the gap between scientific application logic and low-level system debugging.
Client/Company/Industry
Research Institute for Sustainability (RIFS)
Duration
2 weeks
Product
Service
Expertise
Software Development
The goal of the project was to identify and resolve the cause of intermittent errors that had appeared after the migration to a new high-performance computing (HPC) cluster, and thereby restore reliable execution of the model simulations on the new infrastructure.
A key challenge was that, after the transition to the new cluster, an incompatible MPI configuration caused runtime errors. The root cause therefore lay not in the scientific model itself but in the interaction between the scientific code, the system libraries, and the cluster environment, which required an in-depth analysis of the low-level system configuration.
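Diagnosing this kind of mismatch often starts by comparing the MPI launcher found in the environment with the MPI library the model binary actually loads at runtime. The Bash sketch below illustrates that first check. It assumes a Linux system with `ldd` and an `mpirun` or `mpiexec` that accepts `--version` (as Open MPI and MPICH launchers do). The function names are hypothetical, and this is not the tooling used in the project.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helpers (illustrative, not the project's scripts).
# Idea: an MPI program should be launched by the same MPI implementation it
# was linked against; a mismatch is a common source of runtime failures.

# List the libmpi entries the dynamic loader resolves for a binary.
# Prints nothing for static binaries or non-MPI programs.
linked_mpi_libs() {
  ldd "$1" 2>/dev/null | grep 'libmpi' || true
}

# Report which MPI launcher is first on PATH and the first line of its version banner.
launcher_version() {
  local launcher
  launcher="$(command -v mpirun || command -v mpiexec || true)"
  if [ -z "$launcher" ]; then
    echo "no MPI launcher on PATH"
    return 0
  fi
  echo "$launcher: $("$launcher" --version 2>&1 | head -n 1)"
}

# Print launcher and linked library side by side for manual comparison.
check_mpi_setup() {
  local binary="$1" libs
  echo "== MPI launcher =="
  launcher_version
  echo "== libmpi linked by $binary =="
  libs="$(linked_mpi_libs "$binary")"
  if [ -z "$libs" ]; then
    echo "no libmpi found (static build or not an MPI program)"
  else
    echo "$libs"
  fi
}
```

Usage (with a hypothetical file name and binary): `source check_mpi.sh && check_mpi_setup ./model.exe`. If the launcher reports one MPI implementation while `ldd` resolves `libmpi` from a different installation, or reports it as `not found`, the environment (loaded modules, `PATH`, `LD_LIBRARY_PATH`) is a natural place to look next.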
Programming Languages
Bash
Technologies
MPI
Schematic representation of a high-performance computing cluster running scientific simulation software.
The project identified and resolved the cause of the errors, allowing the model simulations to run reliably again on the high-performance computing cluster. It demonstrated the value of bridging scientific software development and low-level HPC analysis.