Many areas of fundamental science are facing a drastic increase in data volumes and hence correspondingly increased computing requirements for data reconstruction, simulation and data analysis. Traditional infrastructure, such as specialised compute centres dedicated to individual science areas, cannot efficiently handle these new requirements alone. In recent years, a number of successful projects have demonstrated that many of these compute tasks can also be executed efficiently on HPC systems. This offers the opportunity to dynamically and transparently complement the existing computing centres of the large scientific communities with large-scale HPC resources such as the new HoreKa supercomputer at KIT. However, practical application at scale faces challenges in provisioning existing scientific software stacks, in efficient multi-level scheduling that respects existing global workflow management systems, and in the transparent, performant use of very large remote data repositories. In this proposal, we address the most relevant of these issues to ensure the stable and sustainable operation of HoreKa for applications typical of particle physics (High Energy Physics, “HEP”) and similar fields such as Hadron and Nuclear Physics (“HaN”) and Astroparticle Physics (“ATP”). Interesting further steps at a later stage are the inclusion of workflows relying on GPUs and the implementation of caching methods to enable fast, repeated access to data sets for the final analysis by individual scientists.
The goal of the proposed project is to ensure the provisioning of stable services to the HEP, ATP and HaN communities at HoreKa. Establishing consulting, schools and training services in collaboration with other, already existing efforts at KIT (e.g. the annual GridKa Computing School) is another important ingredient of this proposal. As technology advances, new use cases become feasible. In the area of user analyses, machine learning methods (or “artificial intelligence”) have been rapidly adopted in particle physics, resulting in ever-increasing demands for efficient, low-latency analysis environments such as Jupyter Notebooks, and for GPU resources for the training of ML algorithms.
To address these challenges, which are outlined in more detail in the paragraphs below, we foresee one postdoctoral position, of which one third will be co-funded from a joint project of ETP and GridKa (the above-mentioned BMBF project FIDIUM) or from ETP institute resources. The person to be employed will collaborate closely with PhD students in the GridKa/ETP group led by Dr. Manuel Giffels, who are foreseen to work on the further development of COBalD/TARDIS and on strategies to improve the overall performance of heterogeneous compute environments through the placement of coordinated, dynamically created data caches. While these efforts also directly benefit the Belle II and CMS experiments, the focus of the work envisioned in this project is on providing such technologies to other experimental and theoretical groups in particle physics and in other communities, most notably ATP and HaN. This will strengthen the role of KIT, SCC and the HoreKa team as a supporting centre for the very specific demands of high-performance and data-intensive computing in these physics communities.