User oriented FramewOrk for Satellite data (UFOS)

Project ID: UFOS

Tandem Project Manager Matthias Schneider
NHR@KIT Project Manager Achim Streit
Project Coordinator Ugur Cayoglu
Team SDL Earth System Science
Researcher Kanwal Shahzadi
Open source software -

Introduction

Global satellite observations of atmospheric composition are essential for climate and weather research. The resulting data volumes are tremendous, and the data archives are generally structured for an efficient processing chain (e.g., orbit-wise processing and archiving of individual observations) rather than for scientific applications (e.g., data on a specific spatial and temporal grid). This project develops a framework that scans large satellite data archives and prepares the data according to individual scientific needs. This directly supports the scientific use of the data and thus makes a strong contribution to climate and weather research.
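As a minimal sketch of what "preparing the data on a user's grid" can mean, the function below bins individual observations (latitude, longitude, value) onto a regular latitude-longitude grid and averages them per cell, optionally restricted to a region of interest. The function name grid_average, the tuple input format, and the region parameter are hypothetical; the framework itself works on NetCDF orbit files and targets merging methods that are more elaborate than simple averaging.

```python
from collections import defaultdict


def grid_average(observations, resolution_deg=1.0, region=None):
    """Average point observations onto a regular lat/lon grid (hypothetical helper).

    observations: iterable of (lat, lon, value), lat in [-90, 90], lon in degrees.
    region: optional (lat_min, lat_max, lon_min, lon_max), bounds inclusive.
    Returns {(i_lat, i_lon): mean value} for all non-empty cells.
    Assumes resolution_deg divides 180 evenly.
    """
    n_lat = round(180 / resolution_deg)
    n_lon = round(360 / resolution_deg)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in observations:
        if region is not None:
            lat_min, lat_max, lon_min, lon_max = region
            if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
                continue  # outside the user's area of interest
        # Row index counted from the South Pole; lat = 90 goes into the last row.
        i_lat = min(int((lat + 90.0) // resolution_deg), n_lat - 1)
        # Column index counted from -180; longitudes are wrapped into [-180, 180).
        i_lon = min(int(((lon + 180.0) % 360.0) // resolution_deg), n_lon - 1)
        sums[(i_lat, i_lon)] += value
        counts[(i_lat, i_lon)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


if __name__ == "__main__":
    obs = [(47.3, 8.4, 1.0), (47.9, 8.1, 3.0), (-10.5, 170.2, 5.0)]
    print(grid_average(obs))
    print(grid_average(obs, region=(30.0, 60.0, 0.0, 20.0)))
```

Only the cells that actually contain data are stored, so the output size scales with the user's region and resolution rather than with the size of the archive.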
The core of the framework consists of data merging software (whose performance will be tested on HoreKa’s CPUs and GPUs), efficient task management (which ensures optimal efficiency through parallel processing), and Machine Learning-supported data flow management (which minimizes the required I/O operations). All developments will be sustainable and open source, so that the framework can easily be adapted and extended to novel data merging methods and to other databases (including databases from other disciplines).
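The text does not specify how the task management distributes work. As a hypothetical illustration, the sketch below assigns orbit files to parallel workers with the greedy longest-processing-time (LPT) heuristic, assuming that processing cost is roughly proportional to file size. The function name assign_tasks and the file names are invented for this example.

```python
import heapq


def assign_tasks(file_sizes, n_workers):
    """Greedy LPT assignment of files to workers (hypothetical helper).

    file_sizes: dict {file name: size}, size used as a proxy for processing cost.
    Returns (loads, assignment): total load per worker and the file names per worker.
    """
    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    loads = [0] * n_workers
    assignment = [[] for _ in range(n_workers)]
    # Largest files first, each to the currently least-loaded worker.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(name)
        loads[w] = load + size
        heapq.heappush(heap, (loads[w], w))
    return loads, assignment


if __name__ == "__main__":
    sizes = {"orbit_a": 7, "orbit_b": 5, "orbit_c": 4, "orbit_d": 3, "orbit_e": 1}
    print(assign_tasks(sizes, n_workers=2))
```

LPT is a simple heuristic that keeps worker loads roughly balanced; with the five example files and two workers, both workers end up with a load of 10.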

Figure: Flow chart for the proposed user oriented framework for satellite data (UFOS) and its links to the Helmholtz (ATMO) and national (NFDI4Earth) data infrastructures. The different components of UFOS are marked in blue.

Project description

Satellite-based observations are unique in providing global coverage, and they use sophisticated instrumentation for detailed observations of the atmospheric trace gas composition. In the past, such instrumentation was limited to dedicated scientific missions; recently, however, operational meteorological satellites have also been equipped with highly sophisticated instruments. Given the guaranteed long-term support of operational meteorological missions, this development allows high-quality long-term monitoring of the atmospheric composition and thus enables unprecedented climate research studies. A very prominent example is the instrument IASI (Infrared Atmospheric Sounding Interferometer) aboard the EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites) satellites Metop-A, -B, and -C, with operations guaranteed until the 2040s on three additional satellites of a successor mission already approved by EUMETSAT (IASI-NG/Metop-SG). Within the European Research Council project MUSICA (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water, 2011-2016) and several MUSICA successor projects, we developed the MUSICA IASI processing chain (Schneider et al., 2016; Schneider et al., 2021a). It uses the IASI spectra to determine vertical profiles of atmospheric water and its isotopologue composition, of the greenhouse gases nitrous oxide and methane, and of nitric acid (an important component of ozone chemistry). Within the series of GLOMIR projects (GLObal MUSICA IASI Retrievals, 2018-2022) on ForHLR I/II and HoreKa, we have started processing IASI data for longer periods (the current processing covers 2014 to the present).

The MUSICA IASI processing chain generates a single NetCDF data file per orbit containing all data products according to the CF (Climate and Forecast) metadata conventions, fully complying with the FAIR principles (Wilkinson et al., 2016). An orbit-wise database of the individual products is typical for satellite data because it is very practical for efficient processing and storage; however, it is often inefficient for scientific use. Most users are interested in only a single data product in a limited spatial area, but with a specific horizontal and vertical gridding that differs from that of the data in the archive. Nevertheless, they all have to download all orbits containing all data products (including all storage-intensive auxiliary variables) for the whole globe, then extract the data of interest and perform the data merging calculations with their own software. For instance, a user interested in one year of MUSICA IASI data at a 1°x1° horizontal resolution must download all orbits (15383 files, 10.8 TB in total) and then apply their own merging software to generate the desired 1°x1° data (whose volume will be less than 1‰ of the downloaded data). This causes a lot of unnecessary data traffic (Table 1 gives examples of the data volumes of the MUSICA IASI database). Furthermore, optimal data merging should consider the complementarity of the individual data; however, the respective calculations involve the inversion of large matrices (e.g., Kalman, 1960; Schneider et al., 2021b; Zoppetti et al., 2021) and are thus computationally expensive. If data users perform the merging in a suboptimal manner (e.g., by simple averaging due to a lack of expertise and/or computing resources), a large part of the information actually provided by the satellite data is wasted. To ensure optimal use of the data and to avoid unnecessary data traffic, we urgently need a framework that prepares the data in an optimal manner (for a specific user need) prior to the data download.
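The data volumes quoted above can be put in perspective with a few lines of arithmetic. The sketch uses the figures from the text (15383 files, 10.8 TB, a 1°x1° grid, the 1‰ bound) and assumes decimal terabytes; the scenario of a single float32 variable on a daily global 1°x1° grid is hypothetical and serves only as an illustration.

```python
# Figures quoted in the text for one year of MUSICA IASI data.
n_files = 15383
total_bytes = 10.8e12  # 10.8 TB, decimal units assumed

mean_file_bytes = total_bytes / n_files
files_per_day = n_files / 365

# Number of cells in a global 1° x 1° latitude-longitude grid.
n_cells = 180 * 360

# Upper bound stated in the text: the gridded product is < 1 per mille of the download.
gridded_upper_bound_bytes = total_bytes * 1e-3

# Hypothetical scenario: one float32 (4-byte) variable per cell, one field per day.
daily_grid_bytes = n_cells * 4 * 365

print(f"mean file size    : {mean_file_bytes / 1e9:.2f} GB")
print(f"files per day     : {files_per_day:.1f}")
print(f"1x1 degree cells  : {n_cells}")
print(f"1 per mille bound : {gridded_upper_bound_bytes / 1e9:.1f} GB")
print(f"daily grid, 1 var : {daily_grid_bytes / 1e6:.1f} MB")
```

Under these assumptions, an average orbit file is about 0.7 GB, and a year of daily 1°x1° fields of one variable amounts to roughly 95 MB, well below the 1‰ bound of 10.8 GB.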
Given the large number of individual data products, the large data volume, and the computationally demanding data merging calculations, the use of HPC for realizing this framework is mandatory.
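To illustrate why optimal merging differs from simple averaging, the sketch below combines several independent estimates x_i of the same vertical profile, each with an error covariance matrix S_i, into the minimum-variance estimate x = (Σ S_i⁻¹)⁻¹ Σ S_i⁻¹ x_i with covariance (Σ S_i⁻¹)⁻¹. This is the textbook combination of independent unbiased Gaussian estimates (equivalent to sequential Kalman updates with an identity observation operator). It is a simplified stand-in, not the method of Schneider et al. (2021b) or Zoppetti et al. (2021), and it ignores averaging kernels and a priori information. The function name merge_estimates is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def merge_estimates(estimates, covariances):
    """Minimum-variance merge of independent estimates of the same profile.

    estimates:   list of length-n vectors x_i
    covariances: list of (n, n) error covariance matrices S_i
    Returns (x_merged, S_merged).
    Simplification: each x_i is treated as a direct, unbiased estimate of the
    true profile (no averaging kernels, no a priori).
    """
    n = np.asarray(estimates[0]).shape[0]
    info = np.zeros((n, n))   # accumulates sum of S_i^-1 (information matrix)
    info_vec = np.zeros(n)    # accumulates sum of S_i^-1 x_i
    for x, S in zip(estimates, covariances):
        S_inv = np.linalg.inv(np.asarray(S, dtype=float))  # one n x n inversion per input
        info += S_inv
        info_vec += S_inv @ np.asarray(x, dtype=float)
    S_merged = np.linalg.inv(info)
    x_merged = S_merged @ info_vec
    return x_merged, S_merged


if __name__ == "__main__":
    x_m, S_m = merge_estimates(
        [np.array([1.0, 2.0]), np.array([3.0, 4.0])],
        [np.diag([1.0, 4.0]), np.diag([1.0, 1.0])],
    )
    print(x_m)
    print(np.diag(S_m))
```

With these inputs, the first level (equal uncertainties) gets the simple mean of 2.0, whereas the second level is pulled towards the more precise second estimate (3.6 instead of the simple mean of 3.0), and the merged variances (0.5 and 0.8) are smaller than those of either input. Each input requires an n×n matrix inversion, which is why the cost grows quickly with profile length and the number of observations per grid cell.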

References

Kalman, R. E.: A new approach to linear filtering and prediction problems, J. Basic Eng., 82, 35–45, 1960.

Schneider, M., Wiegele, A., Barthlott, S., González, Y., Christner, E., Dyroff, C., García, O. E., Hase, F., Blumenstock, T., Sepúlveda, E., Mengistu Tsidu, G., Takele Kenea, S., Rodríguez, S., and Andrey, J.: Accomplishments of the MUSICA project to provide accurate, long-term, global and high-resolution observations of tropospheric {H2O,δD} pairs – a review, Atmos. Meas. Tech., 9, 2845–2875, https://doi.org/10.5194/amt-9-2845-2016, 2016.

Schneider, M., Ertl, B., Diekmann, C. J., Khosrawi, F., Weber, A., Hase, F., Höpfner, M., García, O. E., Sepúlveda, E., and Kinnison, D.: Design and description of the MUSICA IASI full retrieval product, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2021-75, in review, 2021a.

Schneider, M., Ertl, B., Diekmann, C. J., Khosrawi, F., Röhling, A. N., Hase, F., Dubravica, D., García, O. E., Sepúlveda, E., Borsdorff, T., Landgraf, J., Lorente, A., Chen, H., Kivi, R., Laemmel, T., Ramonet, M., Crevoisier, C., Pernin, J., Steinbacher, M., Meinhardt, F., Deutscher, N. M., Griffith, D. W. T., Velazco, V. A., and Pollard, D. F.: Synergetic use of IASI and TROPOMI space borne sensors for generating a tropospheric methane profile product, Atmos. Meas. Tech. Discuss. [preprint], https://doi.org/10.5194/amt-2021-31, in review, 2021b.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J., Groth, P., Goble, C., Grethe, J. S., Heringa, J., ’t Hoen, P. A., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B.: The FAIR Guiding Principles for scientific data management and stewardship, Sci. Data, 3, 160018, https://doi.org/10.1038/sdata.2016.18, 2016.

Zoppetti, N., Ceccherini, S., Carli, B., Del Bianco, S., Gai, M., Tirelli, C., Barbara, F., Dragani, R., Arola, A., Kujanpää, J., van Peet, J. C. A., van der A, R., and Cortesi, U.: Application of the Complete Data Fusion algorithm to the ozone profiles measured by geostationary and low-Earth-orbit satellites: a feasibility study, Atmos. Meas. Tech., 14, 2041–2053, https://doi.org/10.5194/amt-14-2041-2021, 2021.