
Maintenance

Maintenance 15.04. - 19.04.2024

The following changes were performed during maintenance:

  • All firmware versions on all components were upgraded

  • The operating system was upgraded to Red Hat Enterprise Linux (RHEL) 8.8. We recommend recompiling all applications after the upgrade.

  • The Mellanox OFED InfiniBand stack was upgraded

  • Slurm was upgraded

  • File system clients and servers were updated

  • Compiler and MPI versions and the software modules built against them were updated. Modules of deprecated versions will be removed.

Maintenance 26.10.2023

Due to parametrization work on the infrastructure of the HoreKa compute centre and scheduled maintenance, the cluster will not be available on 26.10.2023 from 8:30 to 19:00.

The following changes will be performed during the maintenance:

  • Slurm will be upgraded to version 23.02.6

    • In Slurm 23.02, --ntasks-per-core applies to both job and step allocations. If set to 1, it now implies --cpu-bind=cores; if set to a value greater than 1, it implies --cpu-bind=threads. For jobs that use Intel MPI together with the Slurm option --ntasks-per-core, you need to export SLURM_CPU_BIND=NONE in the job environment.
    • The task prolog was changed from task_prolog.hk to task_prolog.
  • NVIDIA driver will be upgraded to the most recent version (535.104.12 or higher)

  • The bandwidth to LSDF online storage will be increased
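The binding rule for Slurm 23.02 described above can be summarized in a few lines. The Python sketch below is an illustrative model only, not Slurm source code; the helper names `implied_cpu_bind` and `job_environment` are hypothetical. It maps a --ntasks-per-core value to the implied --cpu-bind setting and applies the Intel MPI workaround (SLURM_CPU_BIND=NONE) when both conditions hold.

```python
import os
from typing import Dict, Optional


def implied_cpu_bind(ntasks_per_core: Optional[int]) -> Optional[str]:
    """Return the --cpu-bind value implied by --ntasks-per-core (Slurm 23.02).

    Illustrative model, not Slurm code: 1 implies "cores", values greater
    than 1 imply "threads", and an unset option implies nothing.
    """
    if ntasks_per_core is None:
        return None
    if ntasks_per_core < 1:
        raise ValueError("--ntasks-per-core must be a positive integer")
    return "cores" if ntasks_per_core == 1 else "threads"


def job_environment(ntasks_per_core: Optional[int], uses_intel_mpi: bool,
                    base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the job environment with the Intel MPI workaround applied.

    Hypothetical helper: if Intel MPI is combined with --ntasks-per-core,
    SLURM_CPU_BIND=NONE is set, as recommended in the maintenance notes.
    """
    env = dict(os.environ if base_env is None else base_env)
    if uses_intel_mpi and ntasks_per_core is not None:
        env["SLURM_CPU_BIND"] = "NONE"
    return env


if __name__ == "__main__":
    for n in (None, 1, 2):
        print(n, implied_cpu_bind(n))
    print(job_environment(1, uses_intel_mpi=True, base_env={}).get("SLURM_CPU_BIND"))
```

In an actual batch script, the workaround corresponds to the line `export SLURM_CPU_BIND=NONE` placed before the application is launched.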

Security Update 10.08.2023

On 10.08.2023, regular operation was briefly interrupted to address multiple security vulnerabilities in Intel and AMD microarchitectures. A malicious actor could exploit these vulnerabilities to gain unauthorized access to the contents of vector registers, potentially leaking sensitive information.

To mitigate these vulnerabilities, new versions of the Intel and AMD microcode were installed, and the affected nodes were rebooted over the following weekend.

As a result of the microcode update, a performance drop of 5% to 10% under normal workloads may be observed on Intel platforms. This is because the update restricts the execution of the gather instruction provided by Intel Advanced Vector Extensions 2 (Intel AVX2) and Intel Advanced Vector Extensions 512 (Intel AVX-512). For more information, please refer to the technical paper.

Maintenance 11.04. - 12.05.2023

The following changes were performed during the maintenance:

  • All firmware versions on all components were upgraded

  • The operating system was upgraded to Red Hat Enterprise Linux (RHEL) 8.6. We recommend to re-compile all applications after the upgrade.

  • The Mellanox OFED InfiniBand stack was upgraded

  • The NVIDIA driver was upgraded

  • pigz and pbzip are no longer supported. Please use pzstd instead.

  • Slurm was upgraded to version 22.05.8

  • File system clients (Spectrum Scale, Lustre and BeeGFS) were updated

  • Spectrum Scale file system servers were updated

  • The file systems home and work have been extended with 1 PB fast NVMe SSD storage. No user action is required to use this new storage. New files will be automatically stored on the SSDs. Old and large files will be transparently migrated from the SSDs to the slower HDDs if the disk space on the SSDs fills up.

  • Singularity was replaced by its successor Apptainer, and Enroot was upgraded.

  • Compiler and MPI versions and the software modules built against them were updated. Modules of deprecated versions were removed. Some additional modules will be added later on.

  • After the maintenance, the following per-user limits apply on the login nodes (enforced via cgroups): 48 GB of physical memory and 400% CPU cycles (100% equals 1 thread).
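Since a 400% CPU quota corresponds to four fully busy threads, parallel tools on the login nodes gain nothing from more than four workers. The Python sketch below converts the quota into a thread budget and builds a pzstd command line that respects it. It assumes that pzstd's `-p` option sets the number of worker threads and `-d` selects decompression (check `pzstd --help` on the cluster); the helper names `login_thread_budget` and `pzstd_command` are hypothetical, and the command is only built, not executed.

```python
from typing import List

# Per-user CPU quota on the login nodes, from the notes above (100% = one thread).
LOGIN_CPU_QUOTA_PERCENT = 400


def login_thread_budget(cpu_quota_percent: int = LOGIN_CPU_QUOTA_PERCENT) -> int:
    """Convert a cgroup CPU quota in percent into a whole number of threads (at least 1)."""
    return max(1, cpu_quota_percent // 100)


def pzstd_command(path: str, threads: int, decompress: bool = False) -> List[str]:
    """Build (but do not run) a pzstd command line.

    Assumption: -p sets the number of worker threads, -d selects decompression.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    cmd = ["pzstd", "-p", str(threads)]
    if decompress:
        cmd.append("-d")
    cmd.append(path)
    return cmd


if __name__ == "__main__":
    budget = login_thread_budget()
    print(budget)                                # 4
    print(pzstd_command("results.tar", budget))  # ['pzstd', '-p', '4', 'results.tar']
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Requesting more threads than the budget would not speed things up on a login node, because the cgroup caps the user's total CPU usage at 400%.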

Maintenance 12.07. - 16.07.2021

From July 12th 9:00 am until July 16th noon, no compute nodes will be available on HoreKa and HAICORE, so no jobs will run. Additionally, individual login nodes will be unavailable for some time during this interval, which will also affect the Jupyter and CI services.


Last update: April 19, 2024