
Maintenance

Maintenance FTP-a64 29.11. - 30.11.2022

The following changes have been performed during the maintenance:

  • There is a new dedicated login node. From now on you have to use the DNS name ftp-a64-login.scc.kit.edu to log into this cluster.

  • The operating system on the nodes is Rocky Linux 8.6.

  • The module system and other helper scripts are similar to those on HoreKa.

  • In addition to the Fujitsu A64FX nodes, there are four additional nodes, each containing two NVIDIA A100 accelerators and an Ampere Altra Q80-32 host CPU.

  • The login node also has an Ampere Altra Q80-32 host CPU, so one can compile and run software there. The software modules are available on the login node.

  • The following SLURM partitions are available: nvidia100_2, a64fx
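As a minimal sketch, the Python helper below writes a Slurm batch script for one of the two partitions listed above. You would save the output to a file and submit it with `sbatch` on the login node. The helper name `make_job_script`, the program `./my_app`, the task count and the 10-minute time limit are illustrative assumptions, not site defaults. GPU requests for the `nvidia100_2` partition are deliberately left out because their exact form is not covered here.

```python
# Hypothetical helper: build a Slurm batch script for the FTP-a64 partitions.
PARTITIONS = {"nvidia100_2", "a64fx"}  # partition names from the maintenance notes


def make_job_script(partition: str, command: str,
                    ntasks: int = 1, time_limit: str = "00:10:00") -> str:
    """Return the text of a batch script that runs `command` on `partition`."""
    if partition not in PARTITIONS:
        raise ValueError(
            f"unknown partition {partition!r}; expected one of {sorted(PARTITIONS)}")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",  # illustrative limit, not a site default
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script for the A64FX partition; "./my_app" is a placeholder program.
    print(make_job_script("a64fx", "srun ./my_app"), end="")
```

Rejecting unknown names up front catches typos before a job is submitted, instead of waiting for Slurm to refuse the script.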

Maintenance FTP-x86 19.04. - 26.04.2022

The following changes have been performed during the maintenance:

  • The firmware on all components has been upgraded.

  • The operating system is now based on Red Hat Enterprise Linux (RHEL) 8.4. We recommend recompiling all applications after the upgrade.

  • The Mellanox OFED InfiniBand stack has been upgraded.

  • The obsolete Intel compiler version 18.0 has been removed. The officially supported Intel compiler versions are now 19.0, 19.1 and 2021.4.0 (oneAPI).

  • LLVM version 14 was added. Older LLVM modules have been removed.

  • OpenMPI 4.0 and 4.1 have been updated to the latest patchlevel. OpenMPI 3.0 has been removed.

  • Many software modules have been updated and rebuilt against the new compiler and MPI versions.

  • The system Python version 3.9 was added. If no other Python module is loaded, the command python3.9 defaults to version 3.9.2, the command python3.8 defaults to version 3.8.6, the commands python3 and python default to version 3.6.8 and the command python2 defaults to version 2.7.18.

  • The hpc-workspace tools have been updated to version 1.3.7.

  • The Lmod module system has been upgraded.

  • CMake 3.23 has been added.

  • Slurm has been upgraded to version 21.08.7.

  • HKFS storage: the controller firmware has been updated.

  • The Spectrum Scale, Lustre and BeeGFS file system clients have been updated.

  • The NVIDIA driver has been upgraded to version 510.47.03. CUDA version 11.6 has been added.

  • Enroot has been updated to 3.4.0.

  • Singularity has been updated to 3.8.7.

  • JupyterHub has been upgraded to version 2.2.2.
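The interpreter defaults described for the system Python can be checked with a short sketch like the one below. It runs each command, asks it for its version and compares the answer with the documented default. The sketch avoids `subprocess.run` arguments newer than Python 3.6, so it should also run under the system `python3`. The names `DOCUMENTED` and `query_version` are hypothetical, and the expected values only apply when no Python module is loaded.

```python
import shutil
import subprocess
from typing import Optional

# Defaults documented for the FTP-x86 nodes when no Python module is loaded.
DOCUMENTED = {
    "python3.9": "3.9.2",
    "python3.8": "3.8.6",
    "python3": "3.6.8",
    "python": "3.6.8",
    "python2": "2.7.18",
}


def query_version(command: str) -> Optional[str]:
    """Return 'major.minor.micro' reported by `command`, or None if unavailable."""
    path = shutil.which(command)
    if path is None:
        return None  # command is not on PATH
    # The one-liner uses only syntax that is valid in both Python 2 and Python 3.
    result = subprocess.run(
        [path, "-c", "import sys; print('%d.%d.%d' % sys.version_info[:3])"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
    )
    if result.returncode != 0:
        return None  # the interpreter exists but could not run the check
    return result.stdout.strip()


if __name__ == "__main__":
    for cmd, expected in DOCUMENTED.items():
        found = query_version(cmd)
        status = "ok" if found == expected else "differs"
        print(f"{cmd:10} expected {expected:7} found {found} ({status})")
```

A result of "differs" usually means a Python module is loaded or the PATH has been changed.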

Maintenance 28.09. - 30.09.2021

The following extensive changes have been performed:

  • A new Graphcore IPU-POD16 system has been installed as part of the FTP-X86 cluster.

  • The FTP-X86n2 node (Cascade Lake + 1x V100) has been converted into a login node, removing the login node role from the FTP-X86 head node. From now on you have to use the DNS name ftp-x86-login.scc.kit.edu to log into this cluster. Please note that your SSH client will likely show a warning because the IP address of a known server has changed.

  • The NVIDIA V100 GPUs have been removed from the FTP-X86n[1,2] nodes and installed in the FTP-X86n[3,4] nodes, turning these two nodes into dual-GPU nodes.

  • The Infinity Fabric bridges required for fast inter-GPU communication have been installed in the FTP-X86n[5,6] nodes.

  • The FTP-A64 cluster has been configured to use the HoreKa file systems for $HOME and Workspaces, just like the FTP-X86 cluster already does. The data previously residing in /home on the FTP-A64 nodes is still available in the path /mnt/oldhomes/, so users can migrate it on their own.

  • The ROCm software stack has been updated to version 4.3.1.

  • The firmware of many components has been updated.
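To migrate data out of /mnt/oldhomes/, a minimal Python sketch could copy the old tree into a subdirectory of the new $HOME without overwriting anything already there. It assumes, hypothetically, one subdirectory per user named after the login (/mnt/oldhomes/<username>) and a target directory called `oldhome`. Check the real layout before running it. Symbolic links to files are copied as the files they point to, and symbolic links to directories are not followed.

```python
import os
import shutil
from pathlib import Path


def migrate_old_home(src: Path, dst: Path) -> int:
    """Copy the tree at `src` into `dst`, keeping files already in `dst`.

    Returns the number of files copied. Existing files in `dst` are skipped,
    so the function can be re-run safely after an interruption.
    """
    copied = 0
    for root, _dirs, files in os.walk(src):  # does not descend into dir symlinks
        target_dir = dst / Path(root).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            target = target_dir / name
            if target.exists():
                continue  # never overwrite data already in the new $HOME
            shutil.copy2(Path(root) / name, target)  # copy2 also keeps timestamps
            copied += 1
    return copied


if __name__ == "__main__":
    user = os.environ.get("USER", "")
    # Hypothetical layout: one subdirectory per user below /mnt/oldhomes/.
    source = Path("/mnt/oldhomes") / user
    if user and source.is_dir():
        destination = Path.home() / "oldhome"  # hypothetical target directory
        print(f"copied {migrate_old_home(source, destination)} files")
    else:
        print(f"{source} not found; check the layout of /mnt/oldhomes/ first")
```

Because existing files are skipped rather than overwritten, the copy can be repeated after an interruption.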

Maintenance 09.09.2021

The FTP-X86n[5,6] nodes are now equipped with significantly more powerful AMD EPYC 7543 "Milan" processors. The new CPUs have 32 instead of 16 cores per socket, and each node can execute a total of 128 threads. In addition, the new microarchitecture ("Milan" generation) achieves up to 20% higher performance per core. The distribution of the four GPUs across the two CPU sockets in the nodes has also been optimized during the maintenance.
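The thread count quoted for these nodes follows from two CPU sockets per node and 32 cores per socket. This assumes that simultaneous multithreading is enabled with two hardware threads per core. Under the same assumption, the previous 16-core CPUs would have provided 64 threads per node:

```latex
\[
\underbrace{2}_{\text{sockets}} \times \underbrace{32}_{\text{cores/socket}} \times \underbrace{2}_{\text{threads/core}} = 128 \ \text{threads per node},
\qquad
2 \times 16 \times 2 = 64 \ \text{threads per node before the upgrade.}
\]
```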

The batch system partition amd-rome-mi100 has been renamed to amd-milan-mi100 to reflect the upgrade.
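Existing job scripts that still request the old partition name need to be updated. The following is a sketch that assumes the partition is set in `#SBATCH` lines via `-p` or `--partition`. It rewrites those lines with a regular expression and leaves every other line untouched. Partitions passed on the `sbatch` command line are not handled, and in a comma-separated partition list the old name is only rewritten when it comes first. The function name `rename_partition` and the script name `rename_partition.py` are hypothetical.

```python
import re
import sys
from pathlib import Path

OLD, NEW = "amd-rome-mi100", "amd-milan-mi100"

# Matches "#SBATCH -p NAME", "#SBATCH -pNAME", "#SBATCH --partition NAME"
# and "#SBATCH --partition=NAME" when NAME is the old partition name.
_PATTERN = re.compile(
    r"^(#SBATCH[ \t]+(?:-p[ \t]*|--partition(?:=|[ \t]+)))"
    + re.escape(OLD)
    + r"(?=[\s,]|$)",  # stop at whitespace, a comma or end of line
    re.MULTILINE,
)


def rename_partition(script: str) -> str:
    """Return `script` with the old partition name replaced in #SBATCH lines."""
    return _PATTERN.sub(r"\g<1>" + NEW, script)


if __name__ == "__main__":
    # Usage: python rename_partition.py job1.sh job2.sh ...
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(rename_partition(path.read_text()))
```

Restricting the match to `#SBATCH` lines keeps any other occurrence of the old name, for example in log messages or comments, unchanged.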


Last update: November 30, 2022