Maintenance 28.09. - 30.09.2021
The following extensive changes have been made:
A new Graphcore IPU-POD16 system has been installed as part of the FTP-X86 cluster.
The FTP-X86n2 node (Cascade Lake + 1x V100) has been converted into a login node, and the FTP-X86 head node no longer serves as a login node. From now on, you must use the DNS name ftp-x86-login.scc.kit.edu to log into this cluster. Please note that your SSH client will likely show a warning because the IP address of a known server has changed.
The NVIDIA V100 GPUs have been removed from the FTP-X86n[1,2] nodes and installed in the FTP-X86n[3,4] nodes, so each of these two nodes now has two GPUs.
The Infinity Fabric bridges required for fast inter-GPU communication have been installed in the FTP-X86n[5,6] nodes.
The FTP-A64 cluster has been configured to use the HoreKa file systems for $HOME and workspaces, just like the FTP-X86 cluster already does. The data that previously resided in /home on the FTP-A64 nodes is still available under /mnt/oldhomes/, so users can migrate it themselves.
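Any copy tool can be used for this migration. As one possibility, here is a minimal Python sketch that copies an old home directory into the new $HOME without overwriting anything that already exists there, so newer data on the HoreKa file system is never clobbered. The helper name migrate_old_home is hypothetical, and the layout below /mnt/oldhomes/ (for example, one subdirectory per user) is an assumption, so the source directory is passed in explicitly.

```python
import os
import shutil
from pathlib import Path


def migrate_old_home(src: Path, dst: Path) -> list[Path]:
    """Copy the contents of src into dst without overwriting anything.

    Hypothetical helper: files that already exist in dst are skipped, so a
    second run copies nothing new. Returns the destination paths created.
    """
    copied: list[Path] = []
    for path in sorted(src.rglob("*")):
        target = dst / path.relative_to(src)
        if path.is_symlink():
            # Recreate symbolic links as links instead of copying their targets.
            if not target.exists() and not target.is_symlink():
                target.parent.mkdir(parents=True, exist_ok=True)
                os.symlink(os.readlink(path), target)
                copied.append(target)
        elif path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also preserves modification times and permission bits.
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

A call might look like migrate_old_home(Path("/mnt/oldhomes/<your-old-home>"), Path.home()), where <your-old-home> is a placeholder for wherever your old data actually lives.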
The ROCm software stack has been updated to version 4.3.1.
The firmware of many components has been updated.
The FTP-X86n[5,6] nodes are now equipped with significantly more powerful AMD EPYC 7543 "Milan" processors. Each new CPU has 32 instead of 16 cores, so each node can now execute a total of 128 threads. In addition, the newer "Milan" microarchitecture achieves up to 20% higher performance per core. The distribution of the four GPUs across the two CPU sockets of each node has also been optimized during the maintenance.
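The thread count follows from the node configuration: two sockets, 32 cores per socket, and two hardware threads per core (SMT), which is how the EPYC 7543 is specified. Assuming the previous 16-core CPUs also ran two threads per core, the same calculation would have given 64 threads per node.

```latex
2\ \text{sockets} \times 32\ \tfrac{\text{cores}}{\text{socket}} \times 2\ \tfrac{\text{threads}}{\text{core}} = 128\ \text{threads per node}
```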
The batch system partition amd-rome-mi100 has been renamed to amd-milan-mi100 to reflect the upgrade.
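Existing job scripts that still request the old partition name need to be updated. The following is a minimal Python sketch that rewrites partition directives in job scripts, assuming the batch system is Slurm and the scripts use #SBATCH --partition=NAME, #SBATCH --partition NAME or #SBATCH -p NAME directives. The helper names update_partition and update_scripts are hypothetical, and comma-separated partition lists are deliberately not handled.

```python
import re
from pathlib import Path

OLD_PARTITION = "amd-rome-mi100"
NEW_PARTITION = "amd-milan-mi100"

# Assumes Slurm-style directives: "#SBATCH --partition=NAME",
# "#SBATCH --partition NAME" or "#SBATCH -p NAME" at the start of a line.
_DIRECTIVE = re.compile(
    r"^(#SBATCH\s+(?:--partition[=\s]\s*|-p\s*))"
    + re.escape(OLD_PARTITION)
    + r"(?=\s|$)",
    re.MULTILINE,
)


def update_partition(text: str) -> str:
    """Replace the old partition name in #SBATCH directives only."""
    return _DIRECTIVE.sub(r"\g<1>" + NEW_PARTITION, text)


def update_scripts(directory: Path, pattern: str = "*.sh") -> list[Path]:
    """Rewrite matching job scripts in place; return the files that changed."""
    changed: list[Path] = []
    for script in sorted(directory.glob(pattern)):
        old = script.read_text()
        new = update_partition(old)
        if new != old:
            script.write_text(new)
            changed.append(script)
    return changed
```

Only directive lines are touched, so other mentions of the old name (for example in comments or echo commands) stay unchanged and can be reviewed by hand.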