
Hardware Overview

The Future Technologies Partition (FTP) is currently split into two separate cluster installations with different hardware components, different cluster management software stacks, and different login nodes.

FTP is a distributed-memory parallel computer consisting of multiple individual servers called "nodes". The nodes are divided into two clusters: one supports the ARM instruction set, the other x86. Each node has either two Intel Xeon processors or one ARM processor, at least 32 GB of local memory, local NVMe SSD disks, and two high-performance network adapters. All nodes are connected by an extremely fast, low-latency InfiniBand 4X HDR interconnect. In addition, two large parallel file systems are connected to FTP.

The operating system installed on every node is Red Hat Enterprise Linux (RHEL) 8.x. On top of this operating system, a set of (open source) software components such as the Slurm workload manager is installed. Some of these components are of special interest to end users and are briefly discussed here. Others are mainly of importance to system administrators and are therefore not covered by this documentation.

The different server systems in FTP have different roles and offer different services.

Login Nodes

The login nodes are the only nodes directly accessible to end users. These nodes can be used for interactive logins, file management, software development and interactive pre- and postprocessing. Two nodes are dedicated as login nodes.

Compute Nodes

The majority of the nodes (13 out of 15) are dedicated to computations. These nodes are not directly accessible to users. Instead, calculations have to be submitted to a so-called batch system (Slurm). The batch system manages all compute nodes and starts the queued jobs according to their priority as soon as the required resources become available. A single job may use multiple compute nodes and several hundred CPU cores at once.
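
As a minimal sketch of how such a job is handed to the batch system (assuming Slurm, as mentioned above; the partition name, resource values and application binary are hypothetical placeholders, not actual FTP settings):

```bash
#!/bin/bash
#SBATCH --job-name=example_job    # name shown in the queue
#SBATCH --nodes=2                 # number of compute nodes requested
#SBATCH --ntasks-per-node=48      # MPI ranks per node (hypothetical value)
#SBATCH --time=00:30:00           # maximum wall-clock time
#SBATCH --partition=a64fx         # hypothetical partition name; check the site documentation

# Launch the (hypothetical) MPI application on the allocated nodes
srun ./my_app
```

Submitted with `sbatch job.sh`, the script waits in the queue until the requested resources become available.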

Administrative Service Nodes

Some nodes provide additional services such as resource management, external network connections, monitoring, and security. These nodes can only be accessed by system administrators.

FTP-A64 ARM cluster

|                 | Login node              | A64FX nodes      |
|-----------------|-------------------------|------------------|
| No. Nodes       | 1                       | 8                |
| CPU             | 2x Intel Xeon Gold 6240 | 1x Fujitsu A64FX |
| Total Cores     | 28                      | 48               |
| CPU Clock       | 2.6 GHz                 | 1.8 GHz          |
| Instruction Set | x86_64                  | ARMv8.2-A + SVE  |
| Memory          | 384 GB                  | 32 GB            |
| Disk            | 5.2 TB                  | 372 GB           |
| Disk type       | NVMe                    | NVMe             |

FTP-X86 cluster

|                            | Login node              | Intel Cascade Lake + NVIDIA V100 | AMD EPYC Milan + MI100 | AMD EPYC Milan + Graphcore |
|----------------------------|-------------------------|----------------------------------|------------------------|----------------------------|
| No. Nodes                  | 1                       | 3                                | 2                      | 1                          |
| CPU                        | 2x Intel Xeon Gold 6230 | 2x Intel Xeon Gold 6230          | 2x AMD EPYC 7543       | 2x AMD EPYC 7543           |
| Sockets                    | 2                       | 2                                | 2                      | 2                          |
| Total Cores                | 40                      | 40                               | 64                     | 64                         |
| Total Threads              | 40                      | 40                               | 128                    | 128                        |
| CPU Base Clock             | 2.1 GHz                 | 2.1 GHz                          | 2.8 GHz                | 2.8 GHz                    |
| Instruction Set            | x86_64                  | x86_64                           | x86_64                 | x86_64                     |
| Memory                     | 192 GB                  | 192 GB                           | 512 GB                 | 512 GB                     |
| Accelerator type           | NVIDIA V100             | NVIDIA V100                      | AMD MI100              | Graphcore IPU-M2000        |
| No. Accelerators           | 1                       | 1                                | 4                      | 4 (= 16 IPUs)              |
| Accelerator TFLOPS (FP64)  | 7                       | 7                                | 11.5                   | -                          |
| Accelerator TFLOPS (FP32)  | 14                      | 14                               | 23                     | -                          |
| Accelerator TFLOPS (FP16)  | 28.2                    | 28.2                             | 184.6                  | 250                        |
| Accelerator Mem b/w (GB/s) | 900                     | 900                              | 1228                   |                            |
| Disk                       | 890 GB                  | 890 GB                           | 3 TB                   | 3 TB                       |
| Disk type                  | NVMe                    | NVMe                             | NVMe                   | NVMe                       |

Interconnect

An important component of FTP is the InfiniBand 4X HDR 200 Gbit/s interconnect. All nodes are attached to this high-throughput, very low-latency (~1 microsecond) network. InfiniBand is ideal for communication-intensive applications, e.g. applications that perform a lot of collective MPI communication.
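
As an illustration of the kind of collective MPI operation mentioned above (a minimal, self-contained sketch in C, not FTP-specific code), the following program sums one integer per rank across all participating nodes with MPI_Allreduce; on an interconnect like InfiniBand HDR such collectives complete with very low latency:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every rank contributes its own rank number; MPI_Allreduce sums
       the contributions across all ranks and returns the result to all. */
    int local = rank;
    int global_sum = 0;
    MPI_Allreduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum over %d ranks: %d\n", size, global_sum);

    MPI_Finalize();
    return 0;
}
```

Compiled with an MPI wrapper compiler (e.g. `mpicc`) and launched across several nodes via the batch system, every rank takes part in the reduction over the InfiniBand network.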


Last update: December 15, 2021