Batch system

As described in the Hardware Overview chapter, users only have direct access to the two login nodes of the Future Technologies Partition. Access to the compute nodes is only possible through the so-called batch system. The batch system on FTP is Slurm.

Slurm is an open-source, fault-tolerant, and highly scalable job scheduling system for large and small Linux clusters. Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
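
In day-to-day use, these functions are exercised through a handful of standard Slurm command-line tools; only the script name and job ID in the examples below are placeholders.

```bash
sinfo                  # show queues (partitions) and node states
sbatch job.sh          # submit a batch script for scheduling
squeue -u $USER        # monitor your own pending and running jobs
scancel 123456         # cancel a job by its job ID
```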

Any calculation on the compute nodes of the FTP requires the user to define the work as a single command or a sequence of commands, together with the required run time, number of CPU cores, and amount of main memory, and to submit the whole, i.e. the batch job, to the resource and workload manager. All job submissions are therefore performed through commands of the Slurm software. Slurm queues and runs user jobs based on fair-share policies.
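
As a minimal sketch of such a batch job, the following script requests resources for the a64fx queue described below. The resource values are taken from the FTP-A64 table; the executable and the application line are placeholders, not part of the FTP documentation.

```bash
#!/bin/bash
#SBATCH --partition=a64fx     # queue, see the FTP-A64 table below
#SBATCH --nodes=1             # one A64FX node
#SBATCH --ntasks=48           # queue default: 48 tasks per node
#SBATCH --time=01:00:00       # run time, must stay below the 24h maximum
#SBATCH --mem=28000mb         # main memory per node

# Placeholder application; replace with your own commands.
srun ./my_application
```

The script is submitted with `sbatch <script>`; Slurm prints a job ID that can then be used with `squeue` and `scancel`.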

FTP-A64 ARM batch system queues

| Queue | Node type(s) | Access policy | Minimum resources | Default resources | Maximum resources |
|---|---|---|---|---|---|
| a64fx | A64FX | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=48, mem=28000mb | time=24:00:00, nodes=8, ntasks=48, mem=28000mb |
| nvidia100_2 | ARM-A100 | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=80, mem-per-cpu=6350mb | time=24:00:00, nodes=4, ntasks=80, mem=522400mb |
| dual_a_max | Dual ARM Altra Max | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=256, mem-per-cpu=2035mb | time=24:00:00, nodes=6, ntasks=256, mem=520960mb |
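
The queue names above map onto Slurm partitions, so resource requests can also be given on the command line instead of inside the script. The following hypothetical submission targets the nvidia100_2 queue from the table; whether the A100 GPUs on the FTP are requested via a gpu generic resource (`--gres`) is an assumption, and the script name is a placeholder.

```bash
# Hypothetical submission to an ARM-A100 node; the gres name "gpu"
# and the GPU count are assumptions about the FTP configuration.
sbatch --partition=nvidia100_2 --nodes=1 --ntasks=80 \
       --time=02:00:00 --gres=gpu:2 gpu_job.sh
```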

FTP-X86 batch system queues

| Queue | Node type(s) | Access policy | Minimum resources | Default resources | Maximum resources |
|---|---|---|---|---|---|
| intel-clv100 | Cascade Lake + NVIDIA V100 | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=80, mem-per-cpu=192000mb | time=24:00:00, nodes=4, ntasks=80, mem=192000mb |
| amd-milan-mi100 | AMD Milan + MI100 | Shared | nodes=1, ntasks=1 | time=24:00:00, ntasks=2, mem=8025mb | time=24:00:00, nodes=1, ntasks=64, mem=513600mb |
| amd-milan-mi100 | AMD Milan + MI210 | Shared | nodes=1, ntasks=1 | time=24:00:00, ntasks=2, mem=8025mb | time=24:00:00, nodes=1, ntasks=64, mem=513600mb |
| amd-milan-mi250 | AMD Milan + MI250 | Shared | nodes=1, ntasks=1 | time=24:00:00, ntasks=1, mem=8025mb | time=24:00:00, nodes=2, ntasks=256, mem=1027200mb |
| amd-milan-graphcore | AMD Milan + Graphcore | Shared | nodes=1, ntasks=1 | time=24:00:00, ntasks=2, mem=8025mb | time=24:00:00, nodes=1, ntasks=128, mem=513600mb |
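
Unlike the exclusive ARM queues, most x86 queues are shared: jobs from several users can run on the same node, so only the cores and memory actually needed should be requested. A minimal sketch matching the defaults of the amd-milan-mi100 queue, again assuming the accelerators are exposed as a gpu generic resource (an assumption) and using a placeholder script name:

```bash
# Two tasks with the default memory on a shared MI100 node; the
# remaining cores and memory stay available to other users' jobs.
# The gres name "gpu" is an assumption about the FTP configuration.
sbatch --partition=amd-milan-mi100 --ntasks=2 --mem=8025mb \
       --time=12:00:00 --gres=gpu:1 run_mi100_job.sh
```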

Last update: September 28, 2023