Batch system

As described in the Hardware Overview chapter, users only have direct access to the two login nodes of the Future Technologies Partition. Access to the compute nodes is only possible through the so-called batch system. The batch system on FTP is Slurm.

Slurm is an open-source, fault-tolerant, and highly scalable job scheduling system for large and small Linux clusters. Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

To run any kind of calculation on the compute nodes, the user defines it as a single command or a sequence of commands, states the required run time, number of CPU cores, and main memory, and submits all of this together, i.e., the batch job, to the resource and workload manager. Every job submission therefore goes through the commands of the Slurm software. Slurm queues and runs user jobs based on fair-sharing policies.
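As an illustration of what such a batch job can look like, the following Python sketch assembles the text of a Slurm batch script from a resource request. It rests on several assumptions: the queue name is passed to Slurm as a partition, the values are the a64fx defaults from the queue table, and `JobRequest`, `render_batch_script`, and `./my_program` are hypothetical names chosen for this example. The `#SBATCH` options used (`--partition`, `--nodes`, `--ntasks`, `--time`, `--mem`) are standard `sbatch` options.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRequest:
    """Resources and commands for one batch job (hypothetical helper, not part of Slurm)."""
    queue: str           # assumed to be passed to Slurm as the partition name
    nodes: int           # number of compute nodes
    ntasks: int          # number of tasks
    time: str            # run time limit in HH:MM:SS
    mem_mb: int          # main memory per node in megabytes
    commands: List[str]  # shell commands that make up the calculation


def render_batch_script(job: JobRequest) -> str:
    """Return the text of a Slurm batch script for the given request."""
    header = [
        "#!/bin/bash",
        f"#SBATCH --partition={job.queue}",
        f"#SBATCH --nodes={job.nodes}",
        f"#SBATCH --ntasks={job.ntasks}",
        f"#SBATCH --time={job.time}",
        f"#SBATCH --mem={job.mem_mb}",  # a plain integer is read as megabytes
    ]
    return "\n".join(header + [""] + job.commands) + "\n"


# Example request using the a64fx default resources; ./my_program is a placeholder.
job = JobRequest(
    queue="a64fx",
    nodes=1,
    ntasks=48,
    time="00:30:00",
    mem_mb=28000,
    commands=["srun ./my_program"],
)
script = render_batch_script(job)
print(script)
```

The printed text can be saved to a file, for example `job.sh`, and handed to Slurm with `sbatch job.sh`. The state of the job in the queue can then be inspected with `squeue`.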

FTP-A64 ARM batch system queues

| Queue | Node type(s) | Access policy | Minimum resources | Default resources | Maximum resources |
|---|---|---|---|---|---|
| a64fx | A64FX | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=48, mem=28000mb | time=24:00:00, nodes=8, ntasks=48, mem=28000mb |
| nvidia100_2 | ARM-A100 | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=80, mem-per-cpu=6350mb | time=24:00:00, nodes=4, ntasks=80, mem=522400mb |

FTP-X86 batch system queues

| Queue | Node type(s) | Access policy | Minimum resources | Default resources | Maximum resources |
|---|---|---|---|---|---|
| intel-clv100 | Cascade Lake + NVIDIA V100 | Exclusive | nodes=1, ntasks=1 | time=00:10:00, ntasks=80, mem-per-cpu=192000mb | time=24:00:00, nodes=4, ntasks=80, mem=192000mb |
| amd-milan-mi100 | AMD Milan + MI100 | Exclusive | nodes=1, ntasks=1 | time=00:10:00, ntasks=128, mem=513600mb | time=24:00:00, nodes=2, ntasks=128, mem=513600mb |
| amd-milan-graphcore | AMD Milan + Graphcore | Shared | nodes=1, ntasks=1 | time=00:10:00, ntasks=128, mem=513600mb | time=24:00:00, nodes=1, ntasks=128, mem=513600mb |
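
The maximum resources listed in the queue tables can be used to check a request before it is submitted. The following Python sketch copies the time, node, and memory maxima from the tables and reports which of them a request would exceed. `QUEUE_MAXIMA`, `to_seconds`, and `check_request` are hypothetical helpers and not part of Slurm. The sketch assumes times are given as HH:MM:SS and memory per node in megabytes, and it leaves out the ntasks limit because its exact meaning is not spelled out here.

```python
# Maximum resources per queue, copied from the queue tables.
QUEUE_MAXIMA = {
    "a64fx": {"time": "24:00:00", "nodes": 8, "mem_mb": 28000},
    "nvidia100_2": {"time": "24:00:00", "nodes": 4, "mem_mb": 522400},
    "intel-clv100": {"time": "24:00:00", "nodes": 4, "mem_mb": 192000},
    "amd-milan-mi100": {"time": "24:00:00", "nodes": 2, "mem_mb": 513600},
    "amd-milan-graphcore": {"time": "24:00:00", "nodes": 1, "mem_mb": 513600},
}


def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds (other time formats are not handled)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def check_request(queue, nodes, time, mem_mb):
    """Return a list of problems; an empty list means the request fits the queue."""
    if queue not in QUEUE_MAXIMA:
        return [f"unknown queue: {queue}"]
    limits = QUEUE_MAXIMA[queue]
    problems = []
    if nodes > limits["nodes"]:
        problems.append(f"nodes={nodes} exceeds maximum {limits['nodes']}")
    if to_seconds(time) > to_seconds(limits["time"]):
        problems.append(f"time={time} exceeds maximum {limits['time']}")
    if mem_mb > limits["mem_mb"]:
        problems.append(f"mem={mem_mb}mb exceeds maximum {limits['mem_mb']}mb")
    return problems


# A request within the a64fx limits, and one asking for too many nodes.
print(check_request("a64fx", nodes=8, time="12:00:00", mem_mb=28000))
print(check_request("amd-milan-mi100", nodes=4, time="24:00:00", mem_mb=513600))
```

A check like this is only a convenience on the user side. Slurm itself remains the authority on whether a job is accepted.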

Last update: November 30, 2022