Skip to content

Slurm: Chain Jobs

The CPU time requirements of many applications exceed the limits of the job classes. In those situations it is recommended to solve the problem by a job chain. A job chain is a sequence of jobs where each job automatically starts its successor.

#!/bin/bash
##############################################
## simple Slurm submitter script to setup   ##
## a chain of jobs for HoreKa               ##
##############################################
## ver.  : 2018-11-27, KIT, SCC

## Define maximum number of jobs via positional parameter 1, default is 5
max_nojob=${1:-5}

## Define your jobscript (e.g. "~/chain_job.sh")
chain_link_job=${PWD}/chain_job.sh

## Define type of dependency via positional parameter 2, default is 'afterok'
dep_type="${2:-afterok}"
## -> List of all dependencies:
## https://slurm.schedmd.com/sbatch.html

myloop_counter=1
## Submit loop
while [ ${myloop_counter} -le ${max_nojob} ] ; do
   ##
   ## Differ msub_opt depending on chain link number
   if [ ${myloop_counter} -eq 1 ] ; then
      slurm_opt=""
   else
      slurm_opt="-d ${dep_type}:${jobID}"
   fi
   ##
   ## Print current iteration number and sbatch command
   echo "Chain job iteration = ${myloop_counter}"
   echo "   sbatch --export=myloop_counter=${myloop_counter} ${slurm_opt} ${chain_link_job}"
   ## Store job ID for next iteration by storing output of sbatch command with empty lines
   jobID=$(sbatch -p <queue> --export=ALL,myloop_counter=${myloop_counter} ${slurm_opt} ${chain_link_job} 2>&1 | sed 's/[S,a-z]* //g')
   ##
   ## Check if ERROR occured
   if [[ "${jobID}" =~ "ERROR" ]] ; then
      echo "   -> submission failed!" ; exit 1
   else
      echo "   -> job number = ${jobID}"
   fi
   ##
   ## Increase counter
   let myloop_counter+=1
done

Last update: September 28, 2023