
Slurm: Job Exit Codes

A job's exit code (also known as exit status, return code, or completion code) is captured by Slurm and saved as part of the job record.

Any non-zero exit code will be assumed to be a job failure and will result in a Job State of FAILED with a reason of "NonZeroExitCode".

The exit code is an 8-bit unsigned number in the range 0 to 255. While it is possible for a job to return a negative exit code, Slurm will display it as an unsigned value in the 0 - 255 range.
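The reduction to the 0 - 255 range can be illustrated with a short bash sketch. The helper name to_unsigned8 is hypothetical and not part of Slurm; it only keeps the low 8 bits of a value, which is how a negative or oversized exit status ends up as an unsigned number.

```shell
#!/bin/bash
# to_unsigned8 is a hypothetical helper (not a Slurm command): it keeps only
# the low 8 bits of a value, mirroring how an exit status is reduced to 0-255.
to_unsigned8() {
    echo $(( $1 & 255 ))
}

to_unsigned8 -1     # prints 255
to_unsigned8 300    # prints 44

# The shell applies the same reduction to the exit status of a subshell.
(exit 300) || echo "status: $?"   # prints "status: 44"
```

With this mapping, a program that returns -1 would be expected to show up with an exit code of 255.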

Displaying Exit Codes and Signals

Slurm displays a job's exit code in the output of scontrol show job and in the sview utility.

When a signal was responsible for the termination of a job or step, the signal number is displayed after the exit code, separated by a colon (:).
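As a minimal sketch of how such a value could be split in a script, the following bash function separates the exit code from the signal number. The function name parse_exit_field and the sample value 0:9 are assumptions made for illustration and are not taken from real job output.

```shell
#!/bin/bash
# parse_exit_field is a hypothetical helper: it splits a value of the form
# "exit_code:signal" at the colon.
parse_exit_field() {
    local field=$1 code signal
    IFS=: read -r code signal <<< "$field"
    echo "exit code: ${code}, signal: ${signal}"
}

# "0:9" is an invented sample value (exit code 0, signal 9).
parse_exit_field "0:9"   # prints "exit code: 0, signal: 9"
```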

Submitting Termination Signal

The following example shows how to capture a program's exit code in a typical HoreKa sbatch submit script.

[...]
echo "### Calling YOUR_PROGRAM command ..."
mpirun -np 'NUMBER_OF_CORES' $YOUR_PROGRAM_BIN_DIR/runproc ... (options)  2>&1
# Save the exit code right after mpirun, before another command overwrites $?
exit_code=$?
[ "$exit_code" -eq 0 ] && echo "all clean..." || \
   echo "Executable ${YOUR_PROGRAM_BIN_DIR}/runproc finished with exit code ${exit_code}"
[...]
  • Do not use time mpirun! The exit code will be the one returned by the first program (time).
  • You do not need an exit $exit_code in the scripts.
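
The variable $? always holds the status of the most recently executed command, so it has to be saved directly after the program of interest. The sketch below shows this pattern in isolation; run_and_report is a hypothetical helper, and sh -c 'exit 3' merely stands in for a failing program.

```shell
#!/bin/bash
# run_and_report is a hypothetical helper: it runs a command and reports
# the exit code of that command.
run_and_report() {
    local exit_code
    "$@"
    exit_code=$?   # save immediately; the next command would overwrite $?
    if [ "$exit_code" -eq 0 ]; then
        echo "all clean..."
    else
        echo "Command '$1' finished with exit code ${exit_code}"
    fi
}

run_and_report true               # prints "all clean..."
run_and_report sh -c 'exit 3'     # prints "Command 'sh' finished with exit code 3"
```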

Last update: February 25, 2021