
Slurm: MPI Parallel Program

MPI parallel programs run faster than serial programs on multi-CPU and multi-core systems. The N processes spawned by an MPI program, the so-called MPI tasks, run simultaneously and communicate via the Message Passing Interface (MPI). MPI tasks do not share memory, but they can be spawned across different nodes.

Multiple MPI tasks must be launched via mpirun, e.g. 4 MPI tasks of my_par_program:

$ mpirun -n 4 my_par_program

This command runs 4 MPI tasks of my_par_program on the node you are logged in to. To run this command on HoreKa with a loaded Intel MPI, the environment variable I_MPI_HYDRA_BOOTSTRAP must be unset first ($ unset I_MPI_HYDRA_BOOTSTRAP).
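
For example, on HoreKa with an Intel MPI module loaded, an interactive test on the login node might look like this (a minimal sketch; the module version is a placeholder):

$ module load mpi/impi/<placeholder_for_version>
$ unset I_MPI_HYDRA_BOOTSTRAP     # required on HoreKa when running mpirun outside a Slurm job
$ mpirun -n 4 my_par_program      # 4 MPI tasks on the login node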

When MPI parallel programs are run in a batch job, the interactive environment, in particular the loaded modules, is also set in the batch job. If you want a defined module environment in your batch job, you have to purge all modules before loading the desired ones; a sketch is shown below.
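
A minimal sketch of such a defined module environment at the beginning of a job script, using the OpenMPI module from the next section as an example (the version is a placeholder):

#!/bin/bash
# Start the batch job from a clean module environment
module purge
# Load only the desired modules
module load mpi/openmpi/<placeholder_for_version>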

OpenMPI

If you want to run jobs on batch nodes, generate a wrapper script job_ompi.sh for OpenMPI containing the following lines:

#!/bin/bash
# Use this block when a defined OpenMPI module environment is desired
module load mpi/openmpi/<placeholder_for_version>
mpirun --bind-to core --map-by core --report-bindings my_par_program

Attention: Do NOT add the mpirun option -n <number_of_processes> or any other option that defines the number of processes or nodes, since Slurm instructs mpirun about the number of processes and the node hostnames. ALWAYS use the MPI options --bind-to core and --map-by core|socket|node. Type mpirun --help for an explanation of the different arguments of the --map-by option.

To run 4 OpenMPI tasks on a single node, with 1000 MByte of memory (--mem is specified per node) and a run time of 1 hour, execute:

$ sbatch -p normal -N 1 -n 4 --mem=1000 --time=01:00:00 job_ompi.sh
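
Alternatively, the same resource request can be placed as #SBATCH directives inside the wrapper script; a sketch of an equivalent job_ompi.sh, assuming the same partition, task count, memory and time as above:

#!/bin/bash
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=1000
#SBATCH --time=01:00:00
# Use this block when a defined OpenMPI module environment is desired
module load mpi/openmpi/<placeholder_for_version>
mpirun --bind-to core --map-by core --report-bindings my_par_program

The script is then submitted without further options ($ sbatch job_ompi.sh); options given on the sbatch command line override the corresponding #SBATCH directives.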

Intel MPI

The corresponding wrapper script job_impi.sh for Intel MPI contains, for example, the following lines:

#!/bin/bash
# Use this block when a defined Intel MPI module environment is desired
module load mpi/impi/<placeholder_for_version>
mpiexec.hydra -bootstrap slurm my_par_program

Do NOT add the option -n <number_of_processes> or any other option that defines the number of processes or nodes to mpiexec.hydra, since Slurm instructs it about the number of processes and the node hostnames.

To launch and run 100 Intel MPI tasks on 5 nodes (20 tasks per node), with 10 GByte of memory per node and a run time of 5 hours, execute:

$ sbatch --partition normal -N 5 --ntasks-per-node=20 --mem=10gb -t 300 job_impi.sh
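
The same request can also be expressed as #SBATCH directives inside the wrapper script; a sketch of an equivalent job_impi.sh for the 100-task example:

#!/bin/bash
#SBATCH --partition=normal
#SBATCH --nodes=5
#SBATCH --ntasks-per-node=20
#SBATCH --mem=10gb
#SBATCH --time=05:00:00
# Use this block when a defined Intel MPI module environment is desired
module load mpi/impi/<placeholder_for_version>
mpiexec.hydra -bootstrap slurm my_par_program

The script is then submitted with $ sbatch job_impi.sh.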

If you want to use 128 or more nodes, you must also set the environment variable as follows:

$ export I_MPI_HYDRA_BRANCH_COUNT=-1

If you want to use the mpiexec.hydra options -perhost, -ppn or -rr, you must additionally set the environment variable I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off, as in the sketch below.
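
For example, if the number of processes per node is to be set explicitly with -ppn, the wrapper script could contain the following lines (a sketch; the value 10 is purely illustrative):

# Let the Intel MPI options -perhost/-ppn/-rr take precedence over the Slurm process placement
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
mpiexec.hydra -bootstrap slurm -ppn 10 my_par_program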

