Slurm: MPI Parallel Program¶
MPI parallel programs run faster than serial programs on multi CPU and multi core systems. N-fold spawned processes of the MPI program, i.e., MPI tasks, run simultaneously and communicate via the Message Passing Interface (MPI) paradigm. MPI tasks do not share memory but can be spawned over different nodes.
Multiple MPI tasks must be launched via mpirun, e.g. 4 MPI tasks of my_par_program:
$ mpirun -n 4 my_par_program
(Intel MPI only: before launching with mpirun, unset the bootstrap variable:
$ unset I_MPI_HYDRA_BOOTSTRAP)
When you run MPI parallel programs in a batch job, the interactive environment, in particular the loaded modules, is also set in the batch job. If you want a defined module environment in your batch job, you must purge all modules before loading the desired ones.
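A defined module environment can be enforced at the top of the wrapper script, for example (a minimal sketch; the module version is a placeholder):

```shell
#!/bin/bash
# Discard all modules inherited from the interactive session ...
module purge
# ... then load exactly the modules the job needs
module load mpi/openmpi/<placeholder_for_version>
```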
If you want to run jobs on batch nodes, generate a wrapper script
job_ompi.sh for OpenMPI containing the following lines:
#!/bin/bash
# Use when a defined module environment related to OpenMPI is wished
module load mpi/openmpi/<placeholder_for_version>
mpirun --bind-to core --map-by core --report-bindings my_par_program
Attention: Do NOT add the mpirun option -n <number_of_processes> or any other option defining the number of processes or nodes, since Slurm instructs mpirun about the number of processes and the node hostnames. ALWAYS use the MPI options --bind-to core and --map-by core|socket|node. Type mpirun --help for an explanation of the different mpirun options.
To run 4 OpenMPI tasks on a single node, with 1000 MByte of memory per node, for 1 hour, execute:
$ sbatch -p cpuonly -N 1 -n 4 --mem=1000 --time=01:00:00 job_ompi.sh
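The same resource request can also be embedded in the wrapper script itself via #SBATCH directives, so that a plain sbatch job_ompi.sh suffices (a sketch under the same assumptions; the partition name and module version are placeholders):

```shell
#!/bin/bash
#SBATCH --partition=cpuonly
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=1000
#SBATCH --time=01:00:00
module load mpi/openmpi/<placeholder_for_version>
mpirun --bind-to core --map-by core --report-bindings my_par_program
```

Command-line options passed to sbatch override the corresponding #SBATCH directives, so the script can still be reused with different resource requests.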
The wrapper script
job_impi.sh for Intel MPI contains e.g. the following lines:
#!/bin/bash
# Use when a defined module environment related to Intel MPI is wished
module load mpi/impi/<placeholder_for_version>
mpiexec.hydra -bootstrap slurm my_par_program
Again, do NOT add the option -n <number_of_processes> or any other option defining the number of processes or nodes, since Slurm instructs mpiexec.hydra about the number of processes and the node hostnames.
To launch and run 100 Intel MPI tasks on 5 nodes, with 10 GByte of memory per node, running for 5 hours, execute:
$ sbatch --partition cpuonly -N 5 --ntasks-per-node=20 --mem=10gb -t 300 job_impi.sh
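The numbers in this request can be cross-checked with plain shell arithmetic: 5 nodes times 20 tasks per node yields the 100 MPI tasks, and -t 300 (minutes) equals the 5 hours of walltime:

```shell
#!/bin/sh
# Cross-check the sbatch resource request above with shell arithmetic.
nodes=5
tasks_per_node=20
walltime_minutes=300

total_tasks=$((nodes * tasks_per_node))      # 5 * 20 = 100 MPI tasks
walltime_hours=$((walltime_minutes / 60))    # 300 min = 5 hours

echo "total MPI tasks: $total_tasks"
echo "walltime in hours: $walltime_hours"
```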
If you want to use 128 or more nodes, you must also set an additional environment variable.
If you want to use the options perhost, ppn or rr, you must additionally set an environment variable.