Containers

Containers on HPC systems

To date, only a few container runtime environments integrate well with HPC environments, due to security concerns and differing assumptions in some areas.

For example, native Docker environments require elevated privileges, which is not an option on shared HPC resources. Docker's "rootless mode" is currently not supported on our HPC systems either, because it lacks necessary features such as cgroups resource controls, security profiles and overlay networks, and because GPU passthrough is difficult. In addition, the required subuid (newuidmap) and subgid (newgidmap) settings may pose security risks.

On HoreKa, Enroot and Apptainer are supported.

Further rootless container runtime environments (Podman, …) might be supported in the future, depending on how support for e.g. network interconnects, security features and HPC file systems develops.

Apptainer

Apptainer enables you to run, for example, Docker containers on HPC systems. It is the recommended tool for using containers on HoreKa and integrates well with GPU usage.

Usage

Excellent documentation is provided in the Apptainer User Guide. This page therefore confines itself to simple examples that introduce the essential functionality.

Using Apptainer usually involves two steps:

Building a container image using apptainer build
Running a container image using apptainer run or an application inside a container using apptainer exec

Building an image

apptainer build alpine.sif docker://alpine
This pulls the latest alpine image from Docker Hub and creates a local container image file called alpine.sif.
apptainer build pytorch-21.04-p3.sif docker://nvcr.io/nvidia/pytorch:21.04-py3
This pulls the pytorch image with the tag 21.04-py3 from NVIDIA's NGC registry and creates a local container image file called pytorch-21.04-p3.sif.

Running an image

apptainer shell docker://alpine
Start a shell in an Alpine container. The required image is downloaded automatically and stored in the Apptainer cache directory, typically $HOME/.apptainer/cache/. No creation of a .sif file is required in advance.
apptainer shell ubuntu.sif
Start a shell in the Ubuntu container.
apptainer run alpine.sif
Start the container alpine.sif and run the default runscript (if it exists) provided by the image.
apptainer exec alpine.sif /bin/ls
Start the container alpine.sif and run the /bin/ls command.
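
The GPU integration mentioned above is switched on per command with Apptainer's --nv flag, which makes the host's NVIDIA devices and driver libraries available inside the container. A minimal sketch, assuming the command is run on a GPU node and the image contains CUDA-enabled software; the helper name gpu_exec is hypothetical:

```shell
# Hypothetical helper: run a command inside an image with NVIDIA GPU support.
# --nv is the Apptainer flag that exposes the host's NVIDIA devices and driver libraries.
gpu_exec() {
    local image="$1"
    shift
    apptainer exec --nv "$image" "$@"
}

# Example call on a GPU node (image name taken from the build example above):
# gpu_exec pytorch-21.04-p3.sif nvidia-smi
```

If nvidia-smi lists the GPUs of the node, the container can see them.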

Automatic mount of HOME

By default, the HOME directory is mounted.
If environment variables such as HISTSIZE are set in the started container, the .bash_history in the HOME directory outside the container is modified accordingly. Your history might be lost! If you do not want this, avoid the automatic mount of HOME with the --no-home flag. Please read Bind Paths and Mounts.
Example: apptainer shell --no-home docker://ubuntu

Enroot

Enroot is available to all users by default.

Usage

Excellent documentation is provided on NVIDIA's GitHub page. This page therefore confines itself to simple examples that introduce the essential functionality.

Using Docker containers with Enroot requires three steps:

Importing an image
Creating a container
Starting a container

Optionally, containers can also be exported and transferred.

Importing a container image

$ enroot import docker://alpine
This pulls the latest alpine image from Docker Hub (the default registry). You will obtain the file alpine.sqsh.
$ enroot import docker://nvcr.io#nvidia/pytorch:21.04-py3
This pulls the pytorch image with the tag 21.04-py3 from NVIDIA's NGC registry. You will obtain the file nvidia+pytorch+21.04-py3.sqsh.
$ enroot import docker://registry.scc.kit.edu#myProject/myImage:latest
This pulls your latest image from the KIT registry. You will obtain the file myProject+myImage+latest.sqsh.
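
The file names in the first two examples follow a simple pattern: the registry part is dropped, and "/" and ":" in the image reference become "+". The sketch below reproduces that pattern so a script can predict the name of the .sqsh file. It only mirrors the examples on this page, not Enroot's actual implementation, and enroot_sqsh_name is a hypothetical helper:

```shell
# Sketch: predict the .sqsh file name for a docker:// URI, following the examples above.
# enroot_sqsh_name is a hypothetical helper, not part of Enroot.
enroot_sqsh_name() {
    local ref
    ref="${1#docker://}"   # drop the URI scheme
    ref="${ref#*\#}"       # drop the registry part up to '#', if there is one
    # replace '/' and ':' with '+', then append the file extension
    printf '%s.sqsh\n' "$(printf '%s' "$ref" | tr '/:' '++')"
}

enroot_sqsh_name 'docker://alpine'                             # prints alpine.sqsh
enroot_sqsh_name 'docker://nvcr.io#nvidia/pytorch:21.04-py3'   # prints nvidia+pytorch+21.04-py3.sqsh
```

To choose the file name yourself, pass --output <fileName> to enroot import.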

Creating a container

$ enroot create --name nvidia+pytorch+21.04-py3 nvidia+pytorch+21.04-py3.sqsh
Create a container named nvidia+pytorch+21.04-py3 by unpacking the .sqsh file.

Creating a container means that the squashed container image is unpacked inside $ENROOT_DATA_PATH/. By default this variable points to $HOME/.local/share/enroot/. If there are quota restrictions, $ENROOT_DATA_PATH can also be set to point to a workspace.
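
Redirecting the container storage to a workspace could look like the sketch below. The helper name use_enroot_workspace and the enroot/data subdirectory are assumptions for illustration; only the ENROOT_DATA_PATH variable itself comes from Enroot:

```shell
# Hypothetical helper: keep unpacked Enroot containers in the given workspace directory.
use_enroot_workspace() {
    # Enroot unpacks containers below $ENROOT_DATA_PATH
    export ENROOT_DATA_PATH="$1/enroot/data"
    mkdir -p "$ENROOT_DATA_PATH"
}

# Example call; replace the argument with the path of your own workspace:
# use_enroot_workspace /path/to/my/workspace
```

The variable has to be set to the same value in every later session or job script. Otherwise enroot start and enroot list look for the containers in the default location.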

Starting a container

$ enroot start --rw nvidia+pytorch+21.04-py3 bash
Start the container nvidia+pytorch+21.04-py3 in read-write mode (--rw) and run bash inside the container.
$ enroot start --root --rw nvidia+pytorch+21.04-py3 bash
Start the container in read-write mode and get root access (--root) inside the container. You can now install software with root privileges, depending on the containerized Linux distribution, e.g. with apt-get install …, apk add …, yum install … or pacman -S …
$ enroot start -m <localDir>:/work --rw nvidia+pytorch+21.04-py3 bash
Start the container and mount (-m) a local directory to /work inside the container.
$ enroot start -m <localDir>:/work --rw nvidia+pytorch+21.04-py3 jupyter lab
Start the container, mount a directory and start the application jupyter lab.

Exporting and transferring containers

If you want to use Docker images that you built elsewhere, e.g. on your local desktop, there are several ways to transfer them to HoreKa:

Build an image via docker build and a Dockerfile, then import this image from the Docker daemon via $ enroot import --output myImage.sqsh dockerd://myImage. Copy the .sqsh file to HoreKa and unpack it with enroot create.
Export an existing Enroot container via $ enroot export --output myImage.sqsh myImage. Copy the .sqsh file to HoreKa and unpack it with enroot create.
Create a self-extracting bundle from a container image via $ enroot bundle --output myImage.run myImage.sqsh. Copy the .run file to HoreKa. You can run the self-extracting image via ./myImage.run even if Enroot is not installed!

Container management

You can list all containers on the system with the enroot list command. Additional information is revealed by the --fancy parameter.

The containers can be removed with the enroot remove command, or by simply deleting $ENROOT_DATA_PATH/<containerName>.

Slurm integration

Enroot allows you to run containerized applications non-interactively, including MPI and multi-node parallelism. The necessary Slurm integration is provided by the Pyxis plugin, which allows unprivileged cluster users to run containerized tasks through the srun command. ¹ You can either download a container directly at job start, or download a container via enroot beforehand, prepare it and then start a job with it.

Start directly via Pyxis

salloc -p dev_cpuonly -t 00:10:00 --container-image=docker://ubuntu --container-name=ubuntu --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch,/usr/lib64/slurm/libslurmfull.so,/usr/lib64/libhwloc.so.15

Existing container/container created via enroot

Create a container via enroot:

enroot import docker://ubuntu
enroot create -n pyxis_ubuntu ubuntu.sqsh 
enroot start -m <localDir>:/work --rw pyxis_ubuntu

Start the container via Pyxis:

salloc -p dev_cpuonly -t 00:10:00 --container-name=ubuntu --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch,/usr/lib64/slurm/libslurmfull.so,/usr/lib64/libhwloc.so.15

All options provided by Pyxis are listed in the output of srun --help under "Options provided by plugins:".

Notable options:

--container-mount-home: Mounts the home directory into the container.
--container-writable: Makes the container file system writable (otherwise only the mounted home directory is writable).
--container-remap-root: Become root inside the container. This allows you to install software, e.g. via apt on Ubuntu.

In both cases, an Enroot container is created under ~/.local/share/enroot/. If the container already exists, --container-image= is no longer needed; the name of the container (--container-name=) is sufficient.

Important

The container name must start with pyxis_ for the plugin to work. With the first method, this prefix is added automatically. When specifying the container name in your Slurm job, the pyxis_ prefix must be omitted.
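
The relation between the two names can be written down as a pair of small shell functions. This is only an illustration of the rule above; both function names are hypothetical:

```shell
# Hypothetical helpers illustrating the pyxis_ naming rule described above.

# Name under which Enroot stores a container that Slurm knows as $1.
enroot_name_for() {
    printf 'pyxis_%s\n' "$1"
}

# Name to pass to --container-name for the Enroot container $1.
slurm_name_for() {
    printf '%s\n' "${1#pyxis_}"
}

enroot_name_for myCentos      # prints pyxis_myCentos
slurm_name_for pyxis_ubuntu   # prints ubuntu
```

In other words, a container created manually with enroot create -n pyxis_ubuntu … is addressed in Slurm as --container-name=ubuntu.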

Important

The following mounts are needed for the plugin to work:
--container-mounts=/etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk,/scratch:/scratch,/usr/lib64/slurm/libslurmfull.so,/usr/lib64/libhwloc.so.15
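
Since this mount list is long and has to be repeated in every command, it can help to assemble it once in a shell variable. A minimal sketch, with the paths copied from the box above; the names join_mounts and CONTAINER_MOUNTS are hypothetical:

```shell
# Hypothetical helper: join all arguments with commas,
# the separator that --container-mounts expects.
join_mounts() {
    local IFS=,
    printf '%s\n' "$*"
}

# Mount list copied from the box above, one entry per line for readability.
CONTAINER_MOUNTS="$(join_mounts \
    /etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk \
    /scratch:/scratch \
    /usr/lib64/slurm/libslurmfull.so \
    /usr/lib64/libhwloc.so.15)"

echo "--container-mounts=$CONTAINER_MOUNTS"
```

The variable can then be passed to srun or salloc as --container-mounts="$CONTAINER_MOUNTS".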

For a detailed description of the usage, please refer to NVIDIA's GitHub page.

Examples

Run a command inside of a container
```
$ srun -p <partition_name> --container-image=centos grep PRETTY /etc/os-release
PRETTY_NAME="CentOS Linux 8"
```
Slurm, in conjunction with the Pyxis plugin, submits a job to the given partition, automatically downloads the CentOS image from Docker Hub, (temporarily) unpacks the root file system and executes the command grep PRETTY /etc/os-release inside the running container. Non-standard registries can be used via --container-image=<otherRegistry>#<imageName>, and even local .sqsh files via --container-image=~/myUbuntu.sqsh.

Reuse containers by naming them

First run:

$ srun -p <partition_name> --container-image=centos --container-name=myCentos grep PRETTY /etc/os-release

By naming the container, the root file system will be unpacked under $ENROOT_DATA_PATH/pyxis_myCentos. For subsequent runs, the following command can be used:

$ srun -p <partition_name> --container-name=myCentos grep PRETTY /etc/os-release

TensorFlow+GPU example:

$ srun -p <partition_name> \
    --gres=gpu:1 \
    --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch,/usr/lib64/slurm/libslurmfull.so,/usr/lib64/libhwloc.so.15 \
    --container-name=nvidia+tensorflow+21.07-tf2-py3 \
    bash -c 'python -c "import tensorflow as tf; tf.config.list_physical_devices()"'
…
Found device 0 with properties:
pciBusID: 0000:31:00.0 name: NVIDIA A100-SXM4-40GB computeCapability: 8.0
…

This command starts a job with one GPU; <partition_name> must be a GPU partition such as accelerated. The --container-mounts option mounts some files and directories inside the container, which are required for smooth operation with the HoreKa Slurm configuration. The named container can be prepared either by a first run with --container-image=nvcr.io#nvidia/tensorflow:21.07-tf2-py3 or by downloading (and possibly modifying) it manually:

$ enroot import docker://nvcr.io#nvidia/tensorflow:21.07-tf2-py3
$ enroot create --name pyxis_nvidia+tensorflow+21.07-tf2-py3 nvidia+tensorflow+21.07-tf2-py3.sqsh

Sbatch example: Execute a sbatch script inside a container image

$ cat pyxis-sbatch.sh
#!/bin/bash
#SBATCH --container-image nvcr.io\#nvidia/tensorflow:22.02-tf2-py3
#SBATCH --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch,/usr/lib64/slurm/libslurmfull.so,/usr/lib64/libhwloc.so.15
#SBATCH -p <partition_name>
#SBATCH --gres=gpu:1

python -c "import tensorflow as tf;\
    print(tf.__version__);\
    print(tf.reduce_sum(tf.random.normal([10000, 10000])))"

$ sbatch -o slurm.out pyxis-sbatch.sh
$ cat slurm.out
pyxis: importing docker image ...
…
2.7.0
tf.Tensor(-5098.84, shape=(), dtype=float32)

Warning

Please note the escape character \ in the string nvcr.io\#nvidia/tensorflow:22.02-tf2-py3. Without it, the separator # between the registry and the image name would be interpreted as the start of a comment.

FAQ

How can I run JupyterLab in a container and connect to it?

Start an interactive session with or without GPUs. Note the ID of the compute node the session is running on, and start a container running JupyterLab, e.g.: enroot start -m <localDir>:/work --rw nvidia+pytorch+21.04-py3 jupyter lab --no-browser --ip 0.0.0.0
Open a terminal on your desktop and create an SSH tunnel to the running JupyterLab instance on the compute node. Insert the ID of the node on which the interactive session is running: ssh -L8888:<computeNodeID>:8888 <yourAccount>@hk.scc.kit.edu
Open a web browser and go to the URL localhost:8888.
Enter the token, which is visible in the output of the first terminal. The output should look like this:
```
Or copy and paste this URL:
        http://hostname:8888/?token=fdaa7bf344b9ef3c0623c6e…4ce56dd845
```
Copy the string after token= and paste it into the input field in the browser.
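
If you prefer to cut the token out in the shell instead of selecting it by hand, parameter expansion is enough. A small sketch; the URL and its token value are made up for illustration:

```shell
# Example URL in the format shown above; the token value here is made up.
url='http://hostname:8888/?token=0123456789abcdef'

# Remove everything up to and including "token=".
token="${url##*token=}"
echo "$token"   # prints 0123456789abcdef
```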

Are GPUs accessible from within a running container?

Yes. Unlike Docker, Enroot does not need further command line options to enable GPU passthrough like --runtime=nvidia or --privileged.

Is there something like enroot-compose?

Not as far as we know. Enroot is mainly intended for HPC workloads, not for operating multi-container applications. However, starting and running these applications separately is possible.

Can I use workspaces to store containers?

Yes. You can define the location of configuration files and storage with environment variables. The ENROOT_DATA_PATH variable should be set accordingly. Please refer to NVIDIA's documentation on runtime configuration.

Additional resources

Source code: https://github.com/NVIDIA/enroot
Documentation: https://github.com/NVIDIA/enroot/blob/master/doc
Additional information: FOSDEM 2020 talk + slides

¹ https://github.com/NVIDIA/pyxis