Graphcore

Graphcore is a British semiconductor company that develops accelerators for artificial intelligence and machine learning (AI/ML). A Graphcore POD16 system made up of four IPU-M2000 appliances was attached to the FTP-X86 cluster in October 2021. It was the first installation of this system type in Germany.

Hardware Overview

The basic building block of the installation is a single Graphcore "IPU-Machine: 2000" (short: IPU-M2000) appliance housing four Colossus GC200 "Intelligent Processing Units" (IPUs). The 1472 independent processor cores in each IPU are highly optimized for the calculations required by many AI/ML codes.

The memory layout is also optimized for these kinds of operations. Every IPU has about 900 megabytes of on-chip SRAM, and up to 450 GB of external "Streaming Memory" or "Exchange Memory" can be connected to the special memory controllers built into the four IPUs.

A single IPU-M2000 system with four Colossus GC200 IPU processors

Multiple IPU-M2000 systems can be connected to create larger instances, called "PODs". A POD16 installation is built using 4 IPU-M2000 appliances with a total of 16 IPUs. The largest installation currently possible is a POD256 built using 64 IPU-M2000 appliances.
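The POD sizes quoted above follow from simple multiplication. The following is a minimal sketch that uses only the figures given in this section (4 IPUs per appliance, 1472 cores and about 900 MB of SRAM per IPU). The function name `pod_totals` is made up for illustration.

```python
# Per-device figures taken from the hardware overview above.
IPUS_PER_M2000 = 4        # Colossus GC200 IPUs per IPU-M2000 appliance
CORES_PER_IPU = 1472      # independent processor cores per IPU
SRAM_MB_PER_IPU = 900     # approximate on-chip SRAM per IPU, in megabytes


def pod_totals(appliances: int) -> dict:
    """Return aggregate figures for a POD built from `appliances` IPU-M2000s."""
    ipus = appliances * IPUS_PER_M2000
    return {
        "ipus": ipus,
        "cores": ipus * CORES_PER_IPU,
        "sram_mb": ipus * SRAM_MB_PER_IPU,
    }


for name, appliances in [("POD16", 4), ("POD256", 64)]:
    print(name, pod_totals(appliances))
```

For the POD16 described here this gives 16 IPUs, 23552 cores and roughly 14.4 GB of on-chip SRAM in total.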

All IPU-M2000 appliances in the same POD are connected to each other using proprietary cluster and synchronization links. These are colored in dark grey and red in the following figure.

The four IPUs in each IPU-M2000 appliance are managed by a small ARM-based server running a custom, Linux-based operating system. This ARM server is called the "IPU-Gateway SoC". It also controls all the external interfaces and manages communication with the other appliances.

Users cannot directly access the Gateway SoCs. Instead, all IPU-M2000 systems in a POD are connected to a separate server called the "Host Server". The Host Server runs a standard Linux-based operating system on which the Graphcore user-space libraries and tools have been installed. It also has multiple 100 Gigabit Ethernet links connected to the IPU-M2000 systems for fast data transfers between the Host Server and the IPUs.

Accessing the IPUs

The POD16 system is currently connected to the ftp-x86n7 node. Only calculations that are started on this node can be offloaded to the Graphcore appliances. All other nodes in the FTP-X86 cluster have no connection to the IPU-M2000 appliances.

To get access to this node, you have to start an interactive job using salloc or an asynchronous job using sbatch in the amd-milan-graphcore batch system partition.
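As a rough sketch, the two submission variants could be assembled from a Python script like this. The partition name comes from the text above, and `--partition` and `--time` are standard Slurm options. The time limit and the script name `train_ipu.sh` are placeholders, not site defaults.

```python
import shlex

PARTITION = "amd-milan-graphcore"   # partition name from the text above


def salloc_command(time_limit: str = "01:00:00") -> list:
    """Build the argument list for an interactive allocation."""
    return ["salloc", f"--partition={PARTITION}", f"--time={time_limit}"]


def sbatch_command(script: str, time_limit: str = "01:00:00") -> list:
    """Build the argument list for an asynchronous batch submission."""
    return ["sbatch", f"--partition={PARTITION}", f"--time={time_limit}", script]


# Print the command lines instead of running them, so the sketch is safe
# to try on any machine.
print(shlex.join(salloc_command()))
print(shlex.join(sbatch_command("train_ipu.sh")))
```

The printed lines can be pasted into a shell on a login node, or the lists can be handed to `subprocess.run` there.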

Software

The Graphcore software stack is separated into two parts.

The vipu-server daemon is a privileged piece of software that controls the IPU-M2000 systems. Similar to the Slurm batch system managing the whole FTP-X86 cluster, resources (IPUs) have to be allocated from this server before any calculations can be executed on the IPUs. The vipu-server instance runs on the Gateway SoC of the first IPU-M2000 appliance, not on the Host server.

A single vipu-server can manage multiple "clusters" (usually clusters are equal to PODs), each of which can be divided into multiple "partitions" containing at least one IPU-M2000. Currently only one cluster (called da-cluster) with a single partition (called p) containing all 16 IPUs is configured.

Resource management

Management of the IPUs is currently not integrated with Slurm, and the vipu-server does not support actual batch queues with waiting jobs. If multiple users on the Host server try to send calculations to the same partition on the same IPU cluster, the first attempt will succeed and all other attempts will fail while the first is still running.

You can use the gc-monitor tool described below to check which IPU partitions are already in use.
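Because there is no queue, a job script may want to wait until the partition looks free before it starts. The following is only a sketch of such a polling loop. The helper name `wait_until_free` and its parameters are hypothetical, and the `is_busy` callable stands in for whatever check you use (for example one based on gc-monitor output).

```python
import time


def wait_until_free(is_busy, poll_seconds: float = 30.0, max_polls: int = 120) -> bool:
    """Poll `is_busy()` until it returns False; give up after `max_polls` checks."""
    for _ in range(max_polls):
        if not is_busy():
            return True
        time.sleep(poll_seconds)
    return False


# Stand-in for a real check: reports "busy" twice, then "free".
answers = iter([True, True, False])
print(wait_until_free(lambda: next(answers), poll_seconds=0.01))
```

Polling does not reserve anything. Another user can still attach to the partition between the check and your launch, so the job itself should handle a failed attach as well.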

The second part is the user-space software stack installed on the Host Server. This is mainly "Poplar", a collection of tools and libraries acting as an abstraction layer that hides most of the complexity of the actual setup with its many components and network links. These tools and libraries handle communication with the vipu-server.

Poplar can be loaded through the environment module system using

$ module load toolkit/poplar

IPU monitoring

The gc-monitor tool outputs a list of all clusters and IPUs known to the vipu-server. It also displays which jobs are currently running on the IPUs, and which resources and partitions are in use.

+---------------+--------------------------------------------------------------------------------+
|  gc-monitor   |                   Partition: 'p' has 16 reconfigurable IPUs                    |
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|    IPU-M    |       Serial       |  ICU FW  | Type |Server version| ID | PCIe ID |Routing|GWSW |
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.162| 0063.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 0  |    3    |  DNC  |2.2.0|
|...55.255.162| 0063.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 1  |    2    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.162| 0063.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 2  |    1    |  DNC  |2.2.0|
|...55.255.162| 0063.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 3  |    0    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.130| 0047.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 4  |    3    |  DNC  |2.2.0|
|...55.255.130| 0047.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 5  |    2    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.130| 0047.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 6  |    1    |  DNC  |2.2.0|
|...55.255.130| 0047.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 7  |    0    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.226| 0038.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 8  |    3    |  DNC  |2.2.0|
|...55.255.226| 0038.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 9  |    2    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.226| 0038.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 10 |    1    |  DNC  |2.2.0|
|...55.255.226| 0038.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 11 |    0    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.194| 0040.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 12 |    3    |  DNC  |2.2.0|
|...55.255.194| 0040.0002.8204721  |  2.2.5   |M2000 |    1.7.1     | 13 |    2    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
|...55.255.194| 0040.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 14 |    1    |  DNC  |2.2.0|
|...55.255.194| 0040.0001.8204721  |  2.2.5   |M2000 |    1.7.1     | 15 |    0    |  DNC  |2.2.0|
+-------------+--------------------+----------+------+--------------+----+---------+-------+-----+
+-----------------------------------------------------+------------------------+-----------------+
|                 Attached processes                  |          IPU           |      Board      |
+--------+----------------------+--------+------------+----+----------+--------+--------+--------+
|  PID   |       Command        |  Time  |    User    | ID |  Clock   |  Temp  |  Temp  | Power  |
+--------+----------------------+--------+------------+----+----------+--------+--------+--------+
|2094889 |        python        |  34s   |   zs0402   | 0  | 1330MHz  |  N/A   |  N/A   |  N/A   |
+--------+----------------------+--------+------------+----+----------+--------+--------+--------+
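To check from a script whether the IPUs are in use, the "Attached processes" table can be parsed. The sketch below works on the text layout shown above, which is copied into `SAMPLE`. It assumes that every data row starts with a numeric PID, and the layout may differ between gc-monitor versions. The function name `attached_processes` is made up for illustration.

```python
SAMPLE = """\
+-----------------------------------------------------+------------------------+-----------------+
|                 Attached processes                  |          IPU           |      Board      |
+--------+----------------------+--------+------------+----+----------+--------+--------+--------+
|  PID   |       Command        |  Time  |    User    | ID |  Clock   |  Temp  |  Temp  | Power  |
+--------+----------------------+--------+------------+----+----------+--------+--------+--------+
|2094889 |        python        |  34s   |   zs0402   | 0  | 1330MHz  |  N/A   |  N/A   |  N/A   |
+--------+----------------------+--------+------------+----+----------+--------+--------+--------+
"""


def attached_processes(text: str) -> list:
    """Extract the rows of the "Attached processes" table from gc-monitor output."""
    rows = []
    in_section = False
    for line in text.splitlines():
        if "Attached processes" in line:
            in_section = True
            continue
        # Skip everything before the section and all border lines.
        if not in_section or not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Data rows start with a numeric PID; the column header row does not.
        if len(cells) >= 5 and cells[0].isdigit():
            rows.append({
                "pid": int(cells[0]),
                "command": cells[1],
                "time": cells[2],
                "user": cells[3],
                "ipu_id": cells[4],
            })
    return rows


print(attached_processes(SAMPLE))
```

An empty result means that no process is attached in the captured output. On the Host Server, the text would come from running gc-monitor and capturing its standard output instead of using `SAMPLE`.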

Using TensorFlow

Graphcore maintains TensorFlow forks that have been patched to offload calculations to the IPUs using Poplar. The IPUs show up as XLA devices in TensorFlow, and there are additional IPU-specific commands and settings.

The source code for the TensorFlow 2 fork can be found here; the documentation can be found here.

The Poplar distribution installed as an environment module also ships a Python Wheel file that can be installed locally using pip3. The ${POPLAR_TENSORFLOW_WHEEL} environment variable defined by the Poplar module points to the path of the Wheel file included with the currently loaded Poplar version.

## Load the Poplar module
$ module load toolkit/poplar

## Create and activate a Virtual Environment
$ python -m venv venv
$ source venv/bin/activate

## Install the wheel file using the path defined in the environment variable
$ pip3 install ${POPLAR_TENSORFLOW_WHEEL}
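If the pip3 step fails, it is usually worth confirming that the environment variable is set and points to an existing file. The following is a minimal sketch of such a check. The helper name `wheel_path` is hypothetical, and only the variable name is taken from the text above.

```python
import os
from pathlib import Path


def wheel_path(variable: str = "POPLAR_TENSORFLOW_WHEEL") -> Path:
    """Return the wheel path stored in `variable`, or raise a descriptive error."""
    value = os.environ.get(variable)
    if not value:
        raise RuntimeError(
            f"{variable} is not set; run 'module load toolkit/poplar' first"
        )
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"{variable} points to a missing file: {path}")
    return path


try:
    print("Wheel file:", wheel_path())
except (RuntimeError, FileNotFoundError) as err:
    print("Problem:", err)
```

The same function can be called with a different variable name to check other wheel files shipped with the module.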

A minimal usage example looks like this:

## Load the Poplar module (if not already loaded)
$ module load toolkit/poplar

## Activate the previously created Virtual Environment
$ source venv/bin/activate

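## Minimal TensorFlow example with offload to the IPUs
$ python <<EOF
import tensorflow as tf
from tensorflow.python import ipu

# Request four IPUs and let the runtime select them automatically
cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 4
# Settings that apply to the IPU Model (simulator) target
cfg.ipu_model.compile_ipu_code = False
cfg.ipu_model.version = "ipu2"

# Apply the configuration
cfg.configure_ipu_system()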

print(tf.reduce_sum(tf.random.normal([50000, 50000])))
EOF

When executed correctly, the calculation will be offloaded to the IPUs and gc-monitor shows the Python job in the "Attached processes" section.

Using PyTorch

Graphcore maintains a PyTorch fork called "PopTorch" that has been patched to offload calculations to the IPUs.

The source code can be found here; the documentation can be found here.

The Poplar distribution installed as an environment module also ships a PopTorch Wheel file that can be installed locally using pip3. The ${POPLAR_POPTORCH_WHEEL} environment variable defined by the Poplar module points to the path of the Wheel file included with the currently loaded Poplar version.

## Load the Poplar module
$ module load toolkit/poplar

## Create and activate a Virtual Environment
$ python -m venv venv
$ source venv/bin/activate

## Install the wheel file using the path defined in the environment variable
$ pip3 install ${POPLAR_POPTORCH_WHEEL}
$ pip3 install torch==1.9.0

Warning

The provided PopTorch Wheel file installs torch version 1.10, which does not work with PopTorch 2.3.0, so torch 1.9 has to be installed manually afterwards. Any further required modules can be installed via pip.
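A quick way to confirm that the downgrade took effect is to compare the installed torch version with the required 1.9 series. This is a sketch only. The helper name `is_supported_torch` is made up, and the required version is the one stated in the warning above.

```python
from importlib import metadata


def is_supported_torch(version: str, required: tuple = (1, 9)) -> bool:
    """Return True if `version` (e.g. "1.9.0+cpu") has the required major.minor."""
    release = version.split("+")[0]   # drop a local build tag such as "+cpu"
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) == required


try:
    installed = metadata.version("torch")
    status = "ok" if is_supported_torch(installed) else "not the required 1.9 series"
    print(f"torch {installed}: {status}")
except metadata.PackageNotFoundError:
    print("torch is not installed in this environment")
```

The version is compared as a pair of integers rather than as a string prefix, so that 1.10 is not confused with 1.1 or 1.9 with 1.90.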

PopART

The Poplar Advanced Runtime (PopART) is an optimized toolkit for importing and executing models from industry standard ML frameworks. PopART includes tools and libraries that can operate on models stored in the ONNX format and execute them on the IPUs.

The source code can be found here; the documentation can be found here.

PopART can be loaded through the environment module system using

$ module load toolkit/popart

Last update: September 28, 2023