
File system performance tuning

The following recommendations can help to improve throughput and metadata performance on the parallel file systems, i.e. on $HOME, $PROJECT, LSDF, workspaces and BeeOND.

Improving Performance on parallel file systems


Improving Throughput Performance

When designing your application, keep in mind that parallel file systems generally perform better when data is transferred in large blocks and stored in a few large files. In more detail, the following aspects should be considered to increase the throughput of a parallel application:

  • collect large chunks of data and write them sequentially in one go

  • use several clients to exploit the full file system bandwidth

  • avoid concurrent access to the same file from different tasks or clients
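The following shell sketch illustrates the first and third points: instead of appending a few bytes to a shared file in every loop iteration, each task collects its records in memory, opens a file of its own once and writes the whole buffer sequentially. TARGET_DIR and TASK_ID are hypothetical names chosen for this example; under Slurm, SLURM_PROCID could serve as the task index. TARGET_DIR defaults to a fresh temporary directory so that the sketch runs anywhere; in a real job it would point to a workspace or another directory on a parallel file system.

```shell
#!/usr/bin/env bash
# Sketch only: large sequential writes, one output file per task.
# Assumptions (hypothetical names, not from this page):
#   TARGET_DIR - directory on the parallel file system (demo default: temp dir)
#   TASK_ID    - per-task index (falls back to SLURM_PROCID, then 0)
set -euo pipefail

TARGET_DIR=${TARGET_DIR:-$(mktemp -d)}
TASK_ID=${TASK_ID:-${SLURM_PROCID:-0}}

# Pattern to avoid: reopening a shared file and appending a few bytes per iteration
#   for i in $(seq 1 10000); do echo "record $i" >> "$TARGET_DIR/results.txt"; done

# Better: collect all records in one in-memory buffer ...
buffer=$(for i in $(seq 1 10000); do printf 'record %d\n' "$i"; done)

# ... then open a task-private file once and write the buffer sequentially.
# ($(...) strips the trailing newline, so printf adds it back.)
printf '%s\n' "$buffer" > "$TARGET_DIR/results_task${TASK_ID}.txt"
```

For output that does not fit into memory, the same idea can be applied by writing to node-local $TMP first and copying the finished file to the parallel file system with cp, which transfers it as one sequential stream.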

Spectrum Scale normally uses all disks to store the data of huge files, i.e. no adjustments are required by the user. Other parallel file systems, such as BeeOND, use a fixed stripe count that determines the number of disks used for a single file. Therefore, if many tasks access a few huge files on BeeOND, these files should be placed in a directory with a high stripe count in the root of the BeeOND file system.
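To see why the stripe count matters, the sketch below uses a simplified RAID0-style model in which a file is cut into fixed-size chunks that are distributed round-robin over the storage targets (disks) assigned to it. The helper names stripe_target and count_targets and the 512 KiB chunk size are illustrative assumptions, not actual BeeOND settings. With a stripe count of 1, every chunk of a 1 GiB file lands on the same target, so all tasks working on that file compete for one disk; with a higher stripe count the load is spread over more targets.

```shell
#!/usr/bin/env bash
# Simplified striping model (assumption: plain round-robin chunk placement).
set -euo pipefail

# stripe_target OFFSET CHUNKSIZE STRIPECOUNT
# Prints the 0-based index (into the file's own target list) of the target
# that holds the byte at OFFSET. Hypothetical helper for illustration.
stripe_target() {
  local offset=$1 chunksize=$2 stripecount=$3
  echo $(( (offset / chunksize) % stripecount ))
}

# count_targets FILESIZE CHUNKSIZE STRIPECOUNT
# Prints how many distinct targets a file of FILESIZE bytes is spread over.
count_targets() {
  local size=$1 chunksize=$2 stripecount=$3
  local chunks=$(( (size + chunksize - 1) / chunksize ))  # round up
  if (( chunks < stripecount )); then echo "$chunks"; else echo "$stripecount"; fi
}

chunk=$(( 512 * 1024 ))          # example chunk size: 512 KiB
size=$(( 1024 * 1024 * 1024 ))   # example file size: 1 GiB
for stripes in 1 4 16; do
  echo "stripe count $stripes: file spread over $(count_targets "$size" "$chunk" "$stripes") target(s)"
done
```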

Improving Metadata Performance

Metadata performance on parallel file systems is usually worse than on local file systems. Therefore, you should avoid metadata operations whenever possible; for example, a few large files are much better than many small files. In more detail, the following aspects should be considered to increase the metadata performance of a parallel application:

  • avoid creating many small files

  • avoid competitive directory access, e.g. by creating files in separate subdirectories for each task

  • if many small files are used by only one process, store them on $TMP

  • change the default colorization setting of the ls command (see below)
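A possible pattern combining these points, assuming the small files have to be kept as a result after the job: each task creates its files in its own subdirectory on node-local $TMP and then stores them on the parallel file system as a single tar archive, which replaces hundreds of file creations with one file and one large sequential write. RESULT_DIR and TASK_ID are hypothetical names for this example; the sketch falls back to /tmp and a temporary result directory when $TMP and RESULT_DIR are not set.

```shell
#!/usr/bin/env bash
# Sketch only: many small files on node-local storage, one archive per task
# on the parallel file system.
# Assumptions: $TMP is node-local; RESULT_DIR and TASK_ID are hypothetical.
set -euo pipefail

SCRATCH=$(mktemp -d "${TMP:-/tmp}/smallfiles.XXXXXX")
RESULT_DIR=${RESULT_DIR:-$(mktemp -d)}
TASK_ID=${TASK_ID:-${SLURM_PROCID:-0}}

# Separate subdirectory per task, so tasks do not compete for one directory.
workdir="$SCRATCH/task_${TASK_ID}"
mkdir -p "$workdir"

# Many small files: cheap on a local disk, costly metadata operations on a parallel FS.
for i in $(seq 1 500); do
  printf 'value %d\n' "$i" > "$workdir/part_$i.txt"
done

# A single archive file on the parallel file system instead of 500 file creations.
tar -C "$SCRATCH" -cf "$RESULT_DIR/task_${TASK_ID}.tar" "task_${TASK_ID}"

# Remove the node-local scratch data.
rm -rf "$SCRATCH"
```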

ls colorization

On modern Linux systems, the GNU ls command often uses colorization by default to highlight the file type, especially when it is run within a terminal session. This is because the system-wide shell startup files usually define an alias for ls similar to the following:

$ alias ls='ls --color=tty'

However, running ls this way on a parallel file system requires a stat() call for every file to determine its type. This can cause a noticeable overhead, because stat() always has to determine the size of the file, which in turn means that the client node must query the size of every backing object that makes up the file.

As a result of the default colorization setting, running a simple ls command on a Lustre file system often takes as long as running ls with the -l option (the same is true for the -F, -p or --classify option, or any other option that requires information from a stat() call). To avoid this overhead, add an alias similar to the following to your shell startup script:

$ alias ls='ls --color=none'
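One way to make the setting permanent is sketched below. It assumes bash, which reads ~/.bashrc for interactive shells; STARTUP_FILE and add_line_once are hypothetical names introduced for this example. The alias is appended at the end of the file, so it is typically defined after the system-wide alias and therefore takes precedence, and the grep check keeps the line from being added twice when the snippet is run again.

```shell
#!/usr/bin/env bash
# Sketch only: persist the non-colorized ls alias (assumes bash and ~/.bashrc).
set -euo pipefail

# add_line_once FILE LINE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration.
add_line_once() {
  local file=$1 line=$2
  touch "$file"
  # -q: quiet, -x: match whole line, -F: fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

add_line_once "${STARTUP_FILE:-$HOME/.bashrc}" "alias ls='ls --color=none'"
```

In an interactive bash, `alias ls` shows the definition currently in effect, and `\ls` or `command ls` bypasses the alias for a single invocation.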

Last update: May 31, 2021