File Systems¶
A central aspect in the design of HAICORE has been the enormous amount of data generated by scientific research projects. A multi-level data storage concept guarantees high-throughput processing of data using several different storage systems.
At the core of this design are two large-scale, parallel file systems based on IBM Spectrum Scale (also known as GPFS) which are used for globally visible user data. Individual home directories are automatically created for each user on the Spectrum Scale home file system, and the environment variable $HOME points to these directories. Each user can also create so-called workspaces on the Spectrum Scale work file system.
Other storage locations include a temporary directory called $TMP that is located on the local solid state disks (SSDs) of a node and is therefore only visible on an individual node while a job is running. In order to create a temporary directory which is visible on all nodes of a batch job, users can request a temporary BeeGFS On Demand (BeeOND) file system. Access to BeeOND file systems is only possible from the nodes of the batch job and only while the job is running.
The characteristics of the file systems are shown in the following table.
Property | $HOME | workspace | $TMP | BeeOND |
---|---|---|---|---|
Visibility | global | global | local | job local |
Lifetime | permanent | limited | job walltime | job walltime |
Disk space | 2.5 PB | 13.5 PB | 800 GB | n * 750 GB |
Quotas | yes | yes | no | no |
Snapshot | yes | yes | no | no |
Backup | yes | no | no | no |
Total read perf | 25 GB/s | 110 GB/s | 750 MB/s | n * 700 MB/s |
Total write perf | 25 GB/s | 110 GB/s | 750 MB/s | n * 700 MB/s |
Read perf/node | 10 GB/s | 10 GB/s | 750 MB/s | 10 GB/s |
Write perf/node | 10 GB/s | 10 GB/s | 750 MB/s | 10 GB/s |
global : all nodes see the same file system
local : each node has its own local file system
job local : only available within the currently running job
permanent : data is stored permanently (across job runs and reboots)
limited : data is stored across job runs and reboots, but is deleted once its retention period (e.g. the workspace lifetime) expires
job walltime : files are removed at the end of the batch job
Selecting the appropriate file system¶
In general, you should separate your data and store it on the appropriate file system.
Permanently required data like software or important results should be stored below $HOME, but capacity limits (so-called "quotas") apply. Permanent data which is not needed for months or exceeds the capacity restrictions should be sent to external large scale storage systems and deleted from the home file system.
Temporary data which is only needed on a single node and which does not exceed the disk space shown in the table above should be stored below $TMP. Temporary data which is only needed during job runs should be stored on BeeOND. Scratch data which can be easily recomputed or which is the result of one job and input for another job should be stored below so-called workspaces. The lifetime of data in workspaces is limited and depends on the lifetime of the workspace.
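As a minimal sketch of this separation, the following batch script stages input from a workspace to the node-local $TMP, runs the application there, and copies the results back to the workspace before the job ends. The workspace name myrun, the application path and the file names are placeholders.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

WS=$(ws_find myrun)                          # workspace holding input and results (created earlier with ws_allocate)
cp "$WS/input.dat" "$TMP/"                   # stage input to the fast node-local $TMP
cd "$TMP"
"$HOME/bin/my_app" input.dat > output.dat    # placeholder application working on the local copy
cp output.dat "$WS/"                         # copy results back, $TMP is removed at the end of the job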
Backups
If you accidentally deleted data on $HOME, you can usually copy back an older version from a so-called snapshot path. In addition, it is also possible to restore files from a backup. Please see the Backup and Archival section for more information.
$HOME¶
Users have to migrate their data
Please note that user data on HAICORE which was created before Sep 22nd, 2022, must be migrated by the users by the end of February 2023. See the Migrating data section below for more details.
For each user, a fixed amount of disk space is reserved for the $HOME directory. The disk space is controlled by so-called quotas. The default quota limit per user is 1 TB and 2 million inodes.
Workspaces¶
Users have to migrate their data
Please note that user data on HAICORE which was created before Sep 22nd, 2022, must be migrated by the users by the end of February 2023. See the Migrating data section below for more details.
Workspaces are directory trees which are available for a limited amount of time (a few months). The corresponding Spectrum Scale work file system has no backup, i.e. you should use workspaces for data which can be recreated, e.g. by running the same batch jobs once again. Recreating data in this way should only be necessary in the very unlikely case that the file system becomes corrupted.
Initially, workspaces have a maximum lifetime of 60 days. You can extend the lifetime three times by another 60 days each, but you should do so near the end of the current lifetime, since the new lifetime starts when you execute the command which requests the extension.
If a workspace has inadvertently expired, we can restore the data for a limited time (a few weeks). In this case you should create a new workspace and report the names of both the new and the expired workspace by opening a support ticket.
For your account (user ID) there is a quota limit for all of your workspaces and for the expired workspaces (as long as they are not yet completely removed). The default quota limit per user is 250 TB and 50 million inodes.
Create workspace¶
To create a workspace you need to state the name of your workspace and its lifetime in days. Note that the maximum value for the lifetime is 60 days. Execution of:
$ ws_allocate blah 30
returns:
Info: could not read email from users config ~/.ws_user.conf.
Info: reminder email will be sent to local user account
Info: creating workspace.
/hkfs/work/workspace_haic/scratch/USERNAME-blah
remaining extensions : 3
remaining time in days: 30
For more information, read the program's help, e.g. via man ws_allocate.
Reminder for workspace deletion¶
By default you will get an email about an expiring workspace 7 days before it expires. You can adapt this time with the option -r <days> of ws_allocate.
You can also send yourself a calendar entry that reminds you when a workspace will be deleted automatically:
$ ws_send_ical <workspace> <email>
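If you prefer to set the reminder when the workspace is created, ws_allocate also accepts a reminder time and a mail address. The exact option names may differ between versions of the workspace tools, so please check man ws_allocate; the values below are placeholders.
$ ws_allocate -r 14 -m your.name@example.org blah 30   # remind 14 days before expiry of workspace blah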
List all your workspaces¶
To list all your workspaces, execute:
$ ws_list
which will show for each workspace:
- Workspace ID
- Workspace location
- Creation date, remaining time and expiration date
- Available extensions
Find workspace location¶
The location (path) of any workspace can be queried with ws_find by stating its ID, e.g. for the workspace blah:
$ ws_find blah
returns the one-liner:
/hkfs/work/workspace_haic/scratch/USERNAME-blah
Extend lifetime of your workspace¶
The lifetime of any workspace can only be extended three times. There are two equivalent commands to extend a workspace's lifetime:
$ ws_extend blah 40
which extends workspace ID blah by 40 days from now, and
$ ws_allocate -x blah 40
which also extends workspace ID blah by 40 days from now.
Delete a workspace¶
$ ws_release blah # Manually erase your workspace blah
$TMP¶
The environment variable $TMP contains the name of a directory which is local to each node. This means that different tasks of a parallel application use different directories when they do not run on the same node. This directory should be used for temporary files accessed from the local node during job runtime. The $TMP directory is located on an extremely fast 960 GB NVMe SSD. This means that performance on small files is much better than on the parallel file systems.
Inside batch jobs, $TMP is set to a new directory. Its name contains the job ID and the job's start time, so the subdirectory is unique for each job. At the end of the job the directory $TMP is removed.
On login nodes, $TMP also points to a fast directory on a local NVMe SSD, but this directory is not unique. It is recommended to create your own unique subdirectory on these nodes. This directory should be used for the installation of software packages, i.e. the software package to be installed should be unpacked, compiled and linked in a subdirectory of $TMP. The actual installation of the package (e.g. make install) should then go into the $HOME or $PROJECT folder.
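For example, a unique build directory on a login node can be created with mktemp; the variable name MY_TMP is just a placeholder.
$ MY_TMP=$(mktemp -d "$TMP/build.XXXXXX")    # create a unique subdirectory below $TMP
$ cd "$MY_TMP"                               # unpack, configure and compile the software here
# ... install the package into $HOME or $PROJECT, e.g. with make install ...
$ rm -rf "$MY_TMP"                           # clean up when the installation is done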
BeeOND (BeeGFS On-Demand)¶
Users of the cluster HoreKa can request a private BeeOND (BeeGFS) parallel filesystem for each job. The file system is created during job startup and purged after your job. This means that all data on the private BeeOND filesystem will be deleted after your job. Make sure you have copied your data back within your job to the global filesystem, e.g. $HOME, $PROJECT, any workspace or the LSDF.
BeeOND/BeeGFS can be used like any other parallel file system. Tools like cp or rsync can be used to copy data in and out.
A BeeOND file system is only created if your batch job requests this creation. For details see here.
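As a rough sketch, a batch job could request BeeOND and copy its results back to a workspace before it finishes. The Slurm constraint name and the mount point used below are assumptions for illustration only; please check the linked batch job documentation for the exact way to request and access BeeOND on this system.
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=04:00:00
#SBATCH --constraint=BEEOND                      # assumed feature name that triggers creation of the BeeOND file system

BEEOND_DIR=/mnt/odfs/$SLURM_JOB_ID               # assumed mount point of the private BeeOND file system
WS=$(ws_find myrun)                              # workspace used for permanent input and results (placeholder name)

cp -r "$WS/input" "$BEEOND_DIR/"                 # stage input into BeeOND
# ... run the parallel application on $BEEOND_DIR ...
rsync -a "$BEEOND_DIR/results/" "$WS/results/"   # copy results back, BeeOND is purged after the job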
Snapshots and backup¶
In case you inadvertently deleted some of your data, want to go back to a previous version or compare your data with a previous version, you can use so-called snapshots. Snapshots are point-in-time copies of your data. For the home file system there will be snapshots of the last 7 days, of the last 4 weeks and of the last 6 months. For the workspaces there will be snapshots of the last 7 days. For the home file system, snapshots are located below /home/<group>/.snapshots. For the work (workspace) file system, snapshots are located below /hkfs/work/.snapshots.
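For example, a file that was deleted from the workspace blah could be copied back from one of the snapshots. The snapshot name is a placeholder, and it is assumed that the directory tree below a snapshot mirrors the normal path of the workspace; list /hkfs/work/.snapshots to see which snapshots actually exist.
$ ls /hkfs/work/.snapshots                       # show the available snapshots
$ cp /hkfs/work/.snapshots/<snapshot_name>/workspace_haic/scratch/USERNAME-blah/file.txt $(ws_find blah)/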
There are also regular backups of all data in the project directories; however, ACLs and extended attributes are not saved by the backup. Please open a support ticket if you need us to restore backup data.
Quotas¶
To display your used quotas and quota limits for $HOME and for the work (workspace) file system, execute the following commands on a login node:
$ /usr/lpp/mmfs/bin/mmlsquota -u $USER --block-size G -C hkn.scc.kit.edu hkfs-home:$PROJECT_GROUP
$ /usr/lpp/mmfs/bin/mmlsquota -u $(whoami) --block-size G -C hkn.scc.kit.edu hkfs-work
File system performance tuning¶
Hints on file system performance tuning can be found here.
Migrating data¶
Users who used HAICORE before Sep 22nd, 2022, have to migrate their data themselves. This has to be done by the end of February 2023, since the old data will be deleted afterwards. Users have to do this themselves because we do not know which users will continue to use the cluster, and we do not know the new account and the new group of external users before their registration has taken place. Note that the migration is of course not necessary for new HAICORE users. Please open a support ticket if you do not feel comfortable doing this migration yourself; we will then help you.
Migrating data of your old HOME directory¶
Please follow the procedure below to migrate the data of your old HOME directory. Angle brackets (<>) and the included text have to be replaced by appropriate names or values. Note that the account name might have changed for some users and that you need to know your old account name. If you do not know the old account name and still want to migrate your data, please open a support ticket. In order to get your current account name, you can execute the following command:
echo $USER
- Find the path to your old HOME directory:
ls -d /hkfs/home/project/haicore-project-*/<your_old_account_name>
In case nothing is displayed, you probably did not use HAICORE before Sep 22nd, 2022. If you believe this is not true, there might be a permission problem and you should open a support ticket.
- (optional) Check which data will be transferred during the next step (by using the option --dry-run):
rsync -rlptDAHxv --ignore-existing --dry-run <your_old_account_name>@horeka.scc.kit.edu:<path_with_output_of_first_step>/ $HOME/
Note that the slashes (/) are important. The option --ignore-existing makes sure that you do not overwrite existing files; you can remove this option in case you want to overwrite existing files. In order to execute the command you have to accept the host key (type "yes") and log in as usual with the OTP and the password of your old account.
- Copy the data of the old HOME directory to the new HOME directory:
rsync -rlptDAHx --ignore-existing <your_old_account_name>@horeka.scc.kit.edu:<path_with_output_of_first_step>/ $HOME/
Note that the slashes (/) are important. This operation might take a long time to complete. The rsync options above do not preserve owner and group, hence the new files and directories will have a different owning group and access rights will be granted to the users of your new group.
- (optional) Remove the data in your old HOME directory:
rm -rf <path_with_output_of_first_step>
Your access rights do not allow removing the directory itself, hence a corresponding error message is normal.
Migrating data of your existing workspaces¶
Please follow the procedure below to migrate the data of your existing workspaces. Repeat this procedure for each workspace which should be migrated. Angle brackets (<>) and the included text have to be replaced by appropriate names or values. Note that the account name might have changed for some users and that you need to know your old account name. If you do not know the old account name and still want to migrate your data, please open a support ticket. In order to get your current account name, you can execute the following command:
echo $USER
- Find the paths of your old workspaces:
ls -d /hkfs/work/workspace/scratch/<your_old_account_name>*
Do the following steps for each path (workspace) which should be migrated.
- Create a new workspace:
ws_allocate <name_of_new_workspace> <lifetime_in_days>
- (optional) Check which data will be transferred during the next step (by using the option --dry-run):
rsync -rlptDAHxv --dry-run <your_old_account_name>@horeka.scc.kit.edu:<path_with_one_output_line_of_first_step>/ $(ws_find <name_of_new_workspace>)/
Note that the slashes (/) are important. In order to execute the command you have to accept the host key (type "yes") and log in as usual with the OTP and the password of your old account.
- Copy the data of the old workspace to the new workspace:
rsync -rlptDAHx <your_old_account_name>@horeka.scc.kit.edu:<path_with_one_output_line_of_first_step>/ $(ws_find <name_of_new_workspace>)/
Note that the slashes (/) are important. This operation might take a long time to complete. The rsync options above do not preserve owner and group, hence the new files and directories will have a different owning group and access rights will be granted to the users of your new group.
- (optional) Remove the data in your old workspace:
rm -rf <path_with_one_output_line_of_first_step>
Your access rights do not allow removing the directory itself, hence a corresponding error message is normal.