Datasets

Some scientific communities, especially in computer vision and machine learning, maintain large collections of reference data files that are used by many independent research projects. Well-known examples are ImageNet, CIFAR and CAM5.

These datasets can reach tens of gigabytes in size and consist of millions of individual files. Keeping a private copy for every project wastes storage space and puts unnecessary load on the file systems, so it is more efficient to provide the most important datasets from a central location. On HoreKa, the datasets provided by the HPC teams are located under the path referenced by the environment variable $DATASETS.
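
As an illustration, the following minimal Python sketch resolves the central location from $DATASETS and lists what is found there. It assumes that each dataset sits in its own subdirectory directly below that path, which this page does not state, and the function name list_datasets is purely hypothetical.

```python
import os
from pathlib import Path


def list_datasets(env_var="DATASETS"):
    """Return the sorted names of the subdirectories below $DATASETS.

    Assumption: each dataset lives in its own top-level subdirectory.
    The real layout on the cluster may differ.
    """
    root = os.environ.get(env_var)
    if not root:
        # The variable is only expected to be defined on the cluster itself.
        raise RuntimeError(f"Environment variable {env_var} is not set")

    root_path = Path(root)
    if not root_path.is_dir():
        raise RuntimeError(f"{root_path} is not an accessible directory")

    # Keep directories only; loose files in the root are ignored.
    return sorted(entry.name for entry in root_path.iterdir() if entry.is_dir())
```

Calling list_datasets() from a job script returns the directory names as a list of strings. Reading the path from the environment, rather than hard-coding it, keeps the script working if the central location is ever moved.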