Accessing Singularity on HPC
Singularity is installed on the operating system of every HPC compute node, so it can be used from an interactive session or a batch script without loading any software modules.
Building a Container
Building a container locally requires root privileges, which users do not have on HPC. To build locally, you must use a Mac or Linux workstation where you have sudo privileges and Singularity installed. The Sylabs website has instructions to help users get started building their own containers. Additionally, NVIDIA provides the HPC Container Maker, which lets you build a recipe without knowing all the syntax: you include the blocks you need (e.g., CUDA or InfiniBand) and it generates a recipe that you can build on your local workstation.
To bypass the issue of needing root privileges to build your container, Singularity Hub lets you build and keep containers in the cloud as well as share them with other users. You maintain your recipes there and each time you need to pull one, it gets built remotely and is retrieved to your workstation. This conveniently allows you to build containers directly from HPC.
As an example, if you want to build a container in your account, first go to https://cloud.sylabs.io, generate an access token (API key), and save it to your clipboard. Next, log in to an interactive terminal session and find your recipe file. In this example, we'll use the recipe:
Then, assuming the recipe is stored in our home directory, we can build it remotely using:
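The recipe from the original example is not reproduced here, so the sketch below writes a small stand-in definition file and then submits it to the remote builder. It assumes Singularity 3.x, where `singularity remote login` prompts for the access token and `singularity build --remote` sends the build to the Sylabs cloud. The file names `example.def` and `example.sif` and the recipe contents are hypothetical placeholders.

```shell
# Hypothetical stand-in recipe and output names; replace with your own.
recipe="$HOME/example.def"
image="$HOME/example.sif"

# Write a minimal definition file based on a Docker Hub image.
cat > "$recipe" <<'EOF'
Bootstrap: docker
From: ubuntu:20.04

%runscript
    echo "Hello from inside the container"
EOF

# Only attempt the build where Singularity is actually available.
if command -v singularity >/dev/null 2>&1; then
    # Paste the access token from https://cloud.sylabs.io when prompted.
    singularity remote login
    # Build remotely; the finished image is downloaded to $image.
    singularity build --remote "$image" "$recipe"
else
    echo "singularity not found; start an interactive session on a compute node"
fi
```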
This will produce a .sif file in your home directory that is ready for use.
Singularity, NVIDIA, and GPUs
One of the most significant use cases for Singularity is to support machine learning workflows. For information on using GPUs on HPC, see our GPU documentation.
Pulling Nvidia Images
The NVIDIA GPU Cloud (NGC) provides GPU-accelerated HPC and deep learning containers for scientific computing. NVIDIA tests HPC container compatibility with the Singularity runtime through a rigorous QA process. Application-specific information may vary, so follow the container-specific documentation before running with Singularity. If the container documentation does not include Singularity information, then the container has not yet been tested under Singularity.
- The containers from nvidia that are in /contrib have been modified to include path bindings to /xdisk and /groups. They also include the path to the Nvidia commands like
- Because login nodes are small and do not provide software, Singularity images should be pulled and executed on a compute node.
The general form to pull and convert an NGC image to a local Singularity image file is:
This Singularity build command will download the app:tag NGC Docker image, convert it to Singularity format, and save it to the local filename local_image. For example, to pull the namd NGC container tagged with version 2.12-171025 to a local file named namd.simg saved to your home directory:
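As a sketch of the general form and the NAMD example described above: the registry host `nvcr.io` and the `hpc` repository name are assumptions about where the image lives on NGC, and the credential variables in the comments are only needed if NGC asks for authentication.

```shell
# General form (placeholders in angle brackets):
#   singularity build <local_image> docker://nvcr.io/<repository>/<app:tag>

# Values for the NAMD example; the "hpc" repository name is an assumption.
repository="hpc"
app_tag="namd:2.12-171025"
local_image="$HOME/namd.simg"
source_uri="docker://nvcr.io/${repository}/${app_tag}"

# If NGC requires credentials, Singularity reads them from these variables:
#   export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
#   export SINGULARITY_DOCKER_PASSWORD=<your NGC API key>

# Show the command that will be run.
echo "singularity build ${local_image} ${source_uri}"

# Download the Docker image and convert it to a Singularity image file.
if command -v singularity >/dev/null 2>&1; then
    singularity build "$local_image" "$source_uri"
fi
```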
Singularity containers are themselves ostensibly read-only. To provide application input and output, host directories are generally bound into the container using the Singularity -B flag. The format of this flag is -B <host_src_dir>:<container_dst_dir>. Once a host directory, host_src_dir, is bound into the container, you may interact with it from within the container at container_dst_dir, the same as you would outside the container.
You may also use the --pwd <container_dir> flag, which sets the present working directory of the command run within the container.
Ocelote does not support filesystem overlay, so container_dst_dir must already exist within the image for a bind to succeed. To work around the inability to bind arbitrary directories, $HOME and /tmp are mounted automatically and may be used for application I/O.
All NGC containers are optimized for NVIDIA GPU acceleration, so you will always want to add the --nv flag to enable NVIDIA GPU support within the container.
Standard run command:
The Singularity command below represents the canonical form that will be used on the Ocelote cluster.
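The original command is not reproduced here. As a minimal sketch that combines the flags described above, the example below assumes an image at `$HOME/namd.simg` and a hypothetical run directory `$HOME/namd_run`, and uses `nvidia-smi` as a stand-in for the real application command. Because `$HOME` is mounted automatically, no -B flag is needed for paths under it.

```shell
# Generic form (placeholders in angle brackets):
#   singularity exec --nv -B <host_src_dir>:<container_dst_dir> \
#       --pwd <container_dir> <image> <command>

image="$HOME/namd.simg"     # assumed image location
work_dir="$HOME/namd_run"   # hypothetical run directory under $HOME
app_cmd="nvidia-smi"        # stand-in for the real application command

# Compose and show the command for a run directory under $HOME.
run_cmd="singularity exec --nv --pwd $work_dir $image $app_cmd"
echo "$run_cmd"

# Run it only where Singularity and the image are present.
if command -v singularity >/dev/null 2>&1 && [ -f "$image" ]; then
    mkdir -p "$work_dir"
    singularity exec --nv --pwd "$work_dir" "$image" "$app_cmd"
fi
```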
Containers Available on HPC
We support the use of HPC and ML/DL containers available on NVIDIA GPU Cloud (NGC). Many of the popular HPC applications including NAMD, LAMMPS and GROMACS containers are optimized for performance and available to run in Singularity on Ocelote or Puma. The containers and respective README files can be found in /contrib/singularity/nvidia.
- The NVIDIA images have been modified to include bindings for your /xdisk and /groups directories, in case you want to run your jobs there.
- The filename has a tag at the end that represents when it was made. For example, 18.01 is January 2018.
|nvidia-caffe.18.09-py2.simg||Caffe is a deep learning framework made with expression, speed, and modularity in mind. It was originally developed by the Berkeley Vision and Learning Center (BVLC)|
PyTorch is a Python package that provides two high-level features:
|MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavors of symbolic programming and imperative programming to maximize efficiency and productivity.|
|nvidia-tensorflow.18.09-py3.simg||TensorFlow is an open source software library for numerical computation using data flow graphs. TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research.|
|nvidia-theano.18.08.simg||Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.|
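As a sketch of using one of these images directly, the example below lists the shared directory and checks the TensorFlow version inside the TensorFlow image. It assumes the image provides a `python` executable; consult the README in the same directory for the supported invocation.

```shell
image_dir="/contrib/singularity/nvidia"
image="$image_dir/nvidia-tensorflow.18.09-py3.simg"

# List the available images and README files, if the directory is visible.
if [ -d "$image_dir" ]; then
    ls "$image_dir"
fi

# Run a one-line check inside the container with GPU support enabled.
if command -v singularity >/dev/null 2>&1 && [ -f "$image" ]; then
    singularity exec --nv "$image" python -c 'import tensorflow as tf; print(tf.__version__)'
else
    echo "singularity or $image not available on this machine"
fi
```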
- The Sylabs GitHub site has files and instructions for creating sample containers.
- Our GitHub repository has Singularity examples that can be run on HPC.
The lolcow image is often used as the standard "hello world!" introduction to containers and is described in Singularity's documentation. To follow their example, start by logging in to an interactive terminal session and pulling the image:
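A minimal sketch of the pull step, assuming the `godlovedc/lolcow` image on Docker Hub that the Singularity documentation uses for this example:

```shell
# Docker Hub image used in the Singularity documentation's lolcow example.
image_uri="docker://godlovedc/lolcow"

if command -v singularity >/dev/null 2>&1; then
    singularity pull "$image_uri"
else
    echo "singularity not found; run this from an interactive session on a compute node"
fi
```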
This will pull the image from Docker Hub and save it in a hidden directory, .singularity, in your home directory. Next, run the image using:
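The exact command is not reproduced here. One way to run the image, sketched under the assumption that it is referenced by its Docker Hub URI so that Singularity can reuse its cached copy:

```shell
# Same Docker Hub image as in the pull step.
image_uri="docker://godlovedc/lolcow"

if command -v singularity >/dev/null 2>&1; then
    # Executes the container's default runscript.
    singularity run "$image_uri"
else
    echo "singularity not found; run this from an interactive session on a compute node"
fi
```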
Running Singularity in a Batch Job
Running a job with Singularity is as easy as running any other job: include your resource requests and any commands needed to execute your workflow. For more detailed information on creating and running jobs, see our SLURM documentation or Puma Quick Start. An example script might look like:
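The original script is not reproduced here. The sketch below writes a small SLURM script to a file named `lolcow.slurm` (a hypothetical name) that runs the lolcow image. The partition and account values are placeholders to be replaced with your own, and the resource requests are illustrative only.

```shell
# Write a minimal batch script; 'EOF' is quoted so nothing is expanded here.
cat > lolcow.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=singularity-lolcow
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --partition=standard
#SBATCH --account=YOUR_GROUP

# Run the container's default runscript on the compute node.
singularity run docker://godlovedc/lolcow
EOF

echo "Submit with: sbatch lolcow.slurm"
```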
Example Recipe Files
CentOS with TensorFlow
To build and test a container from the recipe in an interactive session on a GPU node:
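The original commands are not reproduced here. As a sketch, assuming the recipe is saved under the hypothetical name `tensorflow.def`, that it installs TensorFlow for `python3`, and that the build goes through the remote builder because root is unavailable on HPC:

```shell
recipe="$HOME/tensorflow.def"   # hypothetical file name for this section's recipe
image="$HOME/tensorflow.sif"    # hypothetical output image

if command -v singularity >/dev/null 2>&1 && [ -f "$recipe" ]; then
    # Build through the remote builder and download the result to $image.
    singularity build --remote "$image" "$recipe"
    # --nv exposes the host GPU driver; nvidia-smi should list the node's GPU.
    singularity exec --nv "$image" nvidia-smi
    # Confirm that TensorFlow imports inside the container.
    singularity exec --nv "$image" python3 -c 'import tensorflow as tf; print(tf.__version__)'
else
    echo "singularity or $recipe not found; nothing was built"
fi
```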
As a TensorFlow example, you could use the following script:
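The original script is not reproduced here. The sketch below writes a short TensorFlow check and a GPU batch script that runs it inside the container. The file names `tf_check.py` and `tf_check.slurm`, the image path `$HOME/tensorflow.sif`, and the partition and account values are all hypothetical placeholders.

```shell
# A small TensorFlow check to run inside the container.
cat > tf_check.py <<'EOF'
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
EOF

# A batch script that requests one GPU and runs the check with --nv.
cat > tf_check.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=tf-check
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#SBATCH --partition=standard
#SBATCH --account=YOUR_GROUP

singularity exec --nv "$HOME/tensorflow.sif" python3 tf_check.py
EOF

echo "Submit with: sbatch tf_check.slurm"
```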
Singularity supports MPI pretty well since, by default, the network is the same inside and outside the container. The more complicated bit is making sure that the container has the right set of MPI libraries. MPI is an open specification, but there are several different implementations (OpenMPI, MVAPICH2, and Intel MPI to name three) with some non-overlapping feature sets. If the host and container are running different MPI implementations, or even different versions of the same implementation, hilarity may ensue.
The general rule is that you want the version of MPI inside the container to be the same as, or newer than, the version on the host. You may be thinking that this is not good for the portability of your container, and you are right. Containerizing MPI applications is not terribly difficult with Singularity, but it comes at the cost of additional requirements for the host system.
In this example, the InfiniBand components are installed first, followed by the MVAPICH implementation of MPI. When the job is run, the script will need to load the module with the matching version of MVAPICH.
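As a sketch of the job side of this setup, the batch script written below loads a host MPI module and lets the host launcher start one container process per rank. The module name `mvapich2`, the image `$HOME/mpi_app.sif`, the application path `/opt/mpi_app`, and the partition and account values are hypothetical; check `module avail` for the module that matches the MVAPICH version in your container.

```shell
# Write a batch script for an MPI job; 'EOF' is quoted so nothing is expanded here.
cat > mpi_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=singularity-mpi
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --time=00:10:00
#SBATCH --partition=standard
#SBATCH --account=YOUR_GROUP

# Load the host MVAPICH build that matches the version inside the container.
module load mvapich2

# The host launcher starts one container process per MPI rank.
mpirun -n "$SLURM_NTASKS" singularity exec "$HOME/mpi_app.sif" /opt/mpi_app
EOF

echo "Submit with: sbatch mpi_job.slurm"
```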