Ocelote has 46 new compute nodes with Nvidia P100 GPUs. These are available to researchers on campus. Fairshare limitations will apply, but the intention is for the nodes to be as widely available as possible. El Gato also still has 70 compute nodes provisioned with Nvidia Tesla K20s.
Currently the following CUDA modules are available on Ocelote:
The OpenACC API is a collection of compiler directives and runtime routines that allow you to specify loops and regions of code in standard C, C++, and Fortran that you can offload from a host CPU to the GPU.
We provide two methods of support for OpenACC:
- We support OpenACC in the PGI Compiler. The PGI implementation of OpenACC is generally considered the best one available.
Load it with "module load pgi". If you are in an interactive session on a GPU node, you can run "pgaccelinfo" to test functionality. Remember that the login nodes do not have GPUs installed.
A useful getting started guide written by Nvidia is at:
- We support OpenACC in GCC 6.1, which is loaded automatically as a module when you log in. Verify with "module list".
The GCC 6 release includes a much improved implementation of the OpenACC 2.0a specification.
A useful quick reference guide can be found at:
About twice a year we host the XSEDE Workshop on Programming GPUs with OpenACC. These courses give an overview of how to accelerate your code without needing a lot of programming knowledge. Watch for announcements to the HPC-Info list.
Many applications have been optimized to run faster on GPUs. These include:
- NAMD - Installed as a module; module load namd_cuda
- VASP - A restricted license version is installed on Ocelote; only available to the licensed users
- GROMACS - Installed as a module on Ocelote; module load gromacs
- LAMMPS - Installed as a module on Ocelote; module load lammps/gcc/16Mar18
- ABAQUS - Installed as a module on Ocelote; module load abaqus
- GAUSSIAN - We currently do not have the GPU version
- MATLAB - Review the GPU Coder at their web site
- ANSYS Fluent
- ML and DL frameworks - See the next section below
NVIDIA GPU Cloud Container Registry
We support the use of HPC and ML/DL containers available on NVIDIA GPU Cloud (NGC). Containers for many popular HPC applications, including NAMD, LAMMPS and GROMACS, are optimized for performance and available to run in Singularity on Ocelote.
The containers and respective README files can be found at /unsupported/singularity/nvidia
NGC also provides containers for a set of popular ML and DL frameworks, which are not trivial to build. Nvidia has made them available to us and they will be updated regularly. They are currently located at: /unsupported/singularity/nvidia
The Nvidia images at /unsupported/singularity/nvidia have been modified to include bindings for your /extra and /rsgrps directories if you want to run your jobs from those directories.
|nvidia-caffe.18.09-py2.simg||Caffe is a deep learning framework made with expression, speed, and modularity in mind. It was originally developed by the Berkeley Vision and Learning Center (BVLC)|
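|PyTorch is a Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system.|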
|MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavors of symbolic programming and imperative programming to maximize efficiency and productivity.|
|nvidia-tensorflow.18.09-py3.simg||TensorFlow is an open source software library for numerical computation using data flow graphs. TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research.|
|nvidia-theano.18.08.simg||Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.|
Each is provided in a Singularity container.
Each file name ends with a tag in YY.MM form that shows when the image was made, so 18.01 is January 2018.
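As a small illustration of this naming convention, the following shell sketch extracts the tag from an image file name and prints it as a year and month. The function name `image_tag_to_date` is hypothetical. The sketch assumes that every file name follows the `name.YY.MM[-pyN].simg` pattern seen in the table above, and that every year is 20YY.

```shell
# Hypothetical helper: convert the YY.MM tag in an image file name to YYYY-MM.
# Assumes names look like nvidia-<framework>.YY.MM[-pyN].simg.
# Written in POSIX sh, so the working variables are not function-local.
image_tag_to_date() {
    name=$(basename "$1" .simg)   # drop any directory and the .simg extension
    name=${name%-py[0-9]}         # drop an optional -py2 / -py3 suffix
    mm=${name##*.}                # text after the last dot: MM
    rest=${name%.*}               # everything before the last dot
    yy=${rest##*.}                # text after the previous dot: YY
    case "$yy$mm" in
        [0-9][0-9][0-9][0-9]) echo "20$yy-$mm" ;;
        *) echo "no YY.MM tag found in: $1" >&2; return 1 ;;
    esac
}

image_tag_to_date nvidia-tensorflow.18.01-py3.simg   # prints 2018-01
image_tag_to_date nvidia-theano.18.08.simg           # prints 2018-08
```

This can be handy for checking how old an image is before deciding which one to copy or run.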
Pulling Nvidia ML / DL Images on Ocelote
You can create your own Singularity containers on Ocelote by pulling down the images created by Nvidia. The general rule still applies: you cannot create your own containers when doing so would require root authority. Root authority is not required if you follow this procedure.
The GPU nodes have more memory than the other Ocelote nodes, so the select statements request 8 GB per core across 28 cores, which is 224 GB in total.
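The arithmetic behind that memory request can be sketched as follows. The function name `gpu_select` and its arguments are hypothetical; the 8 GB per core and 28 cores come from the text above, and the output matches the select statement used in the commands in this section.

```shell
# Hypothetical helper: build a PBS select statement for one GPU node.
# Usage: gpu_select CORES GB_PER_CORE
gpu_select() {
    cores=$1
    gb_per_core=$2
    mem=$((cores * gb_per_core))    # total memory request in GB
    echo "select=1:ncpus=${cores}:mem=${mem}gb:ngpus=1"
}

gpu_select 28 8    # prints select=1:ncpus=28:mem=224gb:ngpus=1
```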
Either: copy the file you wish to use to your own directory. Your home path as well as /extra and /xdisk have been bound to the image, so those are your choices.
Or: run the Singularity file from where it is. Since you cannot modify it, you will not interfere with anyone else.
For interactive use, start an interactive job on a GPU node by modifying this command:
$ qsub -I -N jobname -m bea -W group_list=GROUP-NAME -q windfall -l select=1:ncpus=28:mem=224gb:ngpus=1 -l cput=1:0:0 -l walltime=1:0:0
You must change the group_list and you should change the other attributes as desired.
On the compute node assigned to you, as an example you can run:
$ module load singularity
$ singularity exec --nv nvidia-tensorflow.18.01-py3.simg python tensorflow_example.py
You must include the --nv option (note that it has two dashes). It binds the CUDA libraries into the container.
The example file, "tensorflow_example.py", is included in this directory.
For batch use, include these three lines in your submission script:
#PBS -l select=1:ncpus=28:mem=224gb:ngpus=1
module load singularity
singularity exec --nv nvidia-tensorflow.18.01-py3.simg python tensorflow_example.py
You will want exclusive access to the node so that there is no contention for the GPU. You get it by requesting all 28 cores, as shown above.
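Putting the pieces together, a complete submission script might look like the sketch below. It is wrapped in a hypothetical helper, `write_gpu_job`, that writes the script to a file so you can review it before submitting. The job name, GROUP-NAME, queue, and time limits are placeholders copied from the interactive example in this section. The `cd "$PBS_O_WORKDIR"` line assumes that the image and the example file sit in the directory you submit from.

```shell
# Hypothetical helper: write a GPU batch script to the file named in $1.
# jobname, GROUP-NAME, the queue, and the time limits are placeholders taken
# from the interactive example; change them for your own job.
write_gpu_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#PBS -N jobname
#PBS -W group_list=GROUP-NAME
#PBS -q windfall
#PBS -l select=1:ncpus=28:mem=224gb:ngpus=1
#PBS -l cput=1:0:0
#PBS -l walltime=1:0:0

# Assumption: the image and example file are in the submission directory
cd "$PBS_O_WORKDIR"

module load singularity
# --nv binds the CUDA libraries into the container
singularity exec --nv nvidia-tensorflow.18.01-py3.simg python tensorflow_example.py
EOF
}

# Example: write_gpu_job gpu_job.pbs && qsub gpu_job.pbs
```

The heredoc delimiter is quoted so that `$PBS_O_WORKDIR` is written literally and only expanded when the job runs.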
There are more detailed examples here
For more information on Singularity, see their web site at:
There are tutorials for Singularity on HPC here
We host workshops from the Pittsburgh Supercomputing Center, which is an NSF-funded site. We are working with Nvidia to offer a workshop in the April 2018 timeframe.
Watch for announcements from the hpc-info list.
Nvidia periodically runs training sessions like these:
Accelerate Your Code with OpenACC