Ocelote has 46 new compute nodes with Nvidia P100 GPUs, which are available to researchers on campus. Fairshare limitations apply, but the intention is for the nodes to be as widely available as possible. El Gato still has 70 compute nodes provisioned with Nvidia Tesla K20s.
Specifications
Cuda Modules
Currently the following Cuda modules are available on Ocelote:
/cm/shared/modulefiles:
cuda75/blas/7.5.18 | cuda75/nsight/7.5.18 | cuda80/blas/8.0.61 | cuda80/nsight/8.0.61 |
cuda75/fft/7.5.18 | cuda75/profiler/7.5.18 | cuda80/fft/8.0.61 | cuda80/profiler/8.0.61 |
cuda75/gdk/352.79 | cuda75/toolkit/7.5.18 | cuda80/gdk/352.79 | cuda80/toolkit/8.0.61 |
/cm/shared/uamodulefiles:
cuda75/neuralnet/5/5.1 | cuda75/neuralnet/6/6.0 | cuda80/neuralnet/5/5.1 | cuda80/neuralnet/6/6.0 |
GPU Nodes
Puma
Puma has a different arrangement for GPU nodes than Ocelote and ElGato. Whereas the older clusters have one GPU per node, Puma has four. This has a financial advantage, providing GPUs at a lower overall cost, and a technical advantage, allowing jobs that can use multiple GPUs to run faster than they would when spanning multiple nodes. This capability comes from using a newer operating system. Each node has four Nvidia V100S GPUs. They are provisioned with 32GB of memory compared to 16GB on the P100s.
Ocelote
Ocelote has 45 compute nodes with Nvidia P100 GPUs that are available to researchers on campus. Usage is limited to a maximum of 10 concurrent jobs. One node with a V100 is also available. Since there is only one, feel free to use it for testing and for comparisons with the P100, but run production work on the P100s. There is also one node with two P100s for testing jobs that use two GPUs. Use it to compare against running a job on two nodes.
Cuda Modules
Warning: Nvidia Nsight Compute (the interactive kernel profiler) is not available. In response to a security alert (CVE-2018-6260), this capability is only available with root authority, which users do not have.
The latest Cuda module available on the system is 11.0; it is the only version until newer ones are installed. The Cuda driver version can be queried with the nvidia-smi command. To see the available modules, run the following in an interactive session:
Code Block |
---|
$ module avail cuda
-------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 --------------------
cp2k-cuda/7.1.0
-------------------------- /opt/ohpc/pub/modulefiles ---------------------------
cuda11-dnn/8.0.2 cuda11-sdk/20.7 cuda11/11.0 |
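The two steps described above (loading the Cuda module and querying the driver) can be sketched as follows. The module name cuda11/11.0 comes from the listing; the helper name show_gpu_driver and the checks for missing commands are illustrative assumptions, added so the snippet also finishes cleanly on a node without GPUs.

```shell
#!/bin/bash
# Sketch only: load the Cuda 11.0 module and report GPU and driver details.
# show_gpu_driver is a hypothetical helper name, not a site-provided command.
show_gpu_driver() {
    # 'module' exists only where the module system has been initialised.
    if command -v module >/dev/null 2>&1; then
        module load cuda11/11.0
    else
        echo "module command not found; skipping 'module load cuda11/11.0'"
    fi

    # nvidia-smi is only present on nodes with the Nvidia driver installed.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
    else
        echo "nvidia-smi not found; run this on a GPU node"
    fi
}

show_gpu_driver
```

On a GPU node this should list each GPU with its driver version and total memory; elsewhere it only prints the two notices.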
OpenACC
The OpenACC API is a collection of compiler directives and runtime routines that allow you to specify loops and regions of code in standard C, C++, and Fortran that you can offload from a host CPU to the GPU. We provide two methods of support for OpenACC:
- We support OpenACC in the PGI Compiler. The PGI implementation of OpenACC is considered the best implementation. Run "module load pgi" on Ocelote. If you are on a GPU node in an interactive session, you can run "pgaccelinfo" to test functionality. Remember that the login nodes do not have GPUs or the software installed. A useful getting-started guide written by Nvidia is available here: https://www.pgroup.com/doc/openacc17_gs.pdf
- We support OpenACC in the GCC Compiler 6.1, which is automatically loaded as a module when you log into Ocelote. Verify with "module list". The GCC 6 release includes a much improved implementation of the OpenACC 2.0a specification.
A useful quick reference guide is also available.
About two times a year we host the Xsede Workshop on Programming GPUs with OpenACC. Watch for announcements to the HPC-Info list.
Nvidia offers free online OpenACC courses:
https://developer.nvidia.com/openacc/overview
https://developer.nvidia.com/openacc-courses
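To make the two compiler options above concrete, here is a minimal sketch that writes a small OpenACC test program and builds it with whichever compiler is on the PATH. The file name saxpy_acc.c and the program itself are illustrative, not site-provided; the flags used are -acc and -Minfo=accel for PGI and -fopenacc for GCC. It assumes the relevant compiler module is already loaded.

```shell
#!/bin/bash
# Sketch only: write a small OpenACC test program and build it.
# File and program names are illustrative assumptions.
cat > saxpy_acc.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i;
    const int n = 1 << 20;
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));

    if (x == NULL || y == NULL)
        return 1;
    for (i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    /* Offload the loop; a compiler without OpenACC enabled ignores the pragma. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (i = 0; i < n; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %.1f\n", y[0]);
    free(x);
    free(y);
    return 0;
}
EOF

if command -v pgcc >/dev/null 2>&1; then
    # PGI: -acc enables OpenACC, -Minfo=accel reports what was offloaded.
    pgcc -acc -Minfo=accel -o saxpy_acc saxpy_acc.c
elif command -v gcc >/dev/null 2>&1; then
    # GCC: -fopenacc enables the OpenACC directives.
    gcc -fopenacc -o saxpy_acc saxpy_acc.c
else
    echo "no pgcc or gcc found; load a compiler module first"
fi

# Run the program only if the build produced an executable.
if [ -x ./saxpy_acc ]; then
    ./saxpy_acc
fi
```

A successful run prints y[0] = 4.0, since each element is computed as 2 * 1 + 2. Whether the loop actually runs on the GPU depends on how the compiler was built and on being on a GPU node; the PGI -Minfo=accel output is the easiest way to check.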
Applications
Many applications have been optimized to run faster on GPUs. These include:
Application | Information | Access |
---|
NAMD | Installed | |
VASP | A restricted license version is installed on Ocelote for the licensed users | $ module load vasp |
GROMACS | Installed on Ocelote | $ module load gromacs |
LAMMPS | Installed on Ocelote (gcc/16Mar18) | |
ABAQUS | Installed on Ocelote and available as an application through Open OnDemand | $ module load abaqus |
GAUSSIAN | Installed as a module. See these notes. | $ module load gaussian/g16 |
MATLAB | Installed as a module and available as an application through Open OnDemand. Review the GPU Coder on their website | $ module load matlab |
AMBER | | |
ANSYS Fluent | Installed as a module and available as an application through Open OnDemand | $ module load ansys |
RELION | Available as a Singularity container or as a module. | $ module load relion |
ML and DL frameworks | See the next section | |
Machine Learning
Python ML/DL including Nvidia RAPIDS
The minimum version of Python that is supported is 3.6:
Framework | Details |
---|
numba | RAPIDS: numba is for Cuda programming |
cuml | RAPIDS: Cuda Machine Learning has many ML algorithms like K-means, PCA and SVM |
cudf | RAPIDS: Cuda Dataframes supports loading and manipulating datasets |
tensorflow | TensorFlow is an open source software library for numerical computation using data flow graphs. |
torch | PyTorch supports tensor computation and deep neural networks |
*** Nvidia Provided GPU Codes ***
Nvidia builds the popular set of ML and DL frameworks, which is not a trivial task. They have made these builds available to us, and they will be updated regularly. They are currently located at:
/unsupported/singularity/nvidia
Current list:
nvidia-caffe.18.06-py2.simg | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It was originally developed by the Berkeley Vision and Learning Center (BVLC). |
nvidia-pytorch.18.06-py3.simg | PyTorch is a Python package that provides two high-level features: tensor computation (like numpy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system. |
nvidia-mxnet.18.06.simg | MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavors of symbolic programming and imperative programming to maximize efficiency and productivity. |
nvidia-tensorflow.18.06-py3.simg | TensorFlow is an open source software library for numerical computation using data flow graphs. It was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research. |
nvidia-theano.18.06.simg | Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. |
Each is provided in a Singularity container.
The file name includes a tag that represents when the image was made, so 18.01 is January 2018.
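As a small illustration of the naming convention, the sketch below turns the tag in an image file name into a year and month. The helper name image_build_date is hypothetical; it assumes the tag is the first "two digits, dot, two digits" group in the name and that the year is 20YY.

```shell
#!/bin/bash
# Sketch only: convert the YY.MM tag in an image file name into YYYY-MM.
# image_build_date is a hypothetical helper, not a site-provided command.
# Assumes the year is 20YY and the tag is the first NN.NN group in the name.
image_build_date() {
    basename "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]\)\.\([0-9][0-9]\).*$/20\1-\2/p'
}

image_build_date nvidia-tensorflow.18.01-py3.simg
image_build_date /unsupported/singularity/nvidia/nvidia-mxnet.18.06.simg
```

The two calls print 2018-01 and 2018-06. A name without a recognisable tag prints nothing.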
USAGE
Copy the file you wish to use to your directory. Your home path as well as /extra and /xdisk have been bound to the image, so those are your choices.
For interactive use, start an interactive job on a GPU node by modifying this command:
Code Block |
---|
$ qsub -I -N jobname -m bea -W group_list=GROUP-NAME -q windfall -l select=1:ncpus=28:mem=168gb:ngpus=1 -l cput=1:0:0 -l walltime=1:0:0 |
You must change the group_list and you should change the other attributes as desired.
On the compute node assigned to you, as an example you can run:
Code Block |
---|
$ module load singularity
$ singularity exec --nv nvidia-tensorflow.18.01-py3.simg python tensorflow_example.py |
You need to include the --nv flag (note that it has two dashes). It binds the Cuda libraries into the container.
The example file, tensorflow_example.py, is included in this directory.
For batch use, include these two lines in your submission script:
Code Block |
---|
module load singularity
singularity exec --nv nvidia-tensorflow.18.01-py3.simg python tensorflow_example.py |
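For context, a complete submission script built around those two lines might look like the sketch below. It reuses the resource request from the interactive qsub example; the file name gpu_tensorflow.pbs is an assumption, GROUP-NAME and jobname are placeholders to change, and the cd to PBS_O_WORKDIR assumes the image and example script were copied to the directory you submit from.

```shell
#!/bin/bash
# Sketch only: write a PBS submission script that mirrors the interactive
# request shown earlier. GROUP-NAME, jobname, and the file name
# gpu_tensorflow.pbs are placeholders, not site-provided values.
cat > gpu_tensorflow.pbs <<'EOF'
#!/bin/bash
#PBS -N jobname
#PBS -m bea
#PBS -W group_list=GROUP-NAME
#PBS -q windfall
#PBS -l select=1:ncpus=28:mem=168gb:ngpus=1
#PBS -l cput=1:0:0
#PBS -l walltime=1:0:0

# Run from the directory the job was submitted from, where the image
# and the example script are assumed to have been copied.
cd "$PBS_O_WORKDIR"

module load singularity
singularity exec --nv nvidia-tensorflow.18.01-py3.simg python tensorflow_example.py
EOF

echo "wrote gpu_tensorflow.pbs; submit it with: qsub gpu_tensorflow.pbs"
```

As with the interactive command, the group_list must be changed and the other attributes should be adjusted as desired.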
There are more detailed examples here
Singularity
For more information on Singularity, see their web site at:
http://singularity.lbl.gov/user-guide
There are tutorials for Singularity on HPC here
Training
We host workshops from the Pittsburgh Supercomputing Center, which is an NSF-funded site. We are working with Nvidia to offer a workshop in the April 2018 timeframe.
Watch for announcements from the hpc-info list.
Additional Python ML/DL frameworks:
Framework | Details |
---|
caffe2 | A deep learning framework |
tensorrt | Inference server for deep learning |
tensorboard | Visualization tool for machine learning |