The University of Arizona
GPU Nodes




Overview

Compute Resources

More detailed information on system resources can be found on our Compute Resources page.

Containers with GPU Support

Singularity containers are available as modules on HPC for GPU-supported workflows. For more information, see our documentation on Containers.

Accessing GPUs

Information on how to request GPUs using SLURM can be found in our SLURM Documentation.

Training

For a list of training resources related to GPU workflows, see our Training documentation.








Cluster Information



Puma

Puma arranges its GPU nodes differently from Ocelote and ElGato. Whereas the older clusters have one GPU per node, Puma has four. This lowers the overall cost of providing GPUs, and it lets jobs that can use multiple GPUs run faster on a single node than they would when spanning multiple nodes. This capability comes from using a newer operating system.
Each node has four Nvidia V100S GPUs, each provisioned with 32GB of memory compared to 16GB on the P100s.
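
As a rough illustration of the single-node, multi-GPU arrangement described above, the following batch script sketch requests all four GPUs on one Puma node. The account and partition values are placeholders rather than real settings, and the helper name count_visible_gpus is hypothetical. The sketch assumes Slurm exports CUDA_VISIBLE_DEVICES for GPU jobs, which is the usual behavior but depends on site configuration; the SLURM Documentation remains the authoritative source for the options to use.

```shell
#!/bin/bash
#SBATCH --job-name=multi-gpu-test
#SBATCH --nodes=1                 # keep every GPU on a single node
#SBATCH --ntasks=1
#SBATCH --gres=gpu:4              # ask for all four GPUs on a Puma GPU node
#SBATCH --time=00:10:00
#SBATCH --account=YOUR_GROUP      # placeholder: replace with your own group
#SBATCH --partition=standard      # assumption: use the partition your site documents

# Count the GPUs exposed to this job. Slurm normally sets
# CUDA_VISIBLE_DEVICES to a comma-separated list such as "0,1,2,3".
count_visible_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
    fi
}

echo "GPUs visible to this job: $(count_visible_gpus)"
```

Saved under a name of your choosing (for example multi_gpu.slurm, a hypothetical filename), the script would be submitted with sbatch. Outside of a Slurm job the variable is normally unset, so the script reports zero GPUs.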


Ocelote

Ocelote has 46 compute nodes with Nvidia P100 GPUs that are available to researchers on campus. The limitation is a maximum of 10 concurrent jobs.

Previously, one node with a V100 was available, but it has since been replaced with a P100. Tasks that require multiple GPUs must either request multiple nodes on Ocelote or use Puma's GPU nodes.







CUDA Modules

Warning

Nvidia Nsight Compute (the interactive kernel profiler) is not available. In response to a security alert (CVE-2018-6260), this capability requires root authority, which users do not have.

CUDA 11.0 is the latest CUDA module on the system and, until newer ones are installed, the only version available. The CUDA driver version can be queried with the nvidia-smi command. To see the available modules, run the following in an interactive session:

$ module avail cuda

-------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 --------------------
   cp2k-cuda/7.1.0

-------------------------- /opt/ohpc/pub/modulefiles ---------------------------
   cuda11-dnn/8.0.2    cuda11-sdk/20.7    cuda11/11.0
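
To make the nvidia-smi query mentioned above concrete, here is a minimal sketch that prints the GPU model, driver version, and total memory for each GPU on the current node. The function name gpu_summary is hypothetical. The sketch assumes it is run on a GPU node; on a node without the Nvidia tools it prints a short notice instead.

```shell
#!/bin/bash
# Print one line per GPU: model name, driver version, total memory.
gpu_summary() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version,memory.total \
                   --format=csv,noheader
    else
        # Login nodes have no GPUs, so nvidia-smi is not expected there.
        echo "nvidia-smi not found: run this on a GPU node"
    fi
}

gpu_summary
```

After loading the toolkit with module load cuda11/11.0, nvcc --version reports the toolkit version, which is separate from the driver version shown here.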






OpenACC

The OpenACC API is a collection of compiler directives and runtime routines that allow you to specify loops and regions of code in standard C, C++, and Fortran that you can offload from a host CPU to the GPU.

We provide two methods of support for OpenACC:

  1. The PGI compiler. The PGI implementation of OpenACC is generally regarded as the most mature one.
    Run "module load pgi" on Ocelote. From an interactive session on a GPU node, you can run "pgaccelinfo" to test functionality. Remember that the login nodes have neither GPUs nor the associated software installed.
    A useful getting-started guide written by Nvidia is available here: https://www.pgroup.com/doc/openacc17_gs.pdf

  2. The GCC compiler, version 6.1, which is loaded automatically as a module when you log into Ocelote. Verify with "module list".
    The GCC 6 release includes a much improved implementation of the OpenACC 2.0a specification.
    A useful quick reference guide is available from: https://gcc.gnu.org/wiki/OpenACC#Quick_Reference_Guide
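
The two compiler routes above can be sketched as follows. The script writes a small C program containing a single OpenACC directive, then builds it with pgcc if that is on the path and with gcc otherwise. The file name saxpy.c and the function name build_saxpy are hypothetical, and whether the loop actually runs on a GPU depends on how the compiler was built and on the node you are on; when GCC has no offload support, the loop typically runs on the host CPU instead.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# A tiny C program: the pragma asks the compiler to offload the loop.
cat > "$workdir/saxpy.c" <<'EOF'
#include <stdio.h>
#define N 1000000
static float x[N], y[N];

int main(void) {
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Copy x to the device, copy y in and back out, run the loop in parallel. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];   /* each element becomes 2*1 + 2 = 4 */

    printf("y[0] = %.1f\n", y[0]);
    return 0;
}
EOF

# Build with PGI if it is loaded, otherwise fall back to GCC.
build_saxpy() {
    if command -v pgcc >/dev/null 2>&1; then
        pgcc -acc -Minfo=accel "$1" -o "$2"
    elif command -v gcc >/dev/null 2>&1; then
        gcc -fopenacc "$1" -o "$2"
    else
        echo "no OpenACC-capable compiler found" >&2
        return 1
    fi
}

if build_saxpy "$workdir/saxpy.c" "$workdir/saxpy"; then
    "$workdir/saxpy" || echo "saxpy did not run successfully" >&2
fi
```

With PGI, the -Minfo=accel flag prints a report of which loops were accelerated, which is a quick way to confirm the directive was honored.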





Applications

Many applications have been optimized to run faster on GPUs. These include:

| Application | Information | Access |
|---|---|---|
| NAMD | Installed as a module | $ module load namd |
| VASP | A restricted license version is installed; only available to licensed users | $ module load vasp |
| GROMACS | Installed as a module | $ module load gromacs |
| LAMMPS | Installed as a module | $ module load lammps |
| ABAQUS | Installed as a module and available as an application through Open OnDemand | $ module load abaqus |
| GAUSSIAN | Installed as a module. See these notes. | $ module load gaussian/g16 |
| MATLAB | Installed as a module and available as an application through Open OnDemand. Review the GPU Coder on their website | $ module load matlab |
| ANSYS Fluent | Installed as a module and available as an application through Open OnDemand | $ module load ansys |
| RELION | Available as a Singularity container or as a module. | $ module load relion |
| ML and DL Frameworks | See the section below. | |

Python ML/DL including Nvidia RAPIDS 

Nvidia RAPIDS is currently available only on Ocelote.
TensorFlow and TensorBoard are available on both Puma (python/3.8) and Ocelote (python/3.6).

The minimum supported version of Python is 3.6, so on Ocelote you will module load python/3.6 for all of these functions. This gives you access to installed packages including:

| Framework | Details |
|---|---|
| numba | RAPIDS: numba is for CUDA programming |
| cuml | RAPIDS: CUDA Machine Learning has many ML algorithms like K-means, PCA and SVM |
| cudf | RAPIDS: CUDA Dataframes supports loading and manipulating datasets |
| tensorflow | TensorFlow is an open source software library for numerical computation using data flow graphs. |
| torch | PyTorch supports tensor computation and deep neural networks |
| caffe2 | A deep learning framework |
| tensorrt | Inference server for deep learning |
| tensorboard | Visualization tool for machine learning |
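
Once the Python module is loaded, a quick check like the sketch below can confirm which of the packages listed above import cleanly in your session and whether PyTorch can see a GPU. The function name check_import is hypothetical, and the sketch assumes the interpreter is on the path as python3. Run it on a GPU node; on a login node the final check is expected to report that no GPU is visible.

```shell
#!/bin/bash
# On Ocelote, load the interpreter first: module load python/3.6

# Report whether the active Python interpreter can import a package.
check_import() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "$1: available"
    else
        echo "$1: not available"
    fi
}

for pkg in numba cuml cudf tensorflow torch; do
    check_import "$pkg"
done

# Ask PyTorch whether it can see a CUDA device (prints True or False).
python3 -c "import torch; print(torch.cuda.is_available())" 2>/dev/null \
    || echo "torch could not be imported"
```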