The University of Arizona
Compute Resources

Batch Job Resource Request Examples: https://public.confluence.arizona.edu/display/UAHPC/Running+Jobs+with+SLURM#RunningJobswithSLURM-examplerequestsNodeTypes/ExampleResourceRequests



Overview

ElGato

Note

During the quarterly maintenance cycle on April 27, 2022 the ElGato K20s were removed because they are no longer supported by Nvidia.

Implemented at the start of 2014, ElGato has been reprovisioned with CentOS 7 and new compilers and libraries. Since July 2021 it has used Slurm for job submission. ElGato is our smallest cluster, with 130 standard nodes of 16 CPUs each. It was purchased through an NSF MRI grant by researchers in Astronomy and SISTA.

Ocelote

Implemented in the middle of 2016, Ocelote is designed to support the majority of workloads on the standard nodes. Additionally, Ocelote has one large memory node with 2TB of memory and 46 nodes with Nvidia P100 GPUs for GPU-accelerated workflows.

Puma

Implemented in 2020, Puma is the biggest cat yet. Similar to Ocelote, it has standard CPU nodes (with 94 cores and 512 GB of memory per node), GPU nodes (with Nvidia V100) and two high-memory nodes (3 TB). Local scratch storage increased to ~1.4 TB.

Puma runs CentOS 7, which improves compatibility with the latest software.

Puma is currently different from ElGato and Ocelote in these ways:

  • Puma uses SLURM for job scheduling
  • Puma has many programs installed natively (e.g. Singularity). You won't have to module load Singularity any more!
  • There's a new alias command: interactive. This automatically places you in a single-cpu interactive session for debugging, testing, and development. 
  • Modules are no longer available on the login nodes. To view and test modules, an interactive session is necessary (see above).
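
Because modules are only usable inside a compute session on Puma, a script can check for the module command before relying on it. The sketch below is illustrative only: the function name modules_available and its messages are hypothetical, and it assumes a Bash shell in which the modules system, when present, provides module as a shell function or command.

```shell
#!/bin/bash
# Hypothetical helper: report whether the `module` command exists in this shell.
# On a Puma login node it is expected to be missing; in that case, start an
# interactive session (the `interactive` alias described above) and try again.
modules_available() {
  if type module >/dev/null 2>&1; then
    echo "modules available"
  else
    echo "modules not available: start an interactive session first"
  fi
}

modules_available
```

The check uses type rather than running module itself, so it has no side effects on the environment.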

Free vs. Buy-In

The HPC resources at UArizona differ from those at many other universities in that there is central funding for a significant portion of the available resources. Each PI receives a standard monthly allocation of hours at no charge. Windfall usage is not charged against the allocation, which has proven very valuable for researchers with substantial compute requirements.

Research groups can 'Buy-In' (adding compute nodes) to the base HPC systems as funding becomes available. Buy-In research groups have the highest priority on the resources they add to the system. If the expansion resources are not fully utilized by the Buy-In group, they are made available to all users as windfall.

Test Environment

HPC has a test / trial environment in addition to the primary clusters detailed below. This environment is intended for projects that last six months or less and cannot be run on the production systems. Reasons a project cannot run on the production systems include a need for root access, or hardware or software requirements that none of the production systems can meet. If you have a project in mind that we might be able to support, contact hpc-consult@list.arizona.edu.




| Feature | Detail |
|---|---|
| Nodes | 16 |
| CPU | Xeon Westmere-EP X5650, dual 6-core |
| Memory | 128GB |
| Disk | 10TB (5 x 2TB) |
| Network | GbE and QDR IB |

Compute System Details


Note

During the quarterly maintenance cycle on April 27, 2022 the ElGato K20s and Ocelote K80s were removed because they are no longer supported by Nvidia.


| Name | ElGato | Ocelote | Puma |
|---|---|---|---|
| Model | IBM System X iDataPlex dx360 M4 | Lenovo NeXtScale nx360 M5 | Penguin Altus XE2242 |
| Year Purchased | 2013 | 2016 (2018 P100 nodes) | 2020 |
| Node Count | 131 | 400 | 236 CPU-only; 8 GPU; 2 High-memory |
| Total System Memory | 26TB | 82.6TB | 128TB |
| Processors | 2x Xeon E5-2650v2 8-core (Ivy Bridge) | 2x Xeon E5-2695v3 14-core (Haswell); 2x Xeon E5-2695v4 14-core (Broadwell); 4x Xeon E7-4850v2 12-core (Ivy Bridge) | 2x AMD EPYC 7642 48-core (Rome) |
| Cores / Node (schedulable) | 16c | 28c (48c - High-memory node) | 94c |
| Total Cores | 2160* | 11528* | 23616* |
| Processor Speed | 2.66GHz | 2.3GHz (2.4GHz - Broadwell CPUs) | 2.4GHz |
| Memory / Node | 256GB - GPU nodes; 64GB - CPU-only nodes | 192GB (2TB - High-memory node) | 512GB (3TB - High-memory nodes) |
| Accelerators | | 46 NVIDIA P100 (16GB) | 29 NVIDIA V100S |
| /tmp | ~840 GB spinning; /tmp is part of root filesystem | ~840 GB spinning; /tmp is part of root filesystem | ~1440 GB NVMe; /tmp is part of root filesystem |
| HPL Rmax (TFlop/s) | 46 | 382 | |
| OS | CentOS 7 | CentOS 7 | CentOS 7 |
| Interconnect | FDR InfiniBand | FDR InfiniBand for node-node; 10 Gb Ethernet node-storage | 1x 25Gb/s Ethernet RDMA (RoCEv2); 1x 25Gb/s Ethernet to storage |

* Includes high-memory and GPU node CPUs.

Example Resource Requests

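Each entry gives the node type with its ncpus, pcmem, and max mem, followed by a sample request statement. Bracketed numbers refer to the notes at the end of this section.

ElGato

  Standard: ncpus 16, pcmem 4gb, max mem 62gb
    #PBS -l select=1:ncpus=16:mem=62gb:pcmem=4gb

  GPU [1]: ncpus 16, pcmem 16gb, max mem 250gb
    #PBS -l select=1:ncpus=16:mem=250gb:ngpus=1:pcmem=16gb

Ocelote

  Standard: ncpus 28, pcmem 6gb, max mem 168gb
    #PBS -l select=1:ncpus=28:mem=168gb:pcmem=6gb

  GPU [2, 3]: ncpus 28, pcmem 8gb, max mem 224gb
    #PBS -l select=1:ncpus=28:mem=224gb:np100s=1:os7=True

  High Memory: ncpus 48, pcmem 42gb, max mem 2016gb
    #PBS -l select=1:ncpus=48:mem=2016gb:pcmem=42gb

Puma

  Standard: ncpus 94, pcmem 5gb, max mem 470gb
    #SBATCH --nodes=1
    #SBATCH --ntasks=94
    #SBATCH --mem=470gb

  GPU [4]: ncpus 94, pcmem 5gb, max mem 470gb
    #SBATCH --nodes=1
    #SBATCH --ntasks=94
    #SBATCH --mem=470gb
    #SBATCH --gres=gpu:1

  High Memory: ncpus 94, pcmem 32gb, max mem 3000gb
    #SBATCH --nodes=1
    #SBATCH --ntasks=94
    #SBATCH --mem=3008gb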

  1. Two GPUs may be requested on ElGato with ngpus=2.
  2. There is a single node available on Ocelote with two GPUs. To request it, use np100s=2.
  3. Set os7=False for a CentOS 6 GPU node.
  4. Up to four GPUs may be requested on Puma with --gres=gpu:N, where N is 1, 2, 3, or 4.
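
The memory values in the Puma sample requests match the CPU count multiplied by the memory per CPU (for example, 94 x 5gb = 470gb). The sketch below prints the corresponding #SBATCH lines for a single-node request. The function name puma_request and its argument order are hypothetical; only directives that already appear in the samples above are used.

```shell
#!/bin/bash
# Hypothetical helper: print Slurm directives for a single-node request.
# Arguments: <ntasks> <memory per CPU in GB> [number of GPUs, default 0]
puma_request() {
  local ntasks=$1 mem_per_cpu_gb=$2 gpus=${3:-0}
  echo "#SBATCH --nodes=1"
  echo "#SBATCH --ntasks=${ntasks}"
  # Total memory is the CPU count multiplied by the memory per CPU.
  echo "#SBATCH --mem=$(( ntasks * mem_per_cpu_gb ))gb"
  # Only emit a GPU request when at least one GPU is asked for.
  if [ "$gpus" -gt 0 ]; then
    echo "#SBATCH --gres=gpu:${gpus}"
  fi
}

puma_request 94 5      # standard node
puma_request 94 5 1    # GPU node with one GPU
puma_request 94 32     # high-memory node
```

The printed lines can be pasted at the top of a batch script, directly below the #!/bin/bash line.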
