
Overview

El Gato entered service at the start of 2014 and has since been reprovisioned with CentOS 7 and updated compilers and libraries. It uses PBS as its scheduler, consistent with Ocelote. El Gato is a large GPU cluster purchased with an NSF MRI grant by researchers in Astronomy and SISTA. While the Nvidia K20x GPUs are five years old, they are still valuable for single-precision workloads. There are 90 nodes with one or two GPUs each.

Ocelote was implemented in the middle of 2016.  It is designed to support all workloads on its standard nodes, with two exceptions:

  1. Large memory workloads that do not fit within the 192 GB of RAM on a standard node; these can be run on the large memory node.
  2. GPU workloads; GPU nodes were originally a buy-in or windfall option but are now available as standard on Ocelote.

Ocelote now has 46 nodes with Nvidia P100s that are available through the "standard" and "windfall" queues. See details at GPU Nodes.
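As a point of reference, jobs on both clusters are submitted through PBS. Below is a minimal sketch of a batch script requesting a single GPU in the standard queue; the group name, module name, and executable are placeholders, and the exact resource syntax (for example ngpus) is an assumption that should be checked against the scheduler documentation for each cluster.

    #!/bin/bash
    #PBS -N gpu_example
    #PBS -q standard
    ### Replace "mygroup" with your PI's group name (placeholder).
    #PBS -W group_list=mygroup
    ### Request one Ocelote-style node: 28 cores and 1 GPU (resource names are assumptions).
    #PBS -l select=1:ncpus=28:ngpus=1
    #PBS -l walltime=01:00:00

    ### Load a CUDA module (exact module name and version are placeholders).
    module load cuda
    cd $PBS_O_WORKDIR
    ./my_gpu_program

Submit the script with qsub; to use idle cycles beyond your monthly allocation, change the queue to windfall.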


Free vs Buy-In

The HPC resources at UA differ from those at many other universities in that a significant portion of the available resources is centrally funded. Each PI receives a standard monthly allocation of hours at no charge. Windfall usage is not charged against the allocation, which has proven very valuable for researchers with substantial compute requirements.

Research groups can 'buy in' (add resources such as processors, memory, storage, etc.) to the base HPC systems as funding becomes available. Buy-in research groups have the highest priority on the resources they add to the system. If the buy-in resources are not fully utilized by the owning group, they are made available to all users as windfall.

Details on allocations

Details on buy-in

Test Environment

HPC maintains a test / trial environment in addition to the primary clusters detailed below. This environment is intended for projects that are six months or less in duration and cannot be run on the production systems, for example because they require root access or have hardware or software requirements that none of the production systems can meet. If you have a project in mind that we might be able to support, contact hpc-consult@list.arizona.edu.

Feature  | Detail
Nodes    | 16
CPU      | Xeon Westmere-EP X5650, dual 6-core
Memory   | 128 GB
Disk     | 10 TB (5 x 2 TB)
Network  | GbE and QDR IB



Compute System Details

                         | El Gato                              | Ocelote                                                                  | New HPC Q1 2020
Model                    | IBM System X iDataPlex dx360 M4      | Lenovo NeXtScale nx360 M5                                                | Penguin Altus XE2242
Year Purchased           | 2013                                 | 2016 (P100 nodes added 2018)                                             | 2019
Type                     | Distributed Memory                   | Serial, Distributed, and Large Memory                                    |
Processors               | Xeon Ivy Bridge E5-2650, dual 8-core | Xeon Haswell E5-2695, dual 14-core; Xeon Broadwell E5-2695, dual 14-core | AMD EPYC 7642, dual 48-core
Processor Speed (GHz)    | 2.66                                 | 2.3                                                                      | 2.4
Accelerators             | 137 Nvidia K20x, 5 GB video memory (47 nodes with 2 K20x, 43 nodes with 1 K20x) | 46 Nvidia P100, 16 GB video memory; 15 Nvidia K80 (buy-in only) | 24 Nvidia V100, 32 GB video memory
Node Count               | 131                                  | 400                                                                      | 192
Cores / Node             | 16                                   | 28*                                                                      | 96
Total Cores              | 2160                                 | 11528**                                                                  | 19200
Memory / Node (GB)       | 64 or 256                            | 192 (high memory node: 2 TB)                                             | 512 (high memory node: 3 TB)
Total Memory (TB)        | 26                                   | 82.6                                                                     | 105
/tmp                     | ~840 GB spinning; /tmp is part of root filesystem | ~840 GB spinning; /tmp is part of root filesystem           | ~1640 GB NVMe; /tmp is part of root filesystem
Max Performance (TFLOPS) | 46                                   | 382                                                                      |
OS                       | CentOS 7.6                           | CentOS 6.10                                                              | CentOS 8
Interconnect             | FDR Infiniband                       | FDR Infiniband node-to-node; 10 Gb Ethernet node-to-storage              | 100 Gb/s spine/leaf Ethernet with RDMA via RoCEv2; 2x 25 Gb per compute node

* Ocelote includes a large memory node with 2 TB of RAM available on 48 cores. More details on the Large Memory Node
** Adjusted for the high memory node
