
Allocations and Limits

For assistance, see the Getting Help page.

Storage Allocations

Tip

See our Storage page for more information.

When you obtain a new HPC account, you will be provided with storage. The shared storage (/home, /groups, /xdisk) is accessible from any of the three production clusters: Puma, Ocelote and El Gato. The temporary (/tmp) space is unique to each compute node. The allocations are summarized in the table below.

    Tip

    We strongly recommend that you do some regular housekeeping of your allocated space. Millions of files are hard to keep organized and even more difficult to migrate. Archiving or using a tool like tar will help keep our disk arrays efficient and potentially free up more space for you to use.
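For example, a directory containing many small files can be bundled into a single compressed archive with tar before it is moved or stored long-term (the directory and archive names below are placeholders):

Code Block
# Bundle many small files into one compressed archive, then remove the originals
tar -czf results_2021.tar.gz results_2021/
rm -rf results_2021/

# Extract the archive later when the files are needed again
tar -xzf results_2021.tar.gz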


    Tip

uquota is the command to display how much space you have used and how much remains.

    Code Block
                                  used  soft limit  hard limit       files/limit
    Filesets with group access:
    /rsgrps/me                    75.45G          2T          2T      539741/1228800


Location           Allocation                     Usage
Permanent Storage
/home/uxx/netid    50 GB                          Individual allocations specific to each user.
/groups/PI         500 GB                         Allocated as a communal space to each PI and their group members.
Temporary Storage
/xdisk/PI          Up to 20 TB                    Requested at the PI level. Available for up to 150 days, with one 150-day extension possible for a total of 300 days.
/tmp               ~1400 GB NVMe (Puma)           Local storage specific to each compute node. Usable as a scratch space for compute jobs. Not accessible once jobs end.
                   ~840 GB spinning (Ocelote)
                   ~840 GB spinning (El Gato)
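Because /tmp is node-local and disappears when the job ends, a common pattern is to do I/O-heavy work there and copy the results back to shared storage before the job finishes. A minimal sketch of that pattern is below; the partition name comes from the SLURM queue table later on this page, while the program name and the destination path under /xdisk are illustrative assumptions.

Code Block
#!/bin/bash
#SBATCH --job-name=scratch_example
#SBATCH --partition=standard
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Create a per-job scratch directory on the node-local /tmp disk
SCRATCH=/tmp/$SLURM_JOB_ID
mkdir -p "$SCRATCH"

# Run the I/O-heavy work against node-local storage (program name is illustrative)
./my_program --workdir "$SCRATCH"

# Copy results back to shared storage before the job ends; /tmp is not
# accessible once the job finishes (destination path is illustrative)
cp -r "$SCRATCH"/results /xdisk/PI/netid/
rm -rf "$SCRATCH"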






    Job Allocations

All University of Arizona Principal Investigators (PIs, i.e. faculty) who register for access to UA High Performance Computing (HPC) receive these free allocations on the HPC machines, shared among all members of their team. Currently, all PIs receive:

HPC Machine    Standard Allocation Time per Month per PI    Windfall
Puma           100,000 CPU hours per month                  Unlimited, but can be pre-empted
Ocelote        70,000 CPU hours per month                   Unlimited, but can be pre-empted
El Gato        7,000 CPU hours per month                    Unlimited, but can be pre-empted

    Best practices

    1. Use your standard allocation first! The standard allocation is guaranteed time on the HPC. It refreshes monthly and does not accrue (if a month's allocation isn't used it is lost).
2. Use the windfall queue when your standard allocation is exhausted. Windfall provides unlimited CPU-hours, but jobs in this queue can be stopped and restarted (pre-empted) by standard jobs. A sketch of a job script header using these two queues follows this list.
    3. If your group consistently needs more time than the free allocations, consider the HPC buy-in program.
    4. Last resort for tight deadlines: PIs can request a special project allocation once per year (https://portal.hpc.arizona.edu/portal/; under the Support tab). Requesting a special project will provide qualified hours which are effectively the same as standard hours.
5. We do not offer system-level checkpointing. It may be desirable to build checkpoint/restart capability into your own code, particularly for windfall jobs that can be pre-empted.
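The sketch below illustrates items 1 and 2: submit against the standard partition while your group still has hours, and switch the partition to windfall once the monthly allocation is exhausted. The group name is a placeholder, and the exact set of directives your job needs will differ, so treat this as illustrative only.

Code Block
#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --partition=standard   # change to "windfall" when the monthly standard allocation is used up
#SBATCH --account=my_group     # placeholder: your PI's group, whose allocation the job is charged against
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Replace with the actual work of the job
./my_program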

    How to Find Your Remaining Allocation

    To view your remaining allocation, use the command va in a terminal.

You can use this time either on the standard nodes, which do not require special attributes in the scheduler script, or on the GPU nodes, which do require special attributes.
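As an example of such an attribute, a GPU request is typically a single extra directive in the scheduler script. The form below (--gres=gpu:1) is the common SLURM syntax and is given here as an assumption; check the GPU documentation for the exact syntax required on each cluster.

Code Block
#SBATCH --partition=standard
#SBATCH --gres=gpu:1   # request one GPU; omit this line for a CPU-only (standard node) job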

     SLURM Batch Queues

    The batch queues, also known as partitions, on the different systems are the following:

Queue            Description
standard         Used to consume the monthly allocation of hours provided to each group
windfall         Used when standard is depleted, but subject to preemption
high_priority    Used by 'buy-in' users for purchased nodes
qualified        Used by groups who have a temporary special project allocation







    Job Limits

    To check group, user, and job limitations on resource usage, use the command job-limits $YOUR_GROUP in the terminal.
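For example (the group name below is a placeholder; the standard Linux groups command lists the groups your account belongs to, which usually includes your PI's group):

Code Block
# List the groups you belong to
groups

# Show group, user, and job limits for your group
job-limits my_pi_group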





    Special Allocations

Sometimes you may need an extra allocation, for example for a conference deadline or paper submission. We can offer a temporary allocation according to the guidelines here: Special Projects