Overview

All three clusters, Puma, Ocelote, and ElGato, use SLURM for resource management and job scheduling.

Additional SLURM Resources and Examples
Node Summary

Before submitting a Slurm script, you must know (or at least have a general idea of) the resources needed for your job. This will tell you which type of node to request, how much memory, and other useful information that can be provided to the system via your batch script. A detailed list of Slurm batch flags is included below.

General Overview
Hardware Limitations by Node Type and Cluster

Please consult the following table when crafting Slurm submission scripts. Requesting more resources than are available on a given cluster and node type may lead to errors or delays.
See here for example Slurm requests.

Other Job Limits

In addition to fitting your jobs within the constraints of our hardware, there are other limitations imposed by the scheduler to maintain fair use.
SLURM and System Commands
Command | Purpose |
---|---|
#SBATCH --account=group_name | Specify the account where hours are charged. Don't know your group name? Run the command "va" to see which groups you belong to |
#SBATCH --partition=partition_name | Set the job partition. This determines your job's priority and the hours charged. See Job Partition Requests below for additional information |
#SBATCH --time=DD-HH:MM:SS | Set the job's runtime limit in days, hours, minutes, and seconds. A single job cannot exceed 10 days or 240 hours. |
#SBATCH --nodes=N | Allocate N nodes to your job. For non-MPI enabled jobs, this should be set to "--nodes=1" to ensure access to all requested resources and prevent memory errors. |
#SBATCH --ntasks=N | Specifies the number of tasks (or processes) the job will run. For MPI jobs, this is the number of MPI processes. Most of the time, you can use ntasks to specify the number of CPUs your job needs, though in some cases this may cause issues. For example, see: Using Matlab |
#SBATCH --cpus-per-task=M | By default, you will be allocated one CPU per task. This can be increased by including the additional directive --cpus-per-task. The total number of CPUs a job is allocated is cpus-per-task × ntasks, or M*N. |
#SBATCH --mem=Ngb | Select N gb of memory per node. If "gb" is not included, this value defaults to MB. Directives --mem and --mem-per-cpu are mutually exclusive. |
#SBATCH --mem-per-cpu=Ngb | Select N GB of memory per CPU. Valid values can be found in the Node Types/Example Resource Requests section below. If "gb" is not included, this value defaults to MB. |
#SBATCH --gres=gpu:N | Optional: Request N GPUs. |
#SBATCH --gres=gpu:ampere:N | Optional: Request N A100 GPUs. |
#SBATCH --gres=gpu:volta:N | Optional: Request N V100 GPUs. |
#SBATCH --constraint=hi_mem | Optional: Request a high memory node (Ocelote and Puma only). |
#SBATCH --array=N-M | Submits an array job from indices N to M |
#SBATCH --job-name=JobName | Optional: Specify a name for your job. This will not automatically affect the output filename. |
#SBATCH -e output_filename.err #SBATCH -o output_filename.out | Optional: Specify output filename(s). If -e is missing, stdout and stderr will be combined. |
#SBATCH --open-mode=append | Optional: Append your job's output to the specified output filename(s). |
#SBATCH --mail-type=BEGIN|END|FAIL|ALL | Optional: Request email notifications. Beware of mail bombing yourself. |
#SBATCH --mail-user=email@address.xyz | Optional: Specify email address. If this is missing, notifications will go to your UArizona email address by default. |
#SBATCH --exclusive | Optional: Request exclusive access to node. |
#SBATCH --export=VAR | Optional: Export a comma-delimited list of environment variables to a job. |
#SBATCH --export=all (default) | Optional: Export your working environment to your job. |
#SBATCH --export=none | Optional: Do not export working environment to your job. |
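As a sketch, the directives above can be combined into a minimal batch script. The group name, job name, and commands below are placeholders to adapt; the resource values follow the Puma standard-node ratio of 5 GB per CPU:

```shell
#!/bin/bash
#SBATCH --job-name=sample_job
#SBATCH --account=YOUR_GROUP        # placeholder: run "va" to see your groups
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=5gb
#SBATCH --time=01:00:00

# Commands below run on the allocated compute node.
# Total memory follows from the directives above: 4 tasks x 5 GB/CPU.
total_mem_gb=$((4 * 5))
echo "Job ${SLURM_JOB_ID:-test} using ${SLURM_NTASKS:-4} CPUs and ${total_mem_gb} GB"
```

Submit the script with sbatch (e.g., `sbatch sample_job.slurm`); the #SBATCH lines are comments to the shell but are read by the scheduler.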
SLURM Environment Variables
SLURM Reason Codes

Sometimes, if you check a pending job using squeue, there are some messages that show up under Reason indicating why your job may not be running. Some of these codes are non-intuitive, so a human-readable translation is provided below:
Job Partition Requests

Partition | SLURM | Details |
---|---|---|
standard | #SBATCH --account=<PI GROUP> #SBATCH --partition=standard | Consumes your group's standard allocation. These jobs cannot be interrupted. |
windfall | #SBATCH --partition=windfall | Does not consume your group's standard allocation. Jobs may be interrupted and restarted by higher-priority jobs. The --account flag needs to be omitted or an error will occur. |
high_priority | #SBATCH --account=<PI GROUP> #SBATCH --partition=high_priority #SBATCH --qos=user_qos_<PI GROUP> | Available for groups who have purchased compute resources. |
qualified | #SBATCH --account=<PI GROUP> #SBATCH --partition=standard #SBATCH --qos=qual_qos_<PI GROUP> | Available for groups that have submitted a special project request. |
SLURM Output Filename Patterns

SLURM offers ways to make your job's output filenames customizable through the use of character replacements. A table is provided below as a guide with some examples. Variables may be used or combined as desired. Note: character replacements may also be used with other SBATCH directives such as error filename, input filename, and job name.
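For instance, the job-name and job-ID replacement symbols (%x and %j) can be combined in the output and error directives. This fragment is illustrative; the job name is a placeholder:

```shell
#SBATCH --job-name=myjob
#SBATCH --output=%x-%j.out   # %x = job name, %j = job ID, e.g. myjob-12345.out
#SBATCH --error=%x-%j.err
```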
Node Types/Example Resource Requests

Standard Nodes

Cluster | Max CPUs | Mem/CPU | Max Mem | Sample Request Statement |
---|---|---|---|---|
ElGato | 16 | 4gb | 62gb | #SBATCH --nodes=1 |
Ocelote | 28 | 6gb | 168gb | #SBATCH --nodes=1 #SBATCH --ntasks=28 #SBATCH --mem-per-cpu=6gb |
Puma | 94 | 5gb | 470gb | #SBATCH --nodes=1 #SBATCH --ntasks=94 #SBATCH --mem-per-cpu=5gb |
GPU Nodes

During the quarterly maintenance cycle on April 27, 2022, the ElGato K20s and Ocelote K80s were removed because they are no longer supported by Nvidia.
GPU jobs are requested using the generic resource (--gres) SLURM directive. In general, the directive to request N GPUs will be of the form: --gres=gpu:N
Cluster | Max CPUs | Mem/CPU | Max Mem | Sample Request Statement |
---|---|---|---|---|
Ocelote | 28 | 8gb | 224gb | #SBATCH --nodes=1 #SBATCH --ntasks=28 #SBATCH --mem-per-cpu=8gb #SBATCH --gres=gpu:1 |
Puma1 | 94 | 5gb | 470gb | #SBATCH --nodes=1 #SBATCH --ntasks=94 #SBATCH --mem-per-cpu=5gb #SBATCH --gres=gpu:1 |
1 Up to four GPUs may be requested on Puma on a single GPU node with --gres=gpu:N, where N is 1, 2, 3, or 4.
High Memory Nodes

When requesting a high memory node, include both the memory/CPU and constraint directives.
Cluster | Max CPUs | Mem/CPU | Max Mem | Sample Request Statement |
---|---|---|---|---|
Ocelote | 48 | 41gb | 2015gb | #SBATCH --nodes=1 #SBATCH --ntasks=48 #SBATCH --mem-per-cpu=41gb #SBATCH --constraint=hi_mem |
Puma | 94 | 32gb | 3000gb | #SBATCH --nodes=1 #SBATCH --ntasks=94 #SBATCH --mem-per-cpu=32gb #SBATCH --constraint=hi_mem |
Total Job Memory vs. CPU Count
Job Memory and CPU Count are Correlated

The memory your job is allocated depends on the number of CPUs you request. For example, on Puma standard nodes, you get 5 GB for each CPU you request, so a standard job using 4 CPUs gets 5 GB/CPU × 4 CPUs = 20 GB of total memory. Each node type has its own memory ratio: its total memory ÷ its total number of CPUs. A reference for all the node types, their memory ratios, and how to request each can be found in the Node Types/Example Resource Requests section above.

What Happens if My Memory and CPU Requests Don't Match?

Our systems are configured to try to help when your memory request does not match your CPU count. For example, if you request 1 CPU and 470 GB of memory on Puma, the system will automatically scale up your CPU count to 94 to ensure that you get your full memory requirement. This does not go the other way: if you request less memory than would be provided by your CPU count, no adjustments are made. If you omit the --mem flag entirely, the system will use the memory ratio for the standard nodes on that cluster.

Possible Problems You Might Encounter
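The ratio arithmetic above can be sketched in a few lines of shell; the node figures here are the Puma standard-node values from the table:

```shell
# Per-CPU memory ratio = total node memory / total CPUs (Puma standard node).
node_mem_gb=470
node_cpus=94
mem_per_cpu_gb=$((node_mem_gb / node_cpus))   # 5 GB per CPU

# Total memory for a 4-CPU job at that ratio.
job_cpus=4
job_mem_gb=$((job_cpus * mem_per_cpu_gb))
echo "${job_cpus} CPUs -> ${job_mem_gb} GB total memory"
```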
Want your session to start faster? Try one or both of the following:
Interactive Sessions

When you are on a login node, you can request an interactive session on a compute node. This is useful for checking available modules, testing submission scripts, compiling software, and running programs directly from the command line. We have a built-in shortcut command that will allow you to quickly and easily request a session by simply entering: interactive
When you request a session, the full salloc command being executed will be displayed for verification/copying/editing/pasting purposes. For example:
(ocelote) [netid@junonia ~]$ interactive
Run "interactive -h" for help customizing interactive use
Submitting with /usr/local/bin/salloc --job-name=interactive --mem-per-cpu=4GB --nodes=1 --ntasks=1 --time=01:00:00 --account=windfall --partition=windfall
salloc: Pending job allocation 531843
salloc: job 531843 queued and waiting for resources
salloc: job 531843 has been allocated resources
salloc: Granted job allocation 531843
salloc: Waiting for resource configuration
salloc: Nodes i16n1 are ready for job
[netid@i16n1 ~]$
Notice in the example above how the command prompt changes once your session starts. When you're on a login node, your prompt will show "junonia" or "wentletrap". Once you're in an interactive session, you'll see the name of the compute node you're connected to.
If no options are supplied to the command interactive, your job will automatically run using the windfall partition for one hour using one CPU. To use the standard partition, include the flag "-a" followed by your group's name. To see all the customization options:
(ocelote) [netid@junonia ~]$ interactive -h
Usage: /usr/local/bin/interactive [-x] [-g] [-N nodes] [-m memory per core] [-n ncpus per node] [-Q optional qos] [-t hh::mm:ss] [-a account to charge]
You may also create your own salloc commands using any desired SLURM directives for maximum customization.
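As a sketch, a custom request might be built like the following; the account name, resource values, and time limit are placeholders to adapt:

```shell
# Build a custom salloc invocation as a string so it is easy to inspect and edit;
# run the resulting command on a login node to start the session.
account=YOUR_GROUP   # placeholder: run "va" to list your groups
cmd="salloc --job-name=interactive --nodes=1 --ntasks=4 --mem-per-cpu=5GB --time=02:00:00 --account=${account} --partition=standard"
echo "$cmd"
```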
MPI Jobs

OpenMPI

For OpenMPI, the important variables are set by default, so you do not need to include them in your scripts.
Intel MPI

For Intel MPI, these variables are set for you:
If you're using Intel MPI with mpirun and are getting errors, try replacing
Parallel Work

To make proper use of a supercomputer, you will likely want to take advantage of its many cores. Puma has 94 cores in each node available to Slurm. The exception is running hundreds or thousands of jobs using High Throughput Computing. We have a training course which explains the concepts and terminology of parallel computing with some examples: Introduction to Parallel Computing. This practical course in Parallel Analysis in R is also useful.