Overview

SLURM

The new HPC system, Puma, uses SLURM as its job scheduler rather than PBS Pro. SLURM has several advantages:

  • More robust support for a larger number of jobs in the queue.
  • Used by national HPC groups such as XSEDE and TACC, making it easier to scale out to those systems.
  • Rejects jobs asking for impossible resource configurations.


Allocations and Job Queues

Using Puma with SLURM is similar to using ElGato and Ocelote with PBS. Users will still receive a monthly allocation of cpu hours associated with their PI's group, which will be deducted when they run jobs in the standard queue. Users will also still be able to use windfall to run jobs without consuming their monthly allocations. As on ElGato and Ocelote, jobs run using windfall will be subject to preemption when resources are requested by higher-priority jobs.
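As a minimal sketch, the choice between the two shows up in a job script's partition directive; group_name is a placeholder for your PI's group, and the windfall line is shown commented out as the alternative:

#!/bin/bash
# Standard: cpu hours are deducted from the group's monthly allocation
#SBATCH --partition=standard
#SBATCH --account=<group_name>

# Windfall alternative: no hours consumed, but the job may be preempted
##SBATCH --partition=windfall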


Modules and Software

The process of finding, loading, and using software as modules will not change on the new system. Users will still be able to use the standard commands described in the Software section of our User Guide. However, in a departure from our previous systems, modules will not be available to load or use on the login nodes. To load, use, and test software for job submissions, users will need to request an interactive session, which can be done simply by running the command "interactive".
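As a sketch, a typical test workflow on Puma might look like the following (the resources granted by interactive depend on the system defaults, and python is just an example module):

# On a Puma login node, request an interactive session on a compute node
interactive

# Once the session starts on a compute node, modules can be loaded and tested as usual
module load python
python --version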



PBS → SLURM Rosetta Stone


In general, SLURM can translate and execute scripts written for PBS. This means that if you submit a PBS script written for Ocelote or ElGato to Puma, it will likely run. However, there are a few caveats to note:

  • Some PBS directives do not translate directly to SLURM and may cause the job to fail.
  • The environment variables specific to PBS and SLURM are different. If your job relies on these, you will need to update them; common examples are PBS_O_WORKDIR and PBS_ARRAY_INDEX (see the sketch after this list).
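For instance, based on the environment-variable mappings in the table below, lines like these would change when converting a script from PBS to SLURM:

# PBS version
cd $PBS_O_WORKDIR
echo "input_file_${PBS_ARRAY_INDEX}.in"

# SLURM version
cd $SLURM_SUBMIT_DIR
echo "input_file_${SLURM_ARRAY_TASK_ID}.in"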

To help with the transition to SLURM, we've installed software that converts some basic PBS Pro commands into SLURM commands automatically.

Below is a list of common PBS commands, directives, and environment variables and their SLURM counterparts.


In each entry below, the PBS form is listed first, followed by its SLURM counterpart and its purpose (PBS → SLURM: purpose).

Job Management

  • qsub <options> → sbatch <options>: Batch submission of jobs to run without user input
  • qsub -I <options> → salloc <options>: Request an interactive job
  • (no direct PBS equivalent) → srun <options>: Submit a job for realtime execution; can also be used to start an interactive session
  • qstat → squeue: Show all jobs
  • qstat <jobid> → squeue --job <jobid>: Check the status of a specific job
  • qstat -u <netid> → squeue -u <netid>: Check the status of a specific user's jobs
  • qdel <jobid> → scancel <jobid>: Delete a specific job
  • qdel -u <netid> → scancel -u <netid>: Delete all of a user's jobs
  • qstat -Q → sinfo: View information about nodes and queues
  • qhold <jobid> → scontrol hold <jobid>: Place a hold on a job to prevent it from being executed
  • qrls <jobid> → scontrol release <jobid>: Release a hold placed on a job, allowing it to be executed

Job Directives

  • #PBS -W group_list=group_name → #SBATCH --account=group_name: Specify the group whose hours are charged
  • #PBS -q standard → #SBATCH --partition=standard: Set the job queue
  • #PBS -l walltime=HH:MM:SS → #SBATCH --time=HH:MM:SS: Set the job walltime
  • #PBS -l select=N → #SBATCH --nodes=N: Select N nodes
  • #PBS -l ncpus=N → #SBATCH --ntasks=N: Select N cpus
  • #PBS -l mem=<N>gb → #SBATCH --mem=<N>gb: Select memory in GB
  • #PBS -l pcmem=<N>gb → #SBATCH --mem-per-cpu=<N>gb: Select memory per cpu
  • #PBS -J N-M → #SBATCH --array=N-M: Array job submission, where N and M are integers
  • #PBS -N JobName → #SBATCH --job-name=JobName: Optional: set the job name
  • #PBS -j oe → (no directive needed): Optional: combine stdout and stderr; this is the SLURM default
  • #PBS -o filename → #SBATCH -o filename: Optional: standard output filename
  • #PBS -e filename → #SBATCH -e filename: Optional: error filename
  • #PBS -v var=<value> → #SBATCH --export=var: Optional: export the single environment variable var to the job
  • #PBS -V → #SBATCH --export=all (default): Optional: export all environment variables to the job
  • #PBS -m be → #SBATCH --mail-type=BEGIN|END|FAIL|ALL: Optional: request email notifications

Environment Variables

  • $PBS_O_WORKDIR → $SLURM_SUBMIT_DIR: Job submission directory
  • $PBS_JOBID → $SLURM_JOB_ID: Job ID
  • $PBS_JOBNAME → $SLURM_JOB_NAME: Job name
  • $PBS_ARRAY_INDEX → $SLURM_ARRAY_TASK_ID: Index to differentiate tasks in an array
  • $PBS_O_HOST → $SLURM_SUBMIT_HOST: Hostname where the job was submitted
  • $PBS_NODEFILE → $SLURM_JOB_NODELIST: List of nodes allocated to the current job

Terminology

  • Queue → Partition
  • Group List → Association
  • PI → Account
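Putting a few of the job-management commands together, a typical SLURM session might look like the following sketch (the script name is hypothetical, and <jobid> is whatever ID sbatch reports):

sbatch my_job.slurm          # submit a batch job; sbatch prints the assigned job ID
squeue -u <netid>            # check the status of your jobs
scontrol hold <jobid>        # prevent a pending job from starting
scontrol release <jobid>     # allow the held job to run
scancel <jobid>              # delete the job if it is no longer needed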




Job Examples

Single serial job submission

PBS Script

#!/bin/bash
#PBS -N Sample_PBS_Job
#PBS -l select=1:ncpus=1
#PBS -l mem=1gb
#PBS -l walltime=00:01:00
#PBS -q windfall
#PBS -W group_list=<group_name>

cd $PBS_O_WORKDIR
pwd; hostname; date

module load python
python --version

SLURM Script

#!/bin/bash
#SBATCH --job-name=Sample_Slurm_Job
#SBATCH --ntasks=1              
#SBATCH --mem=1gb                     
#SBATCH --time=00:01:00    
#SBATCH --partition=windfall
#SBATCH --account=<group_name>    

cd $SLURM_SUBMIT_DIR
pwd; hostname; date

module load python
python --version
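To run this example, save the SLURM script to a file and submit it with sbatch (the filename below is just an example). By default, SLURM combines stdout and stderr into a file named slurm-<jobid>.out in the submission directory:

sbatch sample_job.slurm      # submit the batch script
squeue -u <netid>            # monitor the job until it completes
cat slurm-<jobid>.out        # inspect the combined output once the job finishes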


Array Submission

IMPORTANT:

When submitting named jobs as arrays, SLURM will overwrite the shared output file with the output of the last task processed in the array. There are two ways around this:

  1. Use the option:

    #SBATCH --output=slurm-array-test-%a.out

    to differentiate each output file by array task ID. This is the same behavior as seen in PBS.

  2. Use the option:

    #SBATCH --open-mode=append

    to append the output from all array tasks to the same file.
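A minimal sketch of the second approach, pairing append mode with a fixed (hypothetical) output filename so every task writes to one file:

#SBATCH --output=slurm-array-test.out
#SBATCH --open-mode=append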


PBS Script

#!/bin/bash
#PBS -N Sample_PBS_Job
#PBS -l select=1:ncpus=1
#PBS -l mem=1gb
#PBS -l walltime=00:01:00
#PBS -q windfall
#PBS -W group_list=<group_name>
#PBS -J 1-5

cd $PBS_O_WORKDIR
pwd; hostname; date

echo "./sample_command input_file_${PBS_ARRAY_INDEX}.in"
 

SLURM Script

#!/bin/bash
#SBATCH --output=Sample_SLURM_Job-%a.out
#SBATCH --ntasks=1              
#SBATCH --mem=1gb                     
#SBATCH --time=00:01:00    
#SBATCH --partition=windfall
#SBATCH --account=<group_name>    
#SBATCH --array=1-5

cd $SLURM_SUBMIT_DIR
pwd; hostname; date

echo "./sample_command input_file_${SLURM_ARRAY_TASK_ID}.in"

