The University of Arizona
    For questions, please open a UAService ticket and assign to the Tools Team.


Extensive Training Courses

We have linked to relevant training courses from other institutions.
Rather than recreate them, we recommend that you access them directly.
Here is a partial list:
Cornell Virtual Workshops

  • Introduction to Linux
  • Introduction to C Programming
  • Introduction to Fortran Programming
  • Introduction to Python
  • Introduction to R
  • MATLAB Programming
  • Introduction to GPU and CUDA
  • Parallel Computing Courses including MPI and OpenMP
  • Code Improvement
  • Data Management including Globus, HDF5 and VisIt

CyberInfrastructure Tutor from NCSA

  • Debugging Code
  • MPI
  • Introduction to Performance Tools
  • Introduction to Visualization
  • Parallel Computing

Software Carpentry

  • The Unix Shell
  • Version Control with Git
  • Using Databases and SQL
  • Programming with Python
  • Programming with R
  • Programming with MATLAB
  • Automation and Make

Linux Self-Guided

We run RHEL/CentOS 6 Linux on our high-performance systems.

If you have never used Linux before, or have used it only a little, read this guide:

If you have learned Linux in the past but want a quick reference to the syntax of commands, then read this:

Bash Cheat Sheet

Intel® Modern Code Training

Intel brought a workshop to campus in 2014, and the material is covered here. If you want to work with Intel® Xeon Phi™ coprocessors, ElGato has 40 of them installed. You can obtain "standard" queue access and request access to the nodes that have them.

Created by Colfax International and Intel, and based on the book, Parallel Programming and Optimization with Intel® Xeon Phi™ Coprocessors, this short video series provides an overview of practical parallel programming and optimization with a focus on using the Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Length: 5 hours

Parallel Programming and Optimization with Intel Xeon Phi Coprocessors

Intel® Software Tools

Intel offers Cluster Studio XE. On Ocelote, the following Cluster Studio modules are installed (run module avail intel to list them):

  • intel-cluster-checker/2.2.2
  • intel-cluster-runtime/ia32/3.8
  • intel-cluster-runtime/intel64/3.8
  • intel-cluster-runtime/mic/3.8

We have also installed the following Intel high-performance libraries, which module avail intel lists as well:

  • Intel® Threading Building Blocks
  • Intel® Integrated Performance Primitives
  • Intel® Math Kernel Library
  • Intel® Data Analytics Acceleration Library

The University also holds a license for this toolset, independent of HPC. Portions of it are free for teaching and instruction and for students.


Introduction to OpenMP

This PDF is a presentation from the XSEDE* HPC Workshop series.

* XSEDE, the Extreme Science and Engineering Discovery Environment, is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is a single virtual system that scientists and researchers can use to interactively share computing resources, data, and expertise. XSEDE integrates the resources and services, makes them easier to use, and helps more people use them.



Singularity

Singularity containers let users run applications in a Linux environment of their choosing. This differs from Docker, which is not appropriate for HPC because of security concerns. Singularity can run Docker images, but it is not limited to them.

The most important thing to know is that you create the Singularity container (called an image) on a workstation where you have root privileges, and then transfer the image to HPC, where you can execute it. If you do not have root access on a Linux workstation, one option is a virtual machine on your laptop, for example one managed with Vagrant on macOS.

For an overview and more detailed information refer to:

Here are some of the use cases we support using Singularity:

  • You already use Docker and want to run your jobs on HPC
  • You want to preserve your environment so that a system change will not affect your work
  • You need newer or different libraries than are offered on HPC systems
  • Someone else developed the workflow using a different version of Linux
  • You prefer to use something other than Red Hat / CentOS, like Ubuntu

Depending on your environment and the type of Singularity container you want to build, you may need to install some dependencies before installing or using Singularity. For instance, the following packages may need to be installed on Ubuntu for Singularity to build and run properly:

[user@someUbuntu ~]$ sudo apt-get install build-essential debootstrap yum dh-autoreconf

On CentOS, these commands install some of the dependencies Singularity needs:

[user@someCentos ~]$ sudo yum groupinstall 'Development Tools'
[user@someCentos ~]$ sudo yum install wget
[user@someCentos ~]$ wget
[user@someCentos ~]$ sudo rpm -Uvh epel-release-7-8.noarch.rpm
[user@someCentos ~]$ sudo yum install debootstrap.noarch

You can find more information about installing Singularity on your Linux build system here. Because Singularity is being rapidly developed, we recommend downloading and installing the latest release from Github.

A limitation is that all edits to the image must be made on the originating workstation. For example, if a Python module needs to be added or updated, the image must be modified there and then copied to HPC again.
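A minimal sketch of that edit-and-recopy cycle, run on the build workstation. It assumes Singularity 2.x, where exec --writable opens the image for changes (root required), and that pip is available inside the image. IMAGE, NETID and HPC_HOST are hypothetical variables with placeholder defaults that you must replace; HPC_HOST in particular is not a real host name. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the edit-then-recopy cycle for a Singularity 2.x image.
# IMAGE, NETID and HPC_HOST are hypothetical placeholders: replace them.
IMAGE="${IMAGE:-centosTFlow.img}"
NETID="${NETID:-netid}"
HPC_HOST="${HPC_HOST:-hpc-host.example}"   # placeholder, not a real host name

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add or upgrade one Python module in the image, then copy the image to HPC.
update_and_push() {
    module_name="$1"
    # 1. On the workstation: open the image writable (needs root) and run pip in it.
    run sudo singularity exec --writable "$IMAGE" pip install --upgrade "$module_name" || return 1
    # 2. Copy the modified image back to your /extra space on HPC.
    run scp "$IMAGE" "$NETID@$HPC_HOST:/extra/$NETID/"
}
```

Usage: update_and_push numpy, or DRY_RUN=1 update_and_push numpy to preview the two commands first.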

Many of the tutorial examples here demonstrate TensorFlow. It was chosen because there is strong interest in machine learning and our HPC systems cannot run it natively.

Binding Directories

Binding a directory to your Singularity container allows you to access files in a host system directory from within your container. By default, Singularity will bind your /home/$USER directory and your current working directory (along with a few other directories such as /tmp and /dev). The examples below include a bind to /extra.

If you need more detailed information, follow this link:

Centos with Tensorflow Example

This is an example of creating a Singularity image to run code that is not supported on HPC. It uses TensorFlow, but any application could be installed in its place. It also uses CentOS, but it could just as easily be Ubuntu.

  1. Install Singularity on a Linux workstation.

  2. Create a 1500 MB container on a CentOS workstation or VM where you have root privileges:

    sudo singularity create -s 1500 centosTFlow.img
    # Create an image file to host the content of the container.
    # Think of it like creating the virtual hard drive for a VM.
    # With ext3, an actual file of the specified size is created.
  3. Create the definition file, in this example called centosTFlow.def. Note that this line in the file must be changed to use your actual netid: mkdir -p /extra/netid

  4. Run the bootstrap process, which installs the operating system and software into the image as described in the definition file:

    sudo singularity bootstrap centosTFlow.img centosTFlow.def

  5. Copy the new image file to your space on HPC. /extra is a good location, because the image could use up the remaining space in your home directory. A line in the definition file creates the mount point for /extra. Whenever you run from a location other than /home on ElGato, you are likely to see the following warning, which you can ignore:

    WARNING: Not mounting current directory: user bind control is disabled by system administrator
  6. Test with a simple command:

    $ module load singularity
    $ singularity exec centosTFlow.img python --version
    Python 2.7.5
  7. Or, for a slightly more complex test, create a simple Python script called

    $ singularity exec centosTFlow.img python /extra/netid/
    Hello World: The Python version is 2.7.5
    $ singularity shell centosTFlow.img
    Hello World: The Python version is 2.7.5
  8. Finally, test TensorFlow with this example from the TensorFlow website:

    $ singularity exec centosTFlow.img python /extra/netid/
    (0, array([-0.08299404], dtype=float32), array([ 0.59591037], dtype=float32))
    (20, array([ 0.03721666], dtype=float32), array([ 0.3361423], dtype=float32))
    (40, array([ 0.08514741], dtype=float32), array([ 0.30855015], dtype=float32))
    (60, array([ 0.09648635], dtype=float32), array([ 0.3020227], dtype=float32))
    (80, array([ 0.0991688], dtype=float32), array([ 0.30047852], dtype=float32))
    (100, array([ 0.09980337], dtype=float32), array([ 0.3001132], dtype=float32))
    (120, array([ 0.09995351], dtype=float32), array([ 0.30002677], dtype=float32))
    (140, array([ 0.09998903], dtype=float32), array([ 0.30000633], dtype=float32))
    (160, array([ 0.0999974], dtype=float32), array([ 0.3000015], dtype=float32))
    (180, array([ 0.09999938], dtype=float32), array([ 0.30000037], dtype=float32))
    (200, array([ 0.09999986], dtype=float32), array([ 0.3000001], dtype=float32)) 
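The Python script for step 7 is not reproduced on this page. A minimal sketch of such a script, created and run from a shell session in one go, follows. The file name hello.py is hypothetical, and SCRIPT_DIR defaults to the current directory; on HPC you would normally point it at /extra/<netid> so the container can reach the script by its full path. Run inside the container, it should print a line like the one in step 7.

```shell
#!/bin/sh
# Sketch: write a hello-world test script and run it with the container's Python.
# hello.py is a hypothetical name; SCRIPT_DIR and IMAGE can be overridden.
SCRIPT_DIR="${SCRIPT_DIR:-$PWD}"
IMAGE="${IMAGE:-centosTFlow.img}"

cat > "$SCRIPT_DIR/hello.py" <<'EOF'
import platform
# Works under both Python 2 and Python 3
print("Hello World: The Python version is " + platform.python_version())
EOF

# Use the Python inside the image, not the cluster's Python.
if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    singularity exec "$IMAGE" python "$SCRIPT_DIR/hello.py"
else
    echo "Singularity or $IMAGE not found; run 'module load singularity' on a compute node first." >&2
fi
```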

Docker Examples

This example is taken from the Singularity documentation and modified for our HPC systems. It uses TensorFlow again, but it could be PHP or any other Docker image. Note that you will be creating a container that runs Ubuntu on top of the Red Hat/CentOS clusters.

  1. Create the Singularity container on the workstation or VM where you have root privileges:

    $ sudo singularity create --size 4000 docker-tf.img
  2. Import the TensorFlow image from Docker Hub:

    $ sudo singularity import docker-tf.img docker://tensorflow/tensorflow:latest
    Cache folder set to /root/.singularity/docker
    Downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
    Extracting /root/.singularity/docker/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar.gz
    Downloading layer sha256:65f3587f2637c17b30887fb0d5dbfad2f10e063a72239d840b015528fd5923cd
    Extracting /root/.singularity/docker/sha256:56eb14001cebec19f2255d95e125c9f5199c9e1d97dd708e1f3ebda3d32e5da7.tar.gz
    Bootstrap initialization
    No bootstrap definition passed, updating container
    Executing Prebootstrap module
    Executing Postbootstrap module
  3. Move the image to HPC and test it:

    [user@host]$ singularity shell docker-tf.img
    Singularity: Invoking an interactive shell within container...
    Singularity.docker-tf.img> python 
    Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
    [GCC 4.8.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow
    >>> exit()
    Singularity.docker-tf.img> exit
    $ singularity exec docker-tf.img lsb_release -a
    No LSB modules are available.
    Distributor ID:	Ubuntu
    Description:	Ubuntu 14.04.4 LTS
    Release:	14.04
    Codename:	trusty
    user@host$ singularity exec docker-tf.img python /extra/netid/ 
    WARNING:tensorflow:From in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
    Instructions for updating:
    Use `tf.global_variables_initializer` instead.
    (0, array([ 0.72233653], dtype=float32), array([-0.00956423], dtype=float32))
    (20, array([ 0.24949318], dtype=float32), array([ 0.22735602], dtype=float32))
    (40, array([ 0.13574874], dtype=float32), array([ 0.28262845], dtype=float32))
    (60, array([ 0.10854871], dtype=float32), array([ 0.2958459], dtype=float32))
    (80, array([ 0.1020443], dtype=float32), array([ 0.29900661], dtype=float32))
    (100, array([ 0.10048886], dtype=float32), array([ 0.29976246], dtype=float32))
    (120, array([ 0.10011692], dtype=float32), array([ 0.29994321], dtype=float32))
    (140, array([ 0.10002796], dtype=float32), array([ 0.29998642], dtype=float32))
    (160, array([ 0.10000668], dtype=float32), array([ 0.29999676], dtype=float32))
    (180, array([ 0.1000016], dtype=float32), array([ 0.29999924], dtype=float32))
    (200, array([ 0.10000039], dtype=float32), array([ 0.29999983], dtype=float32))

Running Jobs

Singularity must not be run on the login nodes. This is a general policy that applies to any application.

To run a Singularity container image interactively on ElGato or Ocelote, allocate an interactive session and load the Singularity module. In this sample session, the TensorFlow Singularity container from above is started and Python is run. Note that in this example you are running the version of Python installed within the Singularity container, not the version on the cluster.

ElGato Interactive Example

[netid@elgato singularity]$ bsub -Is bash
Job <633365> is submitted to default queue <windfall>.
<<Waiting for dispatch ...>>
<<Starting on gpu44>>

[netid@gpu44 singularity]$ module load singularity
[netid@gpu44 singularity]$ singularity exec docker-tf.img python /extra/chrisreidy/singularity/
WARNING: Not mounting current directory: user bind control is disabled by system administrator
Instructions for updating:
Use `tf.global_variables_initializer` instead.
(0, array([ 0.12366909], dtype=float32), array([ 0.3937912], dtype=float32))
(20, array([ 0.0952933], dtype=float32), array([ 0.30251619], dtype=float32))
(200, array([ 0.0999999], dtype=float32), array([ 0.30000007], dtype=float32))
[netid@gpu44 singularity]$ exit

Ocelote Interactive Example

The process is the same, except that the command to start the interactive session looks more like this:

 $ qsub -I -N jobname -m bea -M -W group_list=hpcteam -q windfall -l select=1:ncpus=28:mem=168gb -l cput=1:0:0 -l walltime=1:0:0

ElGato Job Submission

Running a batch job with Singularity works like any other job. The LSF script might look like this; the results are written to lsf_tf.out and any errors to lsf_tf.err:

#BSUB -n 1
#BSUB -q "windfall"
#BSUB -R "span[ptile=1]"
#BSUB -o lsf_tf.out
#BSUB -e lsf_tf.err
#BSUB -J testtensorflow

# Load Singularity, move to the directory that holds docker-tf.img, and run the script
module load singularity
cd /extra/netid/data
singularity exec docker-tf.img python /extra/chrisreidy/singularity/

Ocelote Job Submission

The PBS script might look like this; because of -j oe, output and errors are combined in singularity-job.o<jobid>:

#PBS -N singularity-job
#PBS -W group_list=pi
#PBS -q windfall
#PBS -j oe
#PBS -l select=1:ncpus=1:mem=6gb
#PBS -l walltime=01:00:00
#PBS -l cput=12:00:00
# Load Singularity, move to the directory that holds docker-tf.img, and run the script
module load singularity
cd /extra/chrisreidy/singularity
singularity exec docker-tf.img python /extra/chrisreidy/singularity/
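The submission commands are not shown above. A small sketch follows, assuming the LSF script is saved as lsf_tf.sh and the PBS script as singularity.pbs (both names hypothetical); submit_job is likewise a hypothetical helper. On ElGato the script is passed to bsub on standard input, which is how LSF picks up the #BSUB lines; on Ocelote, qsub takes the file name. DRY_RUN=1 prints the command instead of submitting.

```shell
#!/bin/sh
# Sketch: submit a batch script to LSF (ElGato) or PBS (Ocelote).
submit_job() {
    scheduler="$1"   # "lsf" on ElGato, "pbs" on Ocelote
    script="$2"      # path to the batch script
    if [ ! -f "$script" ]; then
        echo "submit_job: no such script: $script" >&2
        return 1
    fi
    case "$scheduler" in
        lsf)
            # LSF reads the #BSUB lines from the script given on standard input.
            if [ "${DRY_RUN:-0}" = 1 ]; then echo "bsub < $script"; else bsub < "$script"; fi ;;
        pbs)
            if [ "${DRY_RUN:-0}" = 1 ]; then echo "qsub $script"; else qsub "$script"; fi ;;
        *)
            echo "submit_job: unknown scheduler '$scheduler' (use lsf or pbs)" >&2
            return 1 ;;
    esac
}

# Usage:
#   submit_job lsf lsf_tf.sh         # on ElGato
#   submit_job pbs singularity.pbs   # on Ocelote
```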

Docker Without Singularity

But what if you have a collection of Docker containers and want a simpler way to run them on HPC? You can create the Singularity image without installing Singularity, and then run the image on HPC.
Say you have a BLAST workflow set up in Docker:

$ docker pull simonalpha/ncbi-blast-docker
Using default tag: latest
latest: Pulling from simonalpha/ncbi-blast-docker

a3ed95caeb02: Pull complete 
ee48d0fb051e: Pull complete 
b92ecff543cc: Pull complete 
93701159904b: Pull complete 
Digest: sha256:89e68eabc3840b640f1037b233b5c9c81f12965b45c04c11ff935f0b34fac364
Status: Downloaded newer image for simonalpha/ncbi-blast-docker:latest

Run docker2singularity to convert the Docker image. The tool itself runs inside a Docker container:

$ docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $PWD:/output \
--privileged -t --rm \
singularityware/docker2singularity \
simonalpha/ncbi-blast-docker

Size: 984 MB for the singularity container
(1/9) Creating an empty image...
Creating a sparse image with a maximum size of 984MiB...
Using given image size of 984
Formatting image (/sbin/mkfs.ext3)
Done. Image can be found at: /tmp/simonalpha_ncbi-blast-docker-2015-01-03-a603a76886c2.img
(2/9) Importing filesystem...
(3/9) Bootstrapping...
(4/9) Adding run script...
(5/9) Setting ENV variables...
Singularity: sexec (U=0,P=145)> Command=exec, Container=/tmp/simonalpha_ncbi-blast-docker-2015-01-03-a603a76886c2.img, CWD=/, Arg1=/bin/sh
(6/9) Adding mount points...
Singularity: sexec (U=0,P=151)> Command=exec, Container=/tmp/simonalpha_ncbi-blast-docker-2015-01-03-a603a76886c2.img, CWD=/, Arg1=/bin/sh
(7/9) Fixing permissions...
Singularity: sexec (U=0,P=157)> Command=exec, Container=/tmp/simonalpha_ncbi-blast-docker-2015-01-03-a603a76886c2.img, CWD=/, Arg1=/bin/sh
Singularity: sexec (U=0,P=196)> Command=exec, Container=/tmp/simonalpha_ncbi-blast-docker-2015-01-03-a603a76886c2.img, CWD=/, Arg1=/bin/sh
(8/9) Stopping and removing the container...
(9/9) Moving the image to the output folder...
  1,031,798,816 100%  114.23MB/s    0:00:08 (xfr#1, to-chk=0/1) 

You now have a Singularity image file that can be copied to your HPC file space. You can run it from either /home or /extra; no other locations are available at the moment. If you need to work from /rsgrps, you must create your Singularity image with "import" or "bootstrap", which means installing Singularity on the machine where you build the image.

Sample job: extract this archive in your work directory:

  • blast.tar.gz: contains query FASTA file (proteins.fasta) and BLASTable database (pdbaa).

Running the job below should produce an output file called proteins_blastp.txt.
Note that you must give the full path names of the BLAST files, even if they are in your current directory, because Singularity assumes that you are in /home.

#PBS -N singularity-job
#PBS -W group_list=pi
#PBS -q windfall
#PBS -j oe
#PBS -l select=1:ncpus=1:mem=6gb
#PBS -l walltime=01:00:00
#PBS -l cput=12:00:00
module load singularity
cd /extra/netid/singularity
# Full paths are required because Singularity assumes you are in /home
singularity exec simonalpha_ncbi-blast-docker-2015-01-03-a603a76886c2.img blastp -query /fullpath/blast/proteins.fasta -db /fullpath/blast/db/pdbaa -out /fullpath/proteins_blastp.txt
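To avoid typing the /fullpath placeholders by hand, the job can build absolute paths from the directory where blast.tar.gz was extracted. A sketch, assuming the archive unpacks into blast/ with the database under blast/db/, as the paths above suggest. WORK_DIR is a hypothetical variable that defaults to the current directory; inside a PBS job you could set it to $PBS_O_WORKDIR, the directory qsub was run from.

```shell
#!/bin/sh
# Sketch: resolve absolute paths for the BLAST job so the container can find the files.
WORK_DIR="$(cd "${WORK_DIR:-.}" && pwd -P)"   # turn a relative directory into an absolute path
IMAGE="simonalpha_ncbi-blast-docker-2015-01-03-a603a76886c2.img"

QUERY="$WORK_DIR/blast/proteins.fasta"
DB="$WORK_DIR/blast/db/pdbaa"
OUT="$WORK_DIR/proteins_blastp.txt"

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    singularity exec "$IMAGE" blastp -query "$QUERY" -db "$DB" -out "$OUT"
else
    # Without Singularity (or the image), just show the command that would run.
    echo "singularity exec $IMAGE blastp -query $QUERY -db $DB -out $OUT"
fi
```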

CUDA / TensorFlow Example

To build a Singularity container image that can run applications on the ElGato GPU nodes, you must prepare the container as follows:

  • Download the .run installer for the same NVIDIA driver that is currently running on our GPU nodes
  • Extract the contents (don't actually need to install the driver)
  • Move all of the libraries to a central location like /usr/local/nvidia and make all necessary symbolic links
  • Download and install the .run file for the same CUDA libraries that are currently running on the GPU node
  • Download, extract, and copy the correct cuDNN libraries
  • Edit and export $PATH, $LD_LIBRARY_PATH, and $CUDA_HOME to point to the correct libraries and binaries
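Two of these steps can be sketched in shell. The first part queries the driver version on a GPU node, so you know which .run installer to download; the second shows the exports inside the container. /usr/local/nvidia comes from the list above, while /usr/local/cuda (the CUDA installer's default prefix) and placing the cuDNN libraries in $CUDA_HOME/lib64 are assumptions.

```shell
#!/bin/sh
# Part 1 (on a GPU node, outside the container): which driver version to match.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
fi

# Part 2 (inside the container): point the environment at the copied libraries.
# /usr/local/cuda is an assumed install prefix; adjust if you installed elsewhere.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
# Driver libraries first, then CUDA (and cuDNN, assumed copied into lib64).
export LD_LIBRARY_PATH="/usr/local/nvidia:$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```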

For your convenience, staff at the National Institutes of Health (NIH) maintain an installation script called cuda4singularity that automates this process. It has been tested with Ubuntu 16.04 and CentOS 7. You can either copy or download cuda4singularity into an existing container and execute it with root privileges, or add its lines to your .def file so that the NVIDIA/CUDA libraries are installed during the bootstrap procedure. (Your container will probably need about 10 GB of free space for the CUDA libraries to install properly, because the installer checks for minimum space requirements before running.)

BootStrap: yum
OSVersion: 7
Include: yum wget

%setup
    # commands to be executed on host outside container during bootstrap

%post
    # commands to be executed inside container during bootstrap

    # yum needs some tlc to work properly in container
    RELEASEVER=7
    ARCH=x86_64
    echo $RELEASEVER > /etc/yum/vars/releasever
    echo $ARCH > /etc/yum/vars/arch
    rpm -ivh --nodeps epel-release-7-8.noarch.rpm
    # yum -d 10 check-update  # this line caused problems in testing

    # install other needed packages (python-devel is the CentOS package name)
    yum -y install man which tar gzip vim-minimal perl python python-devel python-pip

    # create bind points for the UA HPC file systems
    mkdir -p /extra /rsgrps

    # download and run NIH HPC cuda for singularity installer
    chmod 755 cuda4singularity
    ./cuda4singularity
    rm cuda4singularity

    # install tensorflow
    pip install --upgrade pip
    pip install --upgrade

%runscript
    # commands to be executed when the container runs

%test
    # commands to be executed within container at close of bootstrap process

Be patient. This bootstrap procedure takes a long time to run and may look as if it has stalled at several points. If you watch the output, you may see CUDA issue warnings and complain about an incomplete installation. Don't worry: the drivers are already running on the GPU nodes, so they don't need to be installed within the container.

After creating a container from this definition file (cuda4tf.img in the session below), copy it to ElGato and test it like this:

[netid@elgato]$ bsub -R gpu -Is bash
Job <633858> is submitted to default queue <windfall>.
<<Waiting for dispatch ...>>
<<Starting on gpu64>>
[netid@gpu64]$ module load singularity
[netid@gpu64]$ singularity shell cuda4tf.img 
WARNING: Not mounting current directory: user bind control is disabled by system administrator
Singularity: Invoking an interactive shell within container...

Singularity.cuda4tf.img> nvidia-smi
Mon Dec 12 14:17:55 2016       
| NVIDIA-SMI 352.39     Driver Version: 352.39         |                       
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K20Xm         Off  | 0000:20:00.0     Off |                    0 |
| N/A   24C    P8    30W / 235W |     12MiB /  5759MiB |      0%      Default |
|   1  Tesla K20Xm         Off  | 0000:8B:00.0     Off |                    0 |
| N/A   24C    P8    30W / 235W |     12MiB /  5759MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|  No running processes found                                                 |
Singularity.cuda4tf.img> python -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: Tesla K20Xm
major: 3 minor: 5 memoryClockRate (GHz) 0.732
pciBusID 0000:20:00.0
Total memory: 5.62GiB
Free memory: 5.54GiB
W tensorflow/stream_executor/cuda/] creating context when one is currently active; existing: 0x4c25cb0
I tensorflow/core/common_runtime/gpu/] Found device 1 with properties: 
name: Tesla K20Xm
major: 3 minor: 5 memoryClockRate (GHz) 0.732
pciBusID 0000:8b:00.0
Total memory: 5.62GiB
Free memory: 5.54GiB
I tensorflow/core/common_runtime/gpu/] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/] 0:   Y N 
I tensorflow/core/common_runtime/gpu/] 1:   N Y 
I tensorflow/core/common_runtime/gpu/] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20Xm, pci bus id: 0000:20:00.0)
I tensorflow/core/common_runtime/gpu/] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K20Xm, pci bus id: 0000:8b:00.0)
Step 0 (epoch 0.00), 205.4 ms
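For longer runs, the same test can be submitted as a batch job instead of an interactive session. A sketch, assuming the image cuda4tf.img sits in the directory you submit from; the job file name gpu_tf.lsf and the output file names are hypothetical, and the -R gpu resource request simply mirrors the interactive bsub -R gpu command above.

```shell
#!/bin/sh
# Sketch: write an LSF batch version of the interactive GPU test.
cat > gpu_tf.lsf <<'EOF'
#BSUB -n 1
#BSUB -q "windfall"
#BSUB -R "gpu"
#BSUB -o gpu_tf.out
#BSUB -e gpu_tf.err
#BSUB -J gputensorflow

module load singularity
# List the GPUs visible inside the container, then run the MNIST example.
singularity exec cuda4tf.img nvidia-smi -L
singularity exec cuda4tf.img python -m tensorflow.models.image.mnist.convolutional
EOF

# Submit on ElGato:
#   bsub < gpu_tf.lsf
```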
