The University of Arizona
    For questions, please open a UAService ticket and assign to the Tools Team.




Overview

The bastion host has limited storage capacity and is not intended for file transfers.

To make transfers to/from HPC, you will need to have logged into your account at least once. If you have not, you may encounter "directory does not exist" errors. This is because your home directory is not created until you log in for the first time. To access the system, see: System Access


Files are transferred to shared data storage and not to the bastion node, login nodes, or compute nodes. Note that because storage is not cluster-specific, your files are accessible on both ocelote and elgato.

Data Transfers by Size

  1. Small Transfers: For small data transfers the web portal offers the most intuitive method.
  2. Transfers <100GB: We recommend sftp, scp, or rsync using filexfer.hpc.arizona.edu.
  3. Transfers >100GB, transfers outside the university, and large transfers within HPC: We recommend using Globus (GridFTP).
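
The size guidance above can be sketched as a small shell function. This is an illustrative helper, not part of the HPC tooling: the name recommend_transfer_tool is hypothetical, and because the guidance does not say what counts as "small" or how to treat exactly 100 GB, the sketch only distinguishes the two stated ranges and assumes 100 GB itself falls into the Globus range.

```shell
# Hypothetical helper: suggest a transfer tool for a size given in whole GB.
# Assumption: exactly 100 GB is treated like the ">100GB" case.
recommend_transfer_tool() {
    size_gb=$1
    if [ "$size_gb" -ge 100 ]; then
        echo "globus"
    else
        echo "sftp, scp, or rsync via filexfer.hpc.arizona.edu"
    fi
}

recommend_transfer_tool 20
recommend_transfer_tool 250
```

Transfers that leave the university are recommended for Globus regardless of size, which this sketch does not model.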








General File Transfers


GridFTP / Globus

Overview

GridFTP is an extension of the standard File Transfer Protocol (FTP) for high-speed, reliable, and secure data transfer. Because GridFTP provides more reliable, higher-performance file transfer than protocols such as SCP or rsync, it enables the transmission of very large files. GridFTP also addresses the problem of incompatibility between storage and access systems. (You can read more about the advantages of GridFTP here.)

One way to use GridFTP that the UA supports is Globus. Globus uses endpoints to make transfers.

Personal Endpoint

Setting up a personal endpoint will allow you to make transfers from your computer to any other Globus endpoint. To do this:

  1. Go to https://www.globus.org/ and click Log In in the top right corner.
  2. In the Use your existing organizational login box, type in or find The University of Arizona and hit Continue.
  3. This will take you to Webauth. Log in as normal.
  4. You will end up at the Globus File Manager web interface.
  5. Choose Endpoints on the left and select Create New Endpoint on the top right. From there, download Globus Connect Personal for your operating system. 
  6. Type a descriptive name for your local computer under Provide label for future reference and click Allow.
  7. Under Collection Details, use your UArizona email under Owner Identity (this should be the default) and enter a descriptive Collection Name.
  8. You should now be able to find your collection on Globus under Endpoints → Administered by You.

HPC Endpoint

All you'll need to make transfers to HPC is the endpoint name: arizona#sdmz-dtn. To see information about this endpoint, log into the Globus Web Page and, under Endpoints, search for the name.

Making Transfers

To make transfers between endpoints, go to https://app.globus.org/file-manager. You should see a classic "commander-style" file transfer view with two panels. Pick an endpoint for each side using the Collection field at the top. For example, to transfer files between HPC and your personal computer, click one Collection field, then find and select your personal endpoint under Collections.

Next, click the other Collection field in the File Manager and enter the HPC endpoint arizona#sdmz-dtn. You should now be able to see the file systems of both machines on either side of the File Manager page. To make a transfer, select a file or directory from one of the panels and click the arrow at the bottom of the screen to initiate the transfer.

To monitor your transfer's progress, you can check under the Activity page (left vertical menu).


sftp

The intent is that filexfer.hpc.arizona.edu is to be used for most file transfers.

sftp encrypts data before it is sent across the network.  Additional capabilities include resuming interrupted transfers, directory listings, and remote file removal. To transfer files with sftp:

  1. Open an SSH v2 compliant terminal client and navigate to a desired working directory on your local machine.
  2. Log into HPC:

    $ sftp NetId@filexfer.hpc.arizona.edu
  3. Use the command get to transfer files from HPC to your local machine:

    sftp> get /path/to/remote/file /path/to/local/directory
  4. Use the command put to transfer files from your local machine to HPC:

    sftp> put /path/to/local/file /path/to/remote/directory
  5. For additional options/help, use help. To exit, use bye.

ftp / lftp

HPC uses the ftp client lftp to transfer files between the file transfer node and remote machines. This can be done by following the steps outlined below:

Due to security risks, it is not possible to ftp to the file transfer node from a remote machine. However, you may ftp from the file transfer node to a remote machine.


  1. Connect to the data transfer node:

    $ ssh username@filexfer.hpc.arizona.edu
  2. Connect to the external host using the command lftp:

    $ lftp ftp.hostname.gov
  3. Use the command get to transfer files from the remote machine to HPC (the local side is the file transfer node):

    lftp ftp.hostname.gov:~> get /path/to/remote/file -o /path/to/local/file
  4. Use the command put to transfer files from HPC to the remote machine:

    lftp ftp.hostname.gov:~> put /path/to/local/file -o /path/to/remote/file
  5. For complete documentation on lftp usage:

    $ man lftp



scp

scp uses Secure Shell (SSH) for data transfer and utilizes the same mechanisms for authentication, thereby ensuring the authenticity and confidentiality of the data in transit.

Mac/Linux

You will need to use an SSH v2 compliant terminal to move files to/from HPC. For more information on using scp, use man scp.

Moving a File or Directory to the HPC:

In your terminal, navigate to the desired working directory on your local machine (laptop or desktop usually). To move a file or directory to a designated subdirectory in your account on HPC:

$ scp -rp filenameordirectory NetId@filexfer.hpc.arizona.edu:subdirectory 

Getting a File or Directory From the HPC:

In your terminal, navigate to the desired working directory on your local machine. To copy a remote file from HPC to your current directory:

$ scp -rp NetId@filexfer.hpc.arizona.edu:filenameordirectory .

 ** The space followed by a period at the end means the destination is the current directory. **
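
The two scp commands above can be wrapped in small shell functions so the host name and NetID are typed only once. This is a minimal sketch: the function names hpc_upload and hpc_download and the variables HPC_USER and HPC_HOST are hypothetical, and NetId is a placeholder for your own NetID.

```shell
# Hypothetical settings: replace NetId with your own NetID.
HPC_USER="NetId"
HPC_HOST="filexfer.hpc.arizona.edu"

# Copy a local file or directory to a subdirectory of your HPC account.
hpc_upload() {
    scp -rp "$1" "${HPC_USER}@${HPC_HOST}:$2"
}

# Copy a remote file or directory from HPC into a local directory
# (defaults to the current directory when no destination is given).
hpc_download() {
    scp -rp "${HPC_USER}@${HPC_HOST}:$1" "${2:-.}"
}

# Usage (each call authenticates as a normal scp command would):
#   hpc_upload results subdirectory
#   hpc_download subdirectory/output.txt
```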

Wildcards

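Wildcards can be used for multiple file transfers (e.g. all files with the .dat extension). Note the backslash " \ " preceding the *, which keeps your local shell from expanding the wildcard before it reaches HPC. The final period is the destination (the current directory).

$ scp NetId@filexfer.hpc.arizona.edu:subdirectory/\*.dat .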
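
The effect of the backslash can be seen locally without any transfer. The sketch below uses echo in a scratch directory to show what the shell would hand to a command such as scp; the file names local1.dat and local2.dat are made up for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch local1.dat local2.dat

# Unescaped: the local shell expands *.dat against local files first.
unescaped=$(echo *.dat)

# Escaped: the pattern is passed through literally, so a command such as
# scp would receive it unchanged and the remote side could expand it.
escaped=$(echo \*.dat)

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
```

The first line printed lists the two local files, while the second shows the pattern itself.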

Windows Users

Windows users can use software like WinSCP to make scp transfers. To use WinSCP, first download/install the software from: https://winscp.net/eng/download.php

To connect, enter filexfer.hpc.arizona.edu in the Host Name field, enter your NetID under User name, and enter your password. Accept by clicking Login. You'll then be prompted to authenticate with Duo.



rsync

rsync is a fast and extraordinarily versatile file copying tool. It synchronizes files and directories between two different locations (or servers). rsync transfers only the differences between files that have actually changed.

An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. Rsync can copy or display directory contents and copy files, optionally using compression and recursion. 

You use rsync in the same way you use scp. You must specify a source and a destination, one of which may be remote.

Example 1:

Recursively transfers all files from the directory src/directory-name on the machine computer-name into the /data/tmp/directory-name directory on the local machine. The files are transferred in archive mode, which ensures that symbolic  links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer.  Additionally, compression will be used to reduce the size of data portions of the transfer. 

$ rsync -avz computer-name:src/directory-name /data/tmp --log-file=hpc-user-rsync.log

Example 2:

$ rsync -avz computer-name:src/directory-name/ /data/tmp --log-file=hpc-user-rsync.log

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. 
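
The trailing-slash behavior can be tried safely on local scratch directories before running a real transfer. The sketch below assumes rsync is installed locally; the directory names under the temporary work area are made up for the demonstration.

```shell
# Build a scratch source tree: src/directory-name/a.txt
work=$(mktemp -d)
mkdir -p "$work/src/directory-name" "$work/dest1" "$work/dest2"
echo "example" > "$work/src/directory-name/a.txt"

# No trailing slash: the directory itself is copied by name.
rsync -a "$work/src/directory-name" "$work/dest1"

# Trailing slash: only the directory's contents are copied.
rsync -a "$work/src/directory-name/" "$work/dest2"

# Show where a.txt ended up in each destination.
find "$work/dest1" "$work/dest2" -type f
```

In the first destination the file sits inside a directory-name subdirectory; in the second it sits directly in the destination.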

Additional Options:

Flag          Meaning
-a            Archive mode; will preserve timestamps
-v            Increase verbosity
-z            Compress file data during the transfer
--log-file    Log everything done in specified FILE



iRODS


The Research Computing test iRODS instance has been dismantled. iRODS servers are available elsewhere (like CyVerse).


There are two ways to use iRODS: either from the command line or with a GUI like Cyberduck on your workstation.

Command line

Note that iCommands cannot be used to upload files into Data Store via URL from other sites (ftp, http, etc.).

To transfer data from an external site, you first must download the file to a local machine using wget or a similar mechanism, and then use iput to upload it to the Data Store. 

On Ocelote, iRODS 4 is installed as a standard package to the operating system on every node. This means you will not have to "module load irods". You will still need to "iinit" the first time (see below). iRODS is also available on the filexfer node for use.

Initializing iRODS

Running iinit on any system using iRODS 4.x, unlike its iRODS 3 counterpart, does not help you set up the environment. Instead, you need to run create_irods_env manually with suitable options for the iRODS host, zone, username, etc.

For this key:    Enter this:
-h               <hostname of iRODS server>
-p               <port number of iRODS server> (1247 is default)
-z               <Zone name of iRODS zone>
-u               <user name on the iRODS server> (may not match your NetID)
-a               <authentication method for the iRODS server> (PAM, native,...)

For example:

$ create_irods_env -a native -h someserver.somewhere.net -z MYZONE

will suffice to create an appropriate ~/.irods/irods_environment.json file to allow you to run iinit. In the above example we took the defaults -p 1247 and -u <your NetId> by omitting -p and -u. You only need to do this step ONE time; on subsequent logins you will just run iinit and it will ask for your password. Note that create_irods_env will NOT overwrite or alter an existing ~/.irods/irods_environment.json file.

Once the ~/.irods/irods_environment.json file is created properly, you should be able to sign in to the iRODS server you selected using iinit, viz:
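
For orientation, an iRODS 4 environment file for the example above would look roughly like the one written by the sketch below. The key names are the standard iRODS 4 client settings; the exact set of keys that create_irods_env writes is not shown here, so treat the content as an assumption. The sketch writes to a scratch file rather than ~/.irods/irods_environment.json so that nothing real is overwritten.

```shell
# Write an example environment file to a scratch location for inspection.
# (The real file lives at ~/.irods/irods_environment.json.)
example_env=$(mktemp)
cat > "$example_env" <<'EOF'
{
    "irods_host": "someserver.somewhere.net",
    "irods_port": 1247,
    "irods_zone_name": "MYZONE",
    "irods_user_name": "NetId",
    "irods_authentication_scheme": "native"
}
EOF

cat "$example_env"
```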

$ iinit	
Enter your current ... password:	# enter your iRODS server password here

At this point you can use other iRODS commands such as icp to move files.

Commands

Command         Description
icd             Changes working directory
ichmod          For help, enter ichmod -h.
ichmod read     Grant read-only permission level for specified user to selected file or folder.
ichmod write    Grant read and write permission level for specified user to selected file or folder.
ichmod own      Grant full ownership permission level for specified user to selected file or folder.
ichmod null     Remove permission level for the user to the file or folder.
iexit           Log off/disconnect from the Data Store.
iget            Download file/directory from iRODS to local device
iinit           Initialize and start the connection to iRODS
ils             Lists contents of current working directory. For help, enter ils -h
ils -A          Lists directory permissions
imkdir          Creates new directory
iput            Uploads file/directory from local device to iRODS
ipwd            Shows name and path of current remote folder
irm             Moves a file to the trash
irm -f          Deletes a file.
irm -r          Moves a folder to the trash.
irm -fr         Deletes a folder.

Examples

In the following examples:

  • my-files-to-transfer/ is the example name of the directory or folder for bulk transfers.
  • my-file-to-transfer.txt is the example name for single file transfers.
  • Any filename may be used for the checkpoint-file.

Bulk Files Transfer Example

iput -P -b -r -T --retries 3 -X checkpoint-file my-files-to-transfer/


Single Large File Transfer Example

iput -P -T --retries 3 --lfrestart checkpoint-lf-file my-file-to-transfer.txt


Graphical / Cyberduck

Cyberduck is an open source, cross-platform, high-throughput and parallel data transfer program that supports multiple transfer protocols (FTP, SFTP, WebDAV, Cloud files, Amazon S3, etc.). It serves as an alternative to the iDrop Java applet, and has been extensively tested with large data transfers (60-70 GB). This allows users to transfer large files, depending on the user's available bandwidth and network settings.

Cyberduck versions are available for Mac OS (10.6 and higher on Intel 64-bit) and Windows (Windows XP, Windows Vista, Windows 7, or Windows 8). Linux users should use iDrop Desktop or iCommands. Cyberduck version 4.7.1 (released July 7, 2015) and later supports the iRODS protocol.

To use Cyberduck with iRODS:

  1. Install or Update Cyberduck. If Cyberduck has already been installed, you may update it under the dropdown Cyberduck → Check for Updates. If you need to install Cyberduck for the first time, go to https://cyberduck.io/, download the installer that is appropriate for your operating system and install. See Cyberduck Preferences for more information on installation.
  2. Configure Cyberduck for use with iRODS:
    1. Open Cyberduck and click Open Connection
    2. In the first drop down field, enter a profile name
    3. Create the connection by entering your iRODS server under Server, 1247 under Port, and your username under Username.
  3. Transfer Files by opening another connection and dragging/dropping files.








Google Drive

For medium to large file transfers, the command-line software listed below should be used on the filexfer node and not the login nodes.


Globus

For instructions on setting up Globus Connect Personal and connecting to HPC, see GridFTP / Globus under General File Transfers.
We have added a permanent Google Drive endpoint allowing users to access their UArizona-affiliated Google Drive account. To connect:

  1. Search for UA Google Drive and select the matching result.



  2. Under the Collections tab, you will be asked to authenticate, select Continue.



  3. Select your university email address



  4. Select Allow to give Globus permission to access your Google Drive



  5. You will be redirected back to Endpoints. Under the Collections tab, select Add a Guest Collection



  6. Give your collection a descriptive name and select a default directory for Globus to access. Some options include (from Globus' documentation):

    - /My Drive : Files owned by the user’s Google account that are located in the user’s root directory. This is treated as the user’s home directory on collections created using the Google Drive connector.
    - /Shared With Me : Files and directories owned by others which have been shared with the user’s Google account.
    - /Starred : Files and directories to which the user’s Google account has added the starred attribute.
    - /Team Drives : Directories which are Google Shared Drives (formerly called Team Drives) which the user’s Google account has been granted access to.
    - /Trash : Files and directories which the user’s Google account has deleted.

    Then select Create Collection.

  7. You should now be able to make transfers to/from Google Drive with Globus. To get started, go to File Manager and click Search next to Collection at the top of the page. Find your Google Drive under the Your Collections tab and select it.



  8. This will redirect you back to File Manager where you can see the contents of your Google Drive and make file transfers.


rclone

Rclone is a CLI installed on filexfer.hpc.arizona.edu that can be used to transfer files to Google Drive as well as other Cloud-based storage sites. To use rclone, you will need to start by configuring it. The walkthrough provided by rclone is fairly straightforward. An example is provided below omitting some instructions for clarity:

$ rclone config
Name                 Type
====                 ====

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> GoogleDrive
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
...
12 / Google Drive
   \ "drive"
...
Storage> 12
Google Application Client Id
client_id> 
client_secret> 
scope> 1
root_folder_id> 
service_account_file> 
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n

The next prompt will ask if you would like to use auto config. Select N here; otherwise the configuration will not succeed. You will be given a URL. Copy and paste this into your web browser and follow the prompts to allow rclone to have access to your UArizona Google Drive account. When you are done, it will give you a verification code. Copy and paste this back into the terminal to proceed.

Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n> N
If your browser doesn't open automatically go to the following link: <long URL>
Log in and authorize rclone for access
Enter verification code> <your verification code here>
Configure this as a team drive?
y) Yes
n) No
y/n> n
--------------------
[GoogleDrive]
type = drive
scope = drive
token = {"access_token": <lots of token data>}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
GoogleDrive          drive

Your Google Drive connection is now active and you can use rclone to make transfers. For information on rclone commands, see: https://rclone.org/commands/
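
Once the remote is configured, everyday transfers can be wrapped in a few shell functions. This is a minimal sketch: the function names gdrive_list, gdrive_upload, and gdrive_download are hypothetical, and REMOTE must match the name entered during rclone config (GoogleDrive in the walkthrough above). The underlying rclone lsd and rclone copy commands are described at the link above.

```shell
# Remote name chosen during "rclone config".
REMOTE="GoogleDrive"

# List the top-level directories in the Drive.
gdrive_list() {
    rclone lsd "${REMOTE}:"
}

# Copy a local file, or the contents of a local directory,
# into a folder on the Drive.
gdrive_upload() {
    rclone copy --progress "$1" "${REMOTE}:$2"
}

# Copy a file or folder contents from the Drive into a local directory.
gdrive_download() {
    rclone copy --progress "${REMOTE}:$1" "$2"
}

# Usage (run on filexfer.hpc.arizona.edu, not the login nodes):
#   gdrive_list
#   gdrive_upload results backups/results
#   gdrive_download backups/results restored
```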






Github


To get a folder synced to a personal Github repository on HPC, you’ll need to generate a linked SSH key. Fortunately, Github has good documentation to walk you through this. 

Some minor modifications need to be made to the instructions and will be listed:

  1. Generate an SSH Key: https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent

    Modifications: In the line ssh-add -K ~/.ssh/id_rsa, the -K option is not recognized on HPC, so this step will not work as written. Run the following command instead:

    $ ssh-add ~/.ssh/id_rsa
  2. Add SSH Key to Github: https://help.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account

    Modifications: The command $ pbcopy < ~/.ssh/id_rsa.pub will not work. Use:

    $ cat ~/.ssh/id_rsa.pub


    then copy the output with your cursor and paste it into your Github account as directed.

  3. Clone Your Repository: https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository

Once you have a repository on your account, you can work with it using the usual git commands.
