Transfer Software Summary
Software | CLI Interface? | GUI Interface? | Google Drive | Amazon Web Services | Box | Dropbox | Notes
---|---|---|---|---|---|---|---
Globus | ✓ | ✓ | ✓ | ✓ | | |
SFTP | ✓ | | | | | |
SCP | ✓ | ✓ | | | | | On Windows, WinSCP is available as a GUI interface.
rsync | ✓ | ✓ | | | | | Grsync is a GUI interface for rsync for multiple platforms.
rclone | ✓ | | ✓ | ✓ | ✓ | ✓ | rclone has recently announced an experimental GUI.
Cyberduck | | ✓ | | ✓ | | |
iRODS | ✓ | ✓ | | | | |
File transfers and SSH Keys
Several of the file transfer methods listed below, including scp, sftp, and rsync, use authentication based on the SSH protocol. Adding your SSH key to the filexfer.hpc.arizona.edu node therefore lets you avoid entering a password each time you use those methods. See the documentation for adding SSH Keys.
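For example, on most Mac/Linux systems, if you already have a key pair on your local machine (assumed here to be the default ~/.ssh/id_rsa), a minimal way to install the public key on the file transfer node is:

```
$ ssh-copy-id NetID@filexfer.hpc.arizona.edu   # copies your public key into ~/.ssh/authorized_keys on the node
```

Afterward, scp, sftp, and rsync transfers to filexfer.hpc.arizona.edu should no longer prompt for a password.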
Transfer Applications and Protocol
GridFTP / Globus
NEW: for comprehensive information on using Globus, see: Globus
Overview
GridFTP is an extension of the standard File Transfer Protocol (FTP) for high-speed, reliable, and secure data transfer. Because GridFTP provides more reliable, higher-performance file transfers than protocols such as SCP or rsync, it is well suited to transmitting very large files. GridFTP also addresses the problem of incompatibility between storage and access systems. You can read more about the advantages of GridFTP here.
To use GridFTP, we recommend you use Globus. Globus uses endpoints to make transfers.
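Most users drive Globus through its web interface, described below, but Globus also publishes a command-line client. As a rough sketch, assuming you have installed the globus-cli package (e.g. via pip), run globus login, and looked up the UUIDs of your source and destination collections:

```
$ globus endpoint search "UA HPC Filesystems"   # look up a collection's UUID
$ globus transfer SRC_UUID:/path/to/src DST_UUID:/path/to/dst --recursive --label "example"   # submit an asynchronous transfer task
```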
Endpoints
Personal Endpoint
If you're trying to use Globus to move files to a local external drive, you may run into a permissions issue. Globus has information on resolving this in their FAQs.
- Go to https://www.globus.org/ and click Log In in the top right corner.
- In the Use your existing organizational login box, type in or find The University of Arizona and hit Continue.
- This will take you to Webauth. Log in as normal.
- You will end up at the Globus File Manager web interface.
- Choose Collections on the left and download Globus Connect Personal for your operating system.
- Type a descriptive name for your local computer under Provide label for future reference and click Allow.
- Under Collection Details, use your UArizona email under Owner Identity (this should be the default) and enter a descriptive Collection Name.
- You should now be able to find your collection on Globus under Endpoints → Administered by You.
HPC Endpoint
The endpoint for HPC can be found by searching UA HPC Filesystems under the Collections tab. If you do not see the UA HPC Filesystems collections, uncheck any checked filter in Quick Filters. Select the result UA HPC Filesystems with the subheading Managed Mapped Collection.
Storage Rental Endpoint
The endpoint for rental storage (found on the filexfer nodes under /rental) can be found by searching UA Rental Storage Filesystem under the Collections tab.
Google Drive Endpoint
- Under the Endpoints tab (in the left-hand menu), search for UA Google Drive and select the result UA Google Drive with the subheading Managed Mapped Collection.
- After clicking the result, navigate to the Credentials tab.
- This should bring you to a page to register your credentials. Click Continue.
- Select your university account on the next page to proceed.
- Select Allow to give Globus access to your files.
- Once your credentials are set up, navigate back to the UA Google Drive endpoint, go to the Collections tab and select Add a Guest Collection.
- Under Directory, enter /My Drive to set the correct working directory for your collection. Next, give your collection a descriptive Display Name so you can identify it. When you're ready, click Create Collection.
- Once your collection is ready, you can find it under the Collections tab under Shareable By You. Clicking the name will open it in the File Manager window allowing you to initiate transfers and browse the contents.
AWS S3 Endpoint (UITS Subsidized Tier 2 Storage)
- Under the Collections tab, enter UA AWS S3 in the search bar. In the results, you should see the name UA AWS S3 show up with the description Managed Mapped Collection. Click the endpoint's name to proceed.
- Next, select the Credentials tab. If you are prompted for Authentication/Consent, click Continue.
- If requested, authenticate by selecting your Arizona email address, then Allow.
- You will then be returned to the Credentials tab. From there, link your AWS S3 bucket by entering your public and private keys in the provided fields.
- Once you've added your keys, navigate back to the UA AWS S3 collection, go to the Collections tab, and click Add a Guest Collection on the right.
- Under Create New Guest Collection, click Browse next to the Directory field to find your group's AWS bucket. You will find it under /ua-rt-t2-faculty_netid/ where faculty_netid is the NetID of the faculty member who requested the bucket. Under Display Name, enter a descriptive name that you can use to identify your bucket. Once you've completed the process, click Create Collection.
If you encounter Authentication/Consent Required after clicking Browse, click Continue, select your university credentials, and click Allow. That should bring you back to the Browse window.
- To find and use your new collection, navigate to the Collections tab, go to Shareable By You, and select the name. That will open your collection in the File Manager window allowing you to view the contents and initiate transfers.
SFTP
The filexfer.hpc.arizona.edu node is intended to be used for most file transfers. SFTP encrypts data before it is sent across the network. Additional capabilities include resuming interrupted transfers, directory listings, and remote file removal. To transfer files with SFTP, open an SSH v2 compliant terminal and navigate to a desired working directory on your local machine. To access HPC:
$ sftp NetID@filexfer.hpc.arizona.edu
You will then be able to move files between your machine and HPC using get and put commands. For example:
```
sftp> get /path/to/remote/file /path/to/local/directory   ### Retrieves a file from HPC. Omitting paths defaults to the working directories.
sftp> put /path/to/local/file /path/to/remote/directory   ### Uploads a file from your local computer to HPC. Omitting paths defaults to the working directories.
sftp> help                                                ### Prints detailed sftp usage.
```
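sftp can also run unattended by reading commands from a batch file with the -b option. This generally requires password-less authentication, such as the SSH keys described above; commands.txt below is a hypothetical file name:

```
$ cat commands.txt
get /path/to/remote/file /path/to/local/directory
put /path/to/local/file /path/to/remote/directory
$ sftp -b commands.txt NetID@filexfer.hpc.arizona.edu
```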
FTP/LFTP
Due to security risks, it is not possible to FTP to the file transfer node from a remote machine. However, you may FTP from the file transfer node to a remote machine.
HPC uses the FTP client LFTP to transfer files between the file transfer node and remote machines. This can be done using get and put commands. To use lftp, you must first connect to our file transfer node using an SSH v2 compliant terminal:
$ ssh NetID@filexfer.hpc.arizona.edu
Once connected, you may connect to the external host using the command lftp. For example:
$ lftp ftp.hostname.gov
You will then be able to move files between HPC and the remote host using get and put commands. For example:
```
> get /path/to/remote/file /path/to/local/directory   ### Retrieves a file from the remote host.
> put /path/to/local/file /path/to/remote/directory   ### Uploads a file from HPC to the remote host.
```
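lftp can also synchronize whole directory trees with its mirror command, which may be more convenient than fetching files one at a time (the paths here are illustrative):

```
> mirror /path/to/remote/directory /path/to/local/directory      ### Downloads a directory tree from the remote host.
> mirror -R /path/to/local/directory /path/to/remote/directory   ### -R reverses direction, uploading from HPC to the remote host.
```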
For more information on LFTP, see their official documentation.
SCP
SCP uses Secure Shell (SSH) for data transfer and utilizes the same mechanisms for authentication, thereby ensuring the authenticity and confidentiality of the data in transit.
Mac/Linux
You will need to use an SSH v2 compliant terminal to move files to/from HPC. For more information on using SCP, use man scp.
Moving a File or Directory to the HPC:
In your terminal, navigate to the desired working directory on your local machine (laptop or desktop usually). To move a file or directory to a designated subdirectory in your account on HPC:
$ scp -rp filenameordirectory NetId@filexfer.hpc.arizona.edu:subdirectory
Getting a File or Directory From the HPC:
In your terminal, navigate to the desired working directory on your local machine. To copy a remote file from HPC to your current directory:
$ scp -rp NetId@filexfer.hpc.arizona.edu:filenameordirectory .
**The space followed by a period at the end means the destination is the current directory.**
Wildcards
Wildcards can be used for multiple file transfers (e.g. all files with the .dat extension). Note the backslash " \ " preceding the *:
$ scp NetId@filexfer.hpc.arizona.edu:subdirectory/\*.dat .
Windows
Windows users can use software like WinSCP to make SCP transfers. To use WinSCP, first download/install the software from: https://winscp.net/eng/download.php
To connect, enter filexfer.hpc.arizona.edu in the Host Name field, enter your NetID under User name, and enter your password. Accept by clicking Login. You'll then be prompted to Duo authenticate.
rsync
rsync is a fast and extraordinarily versatile file copying tool. It synchronizes files and directories between two different locations (or servers), copying only the differences between files that have actually changed. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion. You use rsync the same way you use scp: you must specify a source and a destination, one of which may be remote.
Example 1:
Recursively transfers all files from the directory src/directory-name on the machine computer-name into the /data/tmp/directory-name directory on the local machine, writing a log to hpc-user-rsync.log. The files are transferred in archive mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer. Additionally, compression will be used to reduce the size of data portions of the transfer. (Note that rsync requires at least one side of the transfer to be local; it cannot copy directly between two remote hosts.)
$ rsync -avz computer-name:src/directory-name /data/tmp --log-file=hpc-user-rsync.log
Example 2:
$ rsync -avz computer-name:src/directory-name/ /data/tmp --log-file=hpc-user-rsync.log
A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination.
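To make the difference concrete, the following two commands differ only in the trailing slash:

```
$ rsync -av computer-name:src/directory-name /data/tmp    # creates /data/tmp/directory-name
$ rsync -av computer-name:src/directory-name/ /data/tmp   # copies the directory's contents directly into /data/tmp
```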
Additional Options:
Flag | Meaning |
---|---|
-a | Archive mode; preserves symbolic links, permissions, ownership, and timestamps |
-v | Increase verbosity |
-z | Compress file data during the transfer |
--log-file | Log everything done in specified FILE |
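Before running a large transfer, it can be useful to preview what rsync would do. Adding -n (--dry-run) to any of the examples above lists the files that would be transferred without actually copying anything:

```
$ rsync -avzn computer-name:src/directory-name /data/tmp   # -n: show what would be transferred, copy nothing
```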
iRODS
The Research Computing test iRODS instance has been dismantled. iRODS servers are available elsewhere (such as CyVerse).
There are two ways to use iRODS: either from the command line or with a GUI such as Cyberduck on your workstation.
Command Line
Note that iCommands cannot be used to upload files into Data Store via URL from other sites (ftp, http, etc.). To transfer data from an external site, you first must download the file to a local machine using wget or a similar mechanism, and then use iput to upload it to the Data Store.
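For example, to bring a file from an external web server into the Data Store (the URL below is hypothetical):

```
$ wget https://example.com/datasets/mydata.tar.gz   # download to the local machine first
$ iput -P mydata.tar.gz                             # then upload it to the Data Store, showing progress
```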
iRODS 4 is installed as a standard package to the operating system on every node. This means you will not have to "module load irods". You will still need to "iinit" the first time (see below). iRODS is also available on the filexfer node for use.
Initializing iRODS
If you are looking for information on how to connect to CyVerse's data store, see their iRODS documentation for a guide.
Unlike its iRODS 3 counterpart, running iinit on a system using iRODS 4.x does not set up the environment for you. Instead, you need to run create_irods_env manually, with suitable options for the iRODS host, zone, username, etc.
For this key: | Enter this: |
---|---|
-h | <hostname of iRODS server> |
-p | <port number of iRODS server> (1247 is default) |
-z | <Zone name of iRODS zone> |
-u | <user name on the iRODS server> (may not match your netid) |
-a | <authentication method for the iRODS server> (PAM, native,...) |
For example:
$ create_irods_env -a native -h someserver.somewhere.net -z MYZONE
will suffice to create an appropriate ~/.irods/irods_environment.json file to allow you to run iinit; in the example above we took the defaults -p 1247 and -u <your NetId> by omitting -p and -u. You only need to do this step ONE time; subsequently you will just run iinit and it will ask for your password. Note that create_irods_env will NOT overwrite or alter an existing ~/.irods/irods_environment.json file.
Once the ~/.irods/irods_environment.json file is created properly, you should be able to sign in to the iRODS server you selected using iinit, viz:
```
$ iinit
Enter your current ... password:   # enter your iRODS server password here
```
At this point you can use other iRODS commands such as icp to move files.
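A few of the most common iCommands are sketched below; the file names are illustrative:

```
$ ils                                   # list the contents of your current iRODS collection
$ iput local-file.txt                   # upload a file to iRODS
$ iget remote-file.txt                  # download a file from iRODS
$ icp remote-file.txt backup-copy.txt   # copy a file within iRODS
```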
Examples
In the following examples:
- my-files-to-transfer/ is the example name of the directory or folder for bulk transfers.
- my-file-to-transfer.txt is the example name for single file transfers.
- Any filename may be used for the checkpoint-file.
Bulk Files Transfer Example
iput -P -b -r -T --retries 3 -X checkpoint-file my-files-to-transfer/   # -P progress, -b bulk upload, -r recursive, -T renew the connection, -X record restart info in checkpoint-file
Single Large File Transfer Example
iput -P -T --retries 3 --lfrestart checkpoint-lf-file my-file-to-transfer.txt   # --lfrestart records large-file restart info in checkpoint-lf-file
Graphical Interface with Cyberduck
The patch applied in Cyberduck 8.4.4 has caused issues with the two-factor authentication that is required to access filexfer.hpc.arizona.edu.
Cyberduck is an open-source, cross-platform, high-throughput, parallel data transfer program that supports multiple transfer protocols (FTP, SFTP, WebDAV, cloud files, Amazon S3, etc.). It serves as an alternative to the iDrop Java applet and has been extensively tested with large data transfers (60-70 GB). This allows users to transfer large files, depending on the user's available bandwidth and network settings.
Cyberduck versions are available for Mac OS (10.6 and higher on Intel 64-bit) and Windows (Windows XP, Windows Vista, Windows 7, or Windows 8). Linux users should use iDrop Desktop or iCommands. Cyberduck version 4.7.1 (released July 7, 2015) and later supports the iRODS protocol.
To use Cyberduck with iRODS:
- Install or Update Cyberduck. If Cyberduck has already been installed, you may update it under the dropdown Cyberduck → Check for Updates. If you need to install Cyberduck for the first time, go to https://cyberduck.io/, download the installer that is appropriate for your operating system and install. See Cyberduck Preferences for more information on installation.
- Configure Cyberduck for use with iRODS:
- Open Cyberduck and click Open Connection
- In the first drop-down field, enter a profile name
- Create the connection entering your iRODS server under Server, 1247 under Port, and your username under Username.
- Transfer Files by opening another connection and dragging/dropping files.
rclone
Rclone is a CLI installed on filexfer.hpc.arizona.edu that can be used to transfer files to Google Drive as well as other Cloud-based storage sites. To use rclone, you will need to start by configuring it. The walkthrough provided by rclone is fairly straightforward.
Configuration
```
[netid@sdmz-dtn-4 ~]$ rclone config
Name                 Type
====                 ====

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> GoogleDrive
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
...
12 / Google Drive
   \ "drive"
...
Storage> 12
Google Application Client Id
client_id>
client_secret>
scope> 1
root_folder_id>
service_account_file>
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
```
The next prompt will ask if you would like to use auto config, select N here or the configuration will not be successful. You will be given a URL. Copy and paste this into your web browser and follow the prompts to allow rclone to have access to your UArizona Google Drive account. When you are done, it will give you a verification code. Copy and paste this back into the terminal to proceed.
```
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n> N
If your browser doesn't open automatically go to the following link: <long URL>
Log in and authorize rclone for access
Enter verification code> <your verification code here>
Configure this as a team drive?
y) Yes
n) No
y/n> n
--------------------
[GoogleDrive]
type = drive
scope = drive
token = {"access_token": <lots of token data>}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
GoogleDrive          drive
```
Your Google Drive connection is now active and you can use rclone to make transfers. For information on rclone commands, see: https://rclone.org/commands/
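For example, assuming the GoogleDrive remote configured above (the directory names are illustrative):

```
$ rclone copy ~/mydata GoogleDrive:backups/mydata -P   # upload a local directory, showing progress
$ rclone ls GoogleDrive:backups                        # list files under backups on the remote
```

A similar configuration dialog can be used to create an AWS S3 remote, for example: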
```
[netid@sdmz-dtn-4 ~]$ rclone config
Current remotes:

Name                 Type
====                 ====

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> AWS
Storage> s3
provider> AWS
env_auth> false
access_key_id> YOUR_KEY_ID_HERE
secret_access_key> YOUR_SECRET_ACCESS_KEY_HERE
region> us-west-2
endpoint>
location_constraint>
acl>
server_side_encryption>
sse_kms_key_id>
storage_class> INTELLIGENT_TIERING
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
```
Github
To get a folder on HPC synced to a personal Github repository, you'll need to generate a linked SSH key. Fortunately, Github has good documentation to walk you through this. Some minor modifications to their instructions are needed; these are listed below:
- Generate an SSH Key: https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
Modifications: If you encounter instructions telling you to use `ssh-add -K ~/.ssh/id_rsa`, the `-K` option is not recognized on HPC, so this step will not work. Run the following command instead: `$ ssh-add ~/.ssh/id_rsa`
- Add SSH Key to Github: https://help.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account
Modifications: The command `$ pbcopy < ~/.ssh/id_rsa.pub` will not work. Use `$ cat ~/.ssh/id_rsa.pub`, then copy the output with your cursor and paste it into your Github account as directed.
- Clone Your Repository: https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository
Once you have a repository on your account, you can work with it using the usual git commands.
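For example, once your key is linked, you can clone and work with a repository over SSH (substitute your own account and repository names):

```
$ git clone git@github.com:your-username/your-repository.git
$ cd your-repository
$ git pull   # fetch and merge upstream changes
$ git push   # publish local commits
```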