Getting Started (Command Line)

1. Get DUO

DUO two-factor authentication is required to access the majority of U-M services and all HPC services. If you need to set up DUO, visit the DUO page on the Safe Computing site for instructions to get started.

2. Get an Armis2 user login

You must establish a user login on Armis2 by filling out this login request form. If you had a login on Armis, you should have one on Armis2.

3. Get an SSH Client & Connect to Armis2 Login Node

The login node (armis2.arc-ts.umich.edu) is the entry point into the cluster. Login nodes are accessible only from Ann Arbor, Dearborn, and Flint campus IP addresses and from the U-M VPN network, and they require a valid user account and Duo authentication to log in. They are a shared resource and, as such, users are expected not to monopolize them.

If you are trying to log in from off campus, or using an unauthenticated wireless network such as MGuest, connect to the U-M VPN first, since the login nodes accept connections only from campus IP addresses and the U-M VPN network.

See the policies below governing appropriate use of the login nodes.

Mac or Linux:

Open Terminal and type:

ssh uniqname@armis2.arc-ts.umich.edu

You will be required to enter your UMICH (Level-1) password to log in. Please note that as you type your password, nothing you type will appear on the screen; this is completely normal. Press the “Enter/Return” key once you are done typing your password.

When you’re connecting for the first time, it’s not uncommon to see a message like this one:

The authenticity of host 'armis2-login1.arc-ts.umich.edu (141.211.19.11)' can't be established.
RSA key fingerprint is 6f:8c:67:df:43:4f:e0:fc:80:5b:49:1a:eb:81:cc:54.
Are you sure you want to continue connecting (yes/no)?

This is normal. By saying “yes” you’re accepting the public SSH key for the system. This key will be stored in a local known_hosts file on your system so you won’t be prompted in the future. The keys from Armis2 will NOT change, but the known_hosts file is specific to each computer. So, for example, if you get a new computer and SSH to Armis2, you’ll be prompted to add the key again.

When connecting for the first time, we encourage you to compare the fingerprint you’re presented with to the fingerprints below. The format of the fingerprint you see depends on the SSH client on your machine.

RSA 6f:8c:67:df:43:4f:e0:fc:80:5b:49:1a:eb:81:cc:54
ECDSA Dae1G3gu0mtro2Rm15U6l8aQg4bGFnDQJhmGH3k+fKs
ED25519 9ho43xHw/aVo4q5AalH0XsKlWLKFSGuuw9lt3tCIYEs

In the example message given above, we are presented with the RSA key fingerprint and its MD5 value, which is the same value as in the above table.

If you’re NOT seeing one of these fingerprints, submit a ticket to arcts-support@umich.edu and do NOT connect to the server via SSH until discussing with an ARC staff member to determine if there is a security issue.

To avoid being prompted to accept the key on a new system, you may choose to pre-populate your SSH known_hosts file with the public keys from Armis2. The keys can be found in the FAQ.
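The fingerprint comparison described above can also be scripted. The following is a minimal sketch, not an ARC-provided tool: the function name armis2_fingerprint_is_known is hypothetical, and the values are copied from the fingerprint list in this guide. It also accepts the "MD5:" or "SHA256:" prefix that newer OpenSSH clients print in front of a fingerprint.

```bash
# Hypothetical helper: succeed only if the given fingerprint matches one of
# the Armis2 fingerprints published in this guide.
armis2_fingerprint_is_known() {
    # Strip an optional "MD5:" or "SHA256:" prefix added by newer SSH clients.
    fp=${1#MD5:}
    fp=${fp#SHA256:}
    case "$fp" in
        6f:8c:67:df:43:4f:e0:fc:80:5b:49:1a:eb:81:cc:54) return 0 ;;  # RSA
        Dae1G3gu0mtro2Rm15U6l8aQg4bGFnDQJhmGH3k+fKs)     return 0 ;;  # ECDSA
        9ho43xHw/aVo4q5AalH0XsKlWLKFSGuuw9lt3tCIYEs)     return 0 ;;  # ED25519
        *)                                               return 1 ;;  # unknown: do not connect
    esac
}
```

For example, armis2_fingerprint_is_known "SHA256:9ho43xHw/aVo4q5AalH0XsKlWLKFSGuuw9lt3tCIYEs" && echo "fingerprint matches" prints the message only when the value is one of the three listed fingerprints.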

Windows (using PuTTY)

Download and install PuTTY.

Launch PuTTY and enter armis2.arc-ts.umich.edu as the host name, then click “Open”.

If you receive a “PuTTY Security Alert” pop-up, this is completely normal; click the “Yes” option. This will tell PuTTY to trust the host the next time you want to connect to it. From there, a terminal window will open; you will be required to enter your UMICH uniqname and then your UMICH (Level-1) password in order to log in. Please note that as you type your password, nothing you type will appear on the screen; this is normal. Press the “Enter/Return” key once you are done typing your password.

All Operating Systems

At the “Enter a passcode or select one of the following options:” prompt, type the number of your preferred choice for Duo authentication.

4. Get files

You can use SFTP to transfer data to your /home directory.

SFTP: Mac or Windows using FileZilla

  1. Open FileZilla and click the “Site Manager” button
  2. Create a New Site, which you can name “Armis2” or something similar
  3. Select the “SFTP (SSH File Transfer Protocol)” option
  4. In the Host field, type armis2-xfer.arc-ts.umich.edu
  5. Select “Interactive” for Logon Type
  6. In the User field, type your uniqname
  7. Click “Connect”
  8. Enter your UMICH (Level-1) password
  9. Select your DUO method (1-3) and complete authentication
  10. Drag and drop files between the two systems
  11. Click “Disconnect” when finished

On Windows, you can also use WinSCP with similar settings, available alongside PuTTY.

SFTP: Mac or Linux using Terminal

To copy a single file, type:

scp localfile uniqname@armis2-xfer.arc-ts.umich.edu:./remotefile

To copy an entire directory, type:

scp -r localdir uniqname@armis2-xfer.arc-ts.umich.edu:./remotedir

These commands can also be reversed in order to copy files from Armis2 to your machine:

scp -r uniqname@armis2-xfer.arc-ts.umich.edu:./remotedir localdir

You will need to authenticate via DUO to complete the file transfer.
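The two transfer directions differ only in the order of the arguments. As a minimal sketch (the function name armis2_scp_cmd and its argument order are hypothetical, not part of any ARC tooling), the following helper prints the scp command for either direction without running it, so you can check it before authenticating:

```bash
# Hypothetical helper: print (do not run) the scp command for a transfer.
#   $1 = direction: "up" (to Armis2) or "down" (from Armis2)
#   $2 = uniqname, $3 = local path, $4 = remote path
armis2_scp_cmd() {
    remote="$2@armis2-xfer.arc-ts.umich.edu:$4"
    case "$1" in
        up)   echo "scp -r $3 $remote" ;;   # local -> Armis2
        down) echo "scp -r $remote $3" ;;   # Armis2 -> local
        *)    echo "usage: armis2_scp_cmd up|down uniqname localpath remotepath" >&2
              return 1 ;;
    esac
}
```

For example, armis2_scp_cmd down uniqname localdir ./remotedir prints the reversed command shown above. The -r option is harmless when the source is a single file.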

Globus: Windows, Mac, or Linux

Globus is a reliable, high-performance, parallel file transfer service provided by many HPC sites around the world. It enables easy transfer of files from one system to another, as long as they are Globus endpoints.

  • The Globus endpoint for Armis2 is “umich#armis2 v2”.

How to use Globus

Globus Online is a web front end to the Globus transfer service. Globus Online accounts are free and you can create an account with your University identity.

  • Set up your Globus account and learn how to transfer files using the Globus documentation.  Select “University of Michigan” from the dropdown box to get started.
  • Once you are ready to transfer files, enter “umich#armis2 v2” as one of your endpoints.

Globus Connect Personal

Globus Online also allows for simple installation of a Globus endpoint for Windows, Mac, and Linux desktops and laptops.

  • Follow the Globus instructions to download the Globus Connect Personal installer and set up an endpoint on your desktop or laptop.

Batch File Copies

A non-standard use of Globus Online is to copy files from one location to another on the same cluster. To do this, use the same endpoint (umich#armis2 as an example) for both the sending and receiving machines. Set up the transfer and Globus will make sure the rest happens. The service will email you when the copy is finished.

Command Line Globus

There are command-line tools for Globus that are intended for advanced users. If you wish to use these, contact HPC support.

5. Submit a job

This is a simple guide to get your jobs up and running. For more advanced Slurm features, see the Slurm User Guide for Armis2. If you are familiar with using the resource manager Torque, you may find the migrating from Torque to Slurm guide useful.

Batch Jobs

Most work will be queued to be run on Armis2 and is described through a batch script. The sbatch command is used to submit a batch script to Slurm. Submit the script from a directory on a shared file system: your home directory, /scratch, or any directory under /nfs that you can normally use in a job on Armis. Output will be sent to this working directory (jobName-jobID.log). Do not submit jobs from /tmp or any of its subdirectories. To submit a batch script, run:

$ sbatch myJob.sh
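The rule about where to submit from can be checked before calling sbatch. This is a minimal sketch under stated assumptions: the function name is_shared_submit_dir is hypothetical, home directories are assumed to live under /home, and any location not named in this guide is conservatively treated as not shared.

```bash
# Hypothetical helper: succeed if the directory is one of the shared
# locations named in this guide, fail for /tmp and anything below it.
is_shared_submit_dir() {
    case "$1" in
        /tmp|/tmp/*)                        return 1 ;;  # never submit from here
        /home/*|/scratch|/scratch/*|/nfs/*) return 0 ;;  # shared file systems
        *)                                  return 1 ;;  # unknown: treat as not shared
    esac
}
```

A typical use would be is_shared_submit_dir "$PWD" && sbatch myJob.sh, which submits only when the current directory passes the check.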

The batch job script is composed of three main components:

  • The interpreter used to execute the script
  • #SBATCH directives that convey submission options
  • The application(s) to execute along with its input arguments and options

Example:

#!/bin/bash
# The interpreter used to execute the script

#“#SBATCH” directives that convey submission options:

#SBATCH --job-name=example_job
#SBATCH --mail-type=BEGIN,END
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1000m 
#SBATCH --time=10:00
#SBATCH --account=test
#SBATCH --partition=standard

# The application(s) to execute along with its input arguments and options:

/bin/hostname
sleep 60

How many nodes and processors you request will depend on the capability of your software and what it can do. There are four common scenarios.

Example: One Node, One Processor

This is the simplest case and is shown in the example above. The majority of software cannot use more than this. Some examples of software for which this would be the right configuration are SAS, Stata, R, many Python programs, and most Perl programs.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s

Example: One Node, Multiple Processors

This is similar to what a modern desktop or laptop is likely to have. Software that can use more than one processor may be described as multicore, multiprocessor, or multithreaded. Some examples of software that can benefit from this are MATLAB and Stata/MP. You should read the documentation for your software to see if this is one of its capabilities.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s

Example: Multiple Nodes, One Process per CPU

This is the classic MPI approach, where multiple machines are requested and one process per processor on each node is started using MPI. This is the way most MPI-enabled software is written to work.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s

Example: Multiple Nodes, Multiple CPUs per Process

This is often referred to as the “hybrid mode” MPI approach, where multiple machines are requested and multiple processors are requested for each process. MPI will start a parent process or processes on each node, and those in turn will be able to use more than one processor for threaded calculations.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s
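To see what each of the four scenarios actually asks for, multiply the request out: tasks = nodes × tasks per node, CPUs = tasks × CPUs per task, and memory = CPUs × memory per CPU. The sketch below does that arithmetic. The function name job_totals is hypothetical, memory is taken in whole GB, and an omitted --ntasks-per-node or --cpus-per-task is assumed to mean 1, which is the Slurm default.

```bash
# Hypothetical helper: print the totals implied by a resource request.
#   $1 = nodes, $2 = tasks per node, $3 = CPUs per task, $4 = GB per CPU
job_totals() {
    nodes=$1; tasks_per_node=$2; cpus_per_task=$3; gb_per_cpu=$4
    total_tasks=$((nodes * tasks_per_node))
    total_cpus=$((total_tasks * cpus_per_task))
    total_mem_gb=$((total_cpus * gb_per_cpu))
    echo "tasks=$total_tasks cpus=$total_cpus mem_gb=$total_mem_gb"
}
```

For the hybrid example, job_totals 2 4 4 1 reports 8 tasks, 32 CPUs, and 32 GB of memory, which is four times the CPU and memory footprint of the plain MPI example (job_totals 2 4 1 1).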

Common Job Submission Options

| Description | Slurm directive (#SBATCH option) | Armis2 Usage |
|---|---|---|
| Job name | --job-name=<name> | --job-name=armis2job1 |
| Account | --account=<account> | --account=test |
| Queue | --partition=<partition_name> | --partition=standard. Available partitions: standard, gpu (GPU jobs only), largemem (large memory jobs only) |
| Wall time limit | --time=<hh:mm:ss> | --time=02:00:00 |
| Node count | --nodes=<count> | --nodes=2 |
| Process count per node | --ntasks-per-node=<count> | --ntasks-per-node=1 |
| Minimum memory per processor | --mem-per-cpu=<memory> | --mem-per-cpu=1000m |
| Request software license(s) | --licenses=<application>@slurmdb:<N> | --licenses=stata@slurmdb:1 (requests one license for Stata) |
| Request event notification | --mail-type=<events>. Note: multiple mail-type requests may be specified in a comma-separated list: --mail-type=BEGIN,END,NONE,FAIL,REQUEUE | --mail-type=BEGIN,END,FAIL |

Please note that if your job is set to use more than one node, your code must be MPI enabled in order to run across those nodes, and you must use srun rather than mpirun or mpiexec. More advanced job submission options can be found in the Slurm User Guide for Armis2.
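The --time values in this guide appear in two forms: 10:00 in the first example and 00:15:00 or 02:00:00 elsewhere. Slurm reads a two-field value as minutes:seconds and a three-field value as hours:minutes:seconds, so 10:00 means ten minutes, not ten hours. The sketch below converts a value to seconds so the two forms can be compared. The function name slurm_time_to_seconds is hypothetical, and the "days-" forms that Slurm also accepts are not handled.

```bash
# Hypothetical helper: convert a Slurm --time value to seconds.
# Handles "minutes", "minutes:seconds", and "hours:minutes:seconds" only.
slurm_time_to_seconds() {
    echo "$1" | awk -F: '{
        if (NF == 1)      { h = 0;  m = $1; s = 0  }   # minutes
        else if (NF == 2) { h = 0;  m = $1; s = $2 }   # minutes:seconds
        else              { h = $1; m = $2; s = $3 }   # hours:minutes:seconds
        print h * 3600 + m * 60 + s
    }'
}
```

Writing the limit in the full hh:mm:ss form, as the table above does, avoids the ambiguity altogether.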

Interactive Jobs

An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application. The salloc command is used to submit an interactive job to Slurm. When the job starts, a command line prompt will appear on one of the compute nodes assigned to the job. From here commands can be executed using the resources allocated on the local node.

[user@armis2-login2 ~]$ salloc --account=test 
salloc: Granted job allocation 1001106
salloc: Waiting for resource configuration
salloc: Nodes armis20204 are ready for job
[user@armis20204 ~]$ hostname
armis20204.arc-ts.umich.edu
[user@armis20204 ~]$

Jobs submitted with salloc and no additional specification of resources will be assigned the cluster default values of 1 CPU and 768MB of memory. The account must be specified; the job will not run otherwise. If additional resources are required, they can be requested as options to the salloc command. The following example would be appropriate for an MPI job that needs two nodes, four MPI processes on each node, one CPU per process, and one GB of memory per CPU. MPI programs run from jobs should be started with srun or one of the other commands that will start MPI programs. Note the --cpu-bind=none option, which is recommended unless you know what an efficient processor geometry for your job is.

[user@login ~]$ salloc --account=test --nodes=2 --ntasks-per-node=4 --mem-per-cpu=1GB
salloc: Granted job allocation 1001108
salloc: Waiting for resource configuration
salloc: Nodes armis[20303-20304] are ready for job
[user@armis20303 ~]$ srun --cpu-bind=none hostname
armis20303.arc-ts.umich.edu
armis20303.arc-ts.umich.edu
armis20303.arc-ts.umich.edu
armis20303.arc-ts.umich.edu
armis20304.arc-ts.umich.edu
armis20304.arc-ts.umich.edu
armis20304.arc-ts.umich.edu
armis20304.arc-ts.umich.edu
[user@armis20303 ~]$

In the above example srun is used within the job from the first compute node to run a command once for every task in the job on the assigned resources. srun can be used to run on a subset of the resources assigned to the job, though that is fairly uncommon. See the srun man page for more details.
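The salloc output above reports the assigned nodes in Slurm's compact bracket notation, armis[20303-20304], while srun prints the individual host names. On the cluster, scontrol show hostnames expands such a list. The sketch below shows what that expansion means for the simple prefix[first-last] case only. The function name expand_nodelist is hypothetical, and comma-separated lists and zero-padded numbers are not handled.

```bash
# Hypothetical helper: expand a simple "prefix[first-last]" node list
# into one host name per line. Names without brackets pass through.
expand_nodelist() {
    case "$1" in
        *\[*-*\]*)
            prefix=${1%%\[*}       # text before "["
            range=${1#*\[}         # text after "["
            range=${range%%\]*}    # drop "]" and anything after it
            first=${range%%-*}
            last=${range#*-}
            i=$first
            while [ "$i" -le "$last" ]; do
                echo "${prefix}${i}"
                i=$((i + 1))
            done
            ;;
        *)
            echo "$1"              # a single node name
            ;;
    esac
}
```

Calling expand_nodelist 'armis[20303-20304]' prints armis20303 and armis20304, the two nodes that appear in the srun output above.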

GPU and Large Memory Jobs

Jobs can request GPUs with the job submission options --partition=gpu and a count option from the table below. All counts can be represented by gputype:number or just a number (default type will be used). Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both Batch and Interactive jobs.

Additionally, a user can select the compute mode of GPUs for each job as either exclusive (ARC’s default setting) or shared. Exclusive mode limits each GPU to run only one process at a time, while shared mode allows multiple processes to run simultaneously on a single GPU. See the CUDA Programming Guide for more details.

You may query the compute mode from any GPU node by entering the command nvidia-smi -q | grep "Compute Mode", where a result of Default refers to the NVIDIA default of shared mode, as opposed to the ARC default selection of exclusive mode. For example:

$ nvidia-smi -q |grep "Compute Mode"
Compute Mode : Default
 

| Description | Slurm directive (#SBATCH or srun option) | Example |
|---|---|---|
| GPUs per node | --gpus-per-node=<gputype:number> | --gpus-per-node=2 or --gpus-per-node=v100:2 |
| GPUs per job | --gpus=<gputype:number> | --gpus=2 or --gpus=v100:2 |
| GPUs per socket | --gpus-per-socket=<gputype:number> | --gpus-per-socket=2 or --gpus-per-socket=v100:2 |
| GPUs per task | --gpus-per-task=<gputype:number> | --gpus-per-task=2 or --gpus-per-task=v100:2 |
| Compute Mode | --gpu_cmode=<shared\|exclusive> | --gpu_cmode=shared |
| CPUs required per GPU | --cpus-per-gpu=<number> | --cpus-per-gpu=4 |
| Memory per GPU | --mem-per-gpu=<number> | --mem-per-gpu=1000m |

Jobs can request nodes with large amounts of RAM with --partition=largemem.
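Putting the GPU options together, a batch script for a GPU job might look like the one printed by the sketch below. This is an illustration rather than a recommended configuration: the function name gpu_job_script and the job name gpu_example are hypothetical, the resource values are taken from the examples in this guide, and the GPU type you pass in should be one that sinfo reports for the gpu partition.

```bash
# Hypothetical helper: print a minimal GPU batch script to standard output.
#   $1 = account, $2 = GPU request as "number" or "gputype:number"
gpu_job_script() {
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --account=$1
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=$2
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=1000m
#SBATCH --time=00:15:00

# Show the GPUs visible to the job
nvidia-smi
EOF
}
```

For example, gpu_job_script test 1 > gpuJob.sh writes a script that requests one GPU of the default type, which can then be submitted with sbatch gpuJob.sh.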

Job Status

Most of a job’s specifications can be seen by invoking scontrol show job <jobID>. The job’s batch script can be written to a file by using scontrol write batch_script <jobID> output.txt. If no output file is specified, the script will be written to slurm-<jobID>.sh.

A job’s record remains in Slurm’s memory for 30 minutes after it completes.  scontrol show job will return “Invalid job id specified” for a job that completed more than 30 minutes ago.  At that point, one must invoke the sacct command to retrieve the job’s record from the Slurm database.
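The 30-minute rule above determines which command to reach for. As a minimal sketch (the function name job_record_cmd is hypothetical, and the sacct field list is only one reasonable choice), the following helper prints the appropriate command for a job given how long ago it finished:

```bash
# Hypothetical helper: print the command to use for a job's record.
#   $1 = job ID, $2 = minutes since the job completed (0 if still running)
job_record_cmd() {
    if [ "$2" -lt 30 ]; then
        # Still in Slurm's memory
        echo "scontrol show job $1"
    else
        # Only available from the Slurm database
        echo "sacct -j $1 --format=JobID,JobName,State,Elapsed,ExitCode"
    fi
}
```

For example, job_record_cmd 1001106 45 prints the sacct form, because a job that finished 45 minutes ago is no longer visible to scontrol.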

To view TRES (Trackable RESource) utilization by user or account, use the following commands, substituting your own values for the dates, user names, account, and TRES type.

Shows TRES usage by all users on account during date range:

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy account=test --tres type

Shows TRES usage by specified user(s) on account during date range:

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy users=un1,un2 account=test --tres type

Lists users alphabetically along with TRES usage and total during date range:

sreport cluster AccountUtilizationByUser start=mm/dd/yy end=mm/dd/yy tree account=test --tres type

Possible TRES types:
cpu
mem
node
gres/gpu
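The sreport commands above expect dates as mm/dd/yy. If your dates are in yyyy-mm-dd form, a small helper can do the conversion and fill in the first command. This is a sketch only: both function names are hypothetical, and the commands are printed rather than executed so that they can be reviewed first.

```bash
# Hypothetical helper: convert an ISO date (yyyy-mm-dd) to mm/dd/yy.
iso_to_sreport_date() {
    year=${1%%-*}     # yyyy
    rest=${1#*-}      # mm-dd
    month=${rest%%-*}
    day=${rest#*-}
    echo "$month/$day/${year#??}"   # keep the last two digits of the year
}

# Hypothetical helper: print the sreport command for all users on an account.
#   $1 = account, $2 = start (yyyy-mm-dd), $3 = end (yyyy-mm-dd), $4 = TRES type
account_usage_cmd() {
    echo "sreport cluster UserUtilizationByAccount start=$(iso_to_sreport_date "$2") end=$(iso_to_sreport_date "$3") account=$1 --tres $4"
}
```

For example, account_usage_cmd test 2024-03-01 2024-03-31 cpu prints the first command above with the dates and the cpu TRES type filled in.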

For more reporting options, see the Slurm sreport documentation.