Getting Started (Web-based Open OnDemand)

1. Get Duo

You must use Duo authentication to log on to the Armis2 OnDemand web service.  Get more details on the Safe Computing Two-Factor page and enroll here.

2. Get an Armis2 user login

You must establish a user login on Armis2 by filling out this form.

3. Connect to Armis2 OnDemand

You must be on campus or on the VPN to connect to Armis2 OnDemand.  If you are trying to log in from off campus or using an unauthenticated wireless network such as MGuest, you should install VPN software on your computer.

Once you are on the University network, follow these instructions to connect:

  1. Open your web browser (Firefox, Edge, or Chrome in incognito recommended) and navigate to:
    armis2.arc-ts.umich.edu
  2. Log in to Cosign using your uniqname and password.
  3. Complete Duo authentication.
  4. You should now be logged in.

4. Get files

At the top of the page, click “Files” and then “Home Directory”.  A new tab will open containing the File Explorer.

Here you can navigate your home folder.  The buttons do the following:

  • “Go To…”: Navigate to a specified folder
  • “Open in Terminal”: Opens the active folder in a terminal session (new tab)
  • “New File”: Creates a new file in the active folder
  • “New Dir”: Creates a new folder in the active folder
  • “Upload”: Select files from your local machine to upload to the active folder
  • “Show Dotfiles”: Reveals hidden files (usually do not need to be changed)
  • “Show Owner/Mode”: Shows ownership and permission information
  • “View”: Shows file contents inside the current tab
  • “Edit”: Opens a file editor in a new tab
  • “Rename/Move”: Gives a file a new path and/or name
  • “Download”: Downloads the file or folder to your local machine
  • “Copy”: Copies selected files to the clipboard
  • “Paste”: Pastes files from the clipboard
  • “(Un)Select All”: Select or unselect all files/folders
  • “Delete”: Deletes selected files/folders

5. Submit a job

At the top of the home page, click “Jobs” and then “Job Composer”.  A new tab will open containing the Job Composer.

Upon your first visit to this page, you’ll go through a helpful tutorial.  The buttons do the following:

  • “New Job”: Creates a new job…
    • “From Default Template”: Uses system defaults for a bare bones “Hello World” job on the Armis2 cluster.  Please note that you will still need to specify your account.
    • “From Specified Path”: Creates a job from a specified job script.  See the Slurm User Guide for Armis2 for information on writing this script.  Some attributes (name, account) can be set here if not set in the script.
    • “From Selected Job”: Creates a new job that is a copy of the selected job.
  • “Edit Files”: Opens the project folder in a new File Explorer tab, allowing you to edit the files within (see “Get files” above for File Explorer instructions).
  • “Job Options”: Allows for editing the Name, Cluster, Job Script, and Account fields.
  • “Open Terminal”: Opens a terminal session in a new tab, starting in the project folder.
  • “Submit”: Submits the selected job to the cluster.
  • “Stop”: Stops the selected job if it has been submitted.
  • “Delete”: Deletes the selected job.

To view active job information, click “Jobs” and then “Active Jobs” from the home page.

This is a simple guide to get your jobs up and running. For more advanced Slurm features and job scripting information, see the Slurm User Guide for Armis2. If you are familiar with using the resource manager Torque, you may find the migrating from Torque to Slurm guide useful.

Interactive Apps

At the top of the home page, click “Interactive Apps” and then select your desired application.

 

Armis2 Remote Desktop

Launches an interactive desktop in a new tab. You can select a basic (single node), MPI (multiple node), or advanced (custom number of tasks per node) desktop. Specify your account (usually your PI’s uniqname), hours, memory, cores, nodes (for MPI), partition (standard, gpu, largemem), and software licenses:

 

Upon selecting “Launch”, your job will be queued on one of your nodes and shown on the “My Interactive Sessions” screen. As soon as the job’s status is “Running”, you can change remote desktop settings. For slower internet connections, you can try a higher compression and lower image quality using the sliders.  Conversely, if you have a fast connection you can lower compression and raise image quality.  You can also directly access your node’s terminal by clicking on the hostname (blue button).  Once you’re ready to use the desktop, click on “Launch Basic/MPI/Advanced Desktop”:

A remote desktop session will then be opened in a new tab for the requested amount of time.    If you finish early, return to the “My Interactive Sessions” tab and delete the job.

MATLAB

Launches an interactive desktop with MATLAB configured and running in a new tab.  Specify your desired version, account, hours, partition (standard, gpu, largemem), and memory (4GB minimum):

Upon selecting “Launch”, your job will be queued on one of your nodes and shown on the “My Interactive Sessions” screen. As soon as the job’s status is “Running”, you can change remote desktop settings. For slower internet connections, you can try a higher compression and lower image quality using the sliders.  Conversely, if you have a fast connection you can lower compression and raise image quality.  You can also directly access your node’s terminal by clicking on the hostname (blue button).  Once you’re ready to use the application, click on “Launch MATLAB”:

A remote desktop session running MATLAB will then be opened in a new tab for the requested amount of time. You may also use the terminal and other basic applications. If you finish early, return to the “My Interactive Sessions” tab and delete the job.

RStudio

Launches an interactive desktop with RStudio configured and running in a new tab.  Specify your desired version, account, hours, cores, partition (standard, gpu, largemem), and memory (2GB minimum):

Upon selecting “Launch”, your job will be queued on one of your nodes and shown on the “My Interactive Sessions” screen. As soon as the job’s status is “Running”, you can change remote desktop settings. For slower internet connections, you can try a higher compression and lower image quality using the sliders.  Conversely, if you have a fast connection you can lower compression and raise image quality.  You can also directly access your node’s terminal by clicking on the hostname (blue button).  Once you’re ready to use the application, click on “Launch RStudio”:

A remote desktop session running RStudio will then be opened in a new tab for the requested amount of time. You may also use the terminal and other basic applications. If you finish early, return to the “My Interactive Sessions” tab and delete the job.

Jupyter Notebook/JupyterLab

Launches a Jupyter Notebook or JupyterLab in a new tab. Specify your desired Anaconda Python version, account, hours, partition (standard, gpu, largemem), cores, memory, and module commands:

Upon selecting “Launch”, your job will be queued on one of your nodes and shown on the “My Interactive Sessions” screen. As soon as the job’s status is “Running”, you can click on “Connect to Jupyter”:

For instructions on using Jupyter Notebook, see the official documentation.

Cluster Defaults and Partition Limits

Armis2 Cluster Defaults

Cluster Defaults | Default Value
Default Walltime | 60 minutes
Default Memory Per CPU | 768 MB
Default Number of CPUs | No memory specified: 1 core. Memory specified: memory/768 = # of cores (rounded down)
/scratch file deletion policy | 60 days without being accessed (see Scratch Storage Policies below)
/scratch quota per root account | 10 TB storage limit (see Scratch Storage Policies below)
/home quota per user | 80 GB
Max queued jobs per user per account | 5,000
Shell timeout if idle | 2 hours
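
The default CPU rule above is simple integer arithmetic. The following is a minimal sketch of it; default_cores is a hypothetical helper, not an ARC tool, and the floor of one core for requests under 768 MB is an assumption that the table does not state.

```shell
# Estimate the default core count for a job, following the rule in the
# table above: requested memory / 768 MB, rounded down.
# Assumption (not stated in the source): a job never gets fewer than 1 core.
default_cores() {
    mem_mb="${1:-}"              # requested memory in MB; empty = not specified
    if [ -z "$mem_mb" ]; then
        echo 1                   # no memory specified: 1 core
        return 0
    fi
    cores=$(( mem_mb / 768 ))    # shell integer division rounds down
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    echo "$cores"
}

default_cores          # no memory specified
default_cores 4096     # 4096 / 768 = 5.33..., rounded down
```

The two calls print 1 and 5. To control the core count directly, request it explicitly in the job rather than relying on this default.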

Armis2 Partition Limits

Partition Limit standard gpu largemem
Max Walltime 2 weeks
Max running Mem per root account 5160 GB 2210 GB
Max running CPUs per root account 1032 cores 84 cores
Max running GPUs per root account n/a 10 Tesla K40m n/a

Getting Started (Command Line)

1. Get DUO

DUO two-factor authentication is required to access the majority of U-M services and all HPC services. If you need to set up DUO, visit the DUO page on the Safe Computing site for instructions to get started.

2. Get an Armis2 user login

You must establish a user login on Armis2 by filling out this login request form. If you had a login on Armis, you should have one on Armis2.

3. Get an SSH Client & Connect to Armis2 Login Node

The login node (armis2.arc-ts.umich.edu) is the entry point into the cluster. It is accessible only from Ann Arbor, Dearborn, and Flint campus IP addresses and from the U-M VPN network, and it requires a valid user account and Duo authentication to log in. Login nodes are a shared resource and, as such, users are expected not to monopolize them.

If you are trying to log in from off campus, or using an unauthenticated wireless network such as MGuest, you have a couple of options:

See the policies below governing appropriate use of the login nodes.

Mac or Linux:

Open Terminal and type:

ssh uniqname@armis2.arc-ts.umich.edu

You will be required to enter your UMICH (Level-1) password to log in. Please note that as you type your password, nothing you type will appear on the screen; this is completely normal. Press the “Enter/Return” key once you are done typing your password.

When you’re connecting for the first time, it’s not uncommon to see a message like this one:

The authenticity of host 'armis2-login1.arc-ts.umich.edu (141.211.19.11)' can't be established.
RSA key fingerprint is 6f:8c:67:df:43:4f:e0:fc:80:5b:49:1a:eb:81:cc:54.
Are you sure you want to continue connecting (yes/no)?

This is normal. By saying “yes” you’re accepting the public SSH key for the system. This key will be stored in a local known_hosts file on your system so you won’t be prompted in the future. The keys from Armis2 will NOT change, but the prompt will reappear on any system that has not yet stored them. So, for example, if you get a new computer and SSH to Armis2, you’ll be prompted to add the key again.

We encourage you to compare the fingerprint you’re presented with, when connecting for the first time, to one of the fingerprints below. The format of the fingerprint you’re presented could be dictated by the SSH client on your machine.

RSA 6f:8c:67:df:43:4f:e0:fc:80:5b:49:1a:eb:81:cc:54
ECDSA Dae1G3gu0mtro2Rm15U6l8aQg4bGFnDQJhmGH3k+fKs
ED25519 9ho43xHw/aVo4q5AalH0XsKlWLKFSGuuw9lt3tCIYEs

In the example message given above, we are presented with the RSA key fingerprint and its MD5 value, which is the same value as in the above table.

If you’re NOT seeing one of these fingerprints, submit a ticket to arcts-support@umich.edu and do NOT connect to the server via SSH until discussing with an ARC staff member to determine if there is a security issue.
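
The comparison described above can be sketched in a few lines of shell. check_fingerprint is a hypothetical helper, not something ARC provides; it holds the three fingerprints listed above and strips the "MD5:" or "SHA256:" prefix that newer OpenSSH clients print before the value.

```shell
# Fingerprints published in the list above (RSA in MD5 format,
# ECDSA and ED25519 in SHA256 format).
KNOWN_FINGERPRINTS='6f:8c:67:df:43:4f:e0:fc:80:5b:49:1a:eb:81:cc:54
Dae1G3gu0mtro2Rm15U6l8aQg4bGFnDQJhmGH3k+fKs
9ho43xHw/aVo4q5AalH0XsKlWLKFSGuuw9lt3tCIYEs'

# check_fingerprint FINGERPRINT
# Hypothetical helper: succeeds only if the fingerprint shown by your SSH
# client exactly matches one of the published values.
check_fingerprint() {
    presented="${1#MD5:}"                # drop an optional "MD5:" prefix
    presented="${presented#SHA256:}"     # drop an optional "SHA256:" prefix
    if printf '%s\n' "$KNOWN_FINGERPRINTS" | grep -Fxq -- "$presented"; then
        echo "match"
    else
        echo "NO MATCH: do not connect; contact arcts-support@umich.edu"
        return 1
    fi
}

check_fingerprint 'SHA256:9ho43xHw/aVo4q5AalH0XsKlWLKFSGuuw9lt3tCIYEs'
```

The match is exact and case-sensitive (grep -F -x), so paste the fingerprint exactly as your client displays it.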

To avoid being prompted to accept the key on a new system you may choose to pre-populate your SSH known_hosts file with the pub keys from Armis2. The keys can be found in the FAQ.

Windows (using PuTTY)

Download and install PuTTY.

Launch PuTTY and enter armis2.arc-ts.umich.edu as the host name, then click “Open”.

If you receive a “PuTTY Security Alert” pop-up, this is completely normal; click the “Yes” option. This will tell PuTTY to trust the host the next time you want to connect to it. From there, a terminal window will open; you will be required to enter your UMICH uniqname and then your UMICH (Level-1) password in order to log in. Please note that as you type your password, nothing you type will appear on the screen; this is normal. Press the “Enter/Return” key once you are done typing your password.

All Operating Systems

At the “Enter a passcode or select one of the following options:” prompt, type the number of your preferred choice for Duo authentication.

4. Get files

You can use SFTP to transfer data to your /home directory.

SFTP: Mac or Windows using FileZilla

  1. Open FileZilla and click the “Site Manager” button
  2. Create a New Site, which you can name “Armis2” or something similar
  3. Select the “SFTP (SSH File Transfer Protocol)” option
  4. In the Host field, type armis2-xfer.arc-ts.umich.edu
  5. Select “Interactive” for Logon Type
  6. In the User field, type your uniqname
  7. Click “Connect”
  8. Enter your UMICH (Level-1) password
  9. Select your DUO method (1-3) and complete authentication
  10. Drag and drop files between the two systems
  11. Click “Disconnect” when finished

On Windows, you can also use WinSCP with similar settings, available alongside PuTTY.

SFTP: Mac or Linux using Terminal

To copy a single file, type:

scp localfile uniqname@armis2-xfer.arc-ts.umich.edu:./remotefile

To copy an entire directory, type:

scp -r localdir uniqname@armis2-xfer.arc-ts.umich.edu:./remotedir

These commands can also be reversed in order to copy files from Armis2 to your machine:

scp -r uniqname@armis2-xfer.arc-ts.umich.edu:./remotedir localdir

You will need to authenticate via DUO to complete the file transfer.

5. Submit a job

This is a simple guide to get your jobs up and running. For more advanced Slurm features, see the Slurm User Guide for Armis2. If you are familiar with using the resource manager Torque, you may find the migrating from Torque to Slurm guide useful.

Batch Jobs

Most work on Armis2 is queued to run later and is described by a batch script. The sbatch command submits a batch script to Slurm. Run it from a shared file system: your home directory, /scratch, or any directory under /nfs that you can normally use in a job on Armis. Output will be written to this working directory (jobName-jobID.log). Do not submit jobs from /tmp or any of its subdirectories.

$ sbatch myJob.sh
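
As a guard against submitting from /tmp by accident, a wrapper along these lines could check the directory first. This is a sketch only: submit_dir_ok and check_then_submit are hypothetical names, the check is a plain string match that does not resolve symbolic links, and the wrapper only prints the sbatch command instead of running it.

```shell
# submit_dir_ok [DIR]
# Succeeds unless DIR (default: the current directory) is /tmp or a
# subdirectory of /tmp. Plain string match; symlinks are not resolved.
submit_dir_ok() {
    dir="${1:-$PWD}"
    case "$dir" in
        /tmp|/tmp/*) return 1 ;;
        *)           return 0 ;;
    esac
}

# check_then_submit SCRIPT [DIR]
# Prints the sbatch command that would be run, or a warning on stderr.
# Nothing is submitted here; remove "echo" to submit for real.
check_then_submit() {
    script="$1"
    if submit_dir_ok "${2:-$PWD}"; then
        echo sbatch "$script"
    else
        echo "refusing to submit from ${2:-$PWD}: use your home directory, /scratch, or /nfs" >&2
        return 1
    fi
}

check_then_submit myJob.sh /home/uniqname
```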

The batch job script is composed of three main components:

  • The interpreter used to execute the script
  • #SBATCH directives that convey submission options
  • The application(s) to execute along with its input arguments and options

Example:

#!/bin/bash
# The interpreter used to execute the script

# "#SBATCH" directives that convey submission options:

#SBATCH --job-name=example_job
#SBATCH --mail-type=BEGIN,END
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1000m 
#SBATCH --time=10:00
#SBATCH --account=test
#SBATCH --partition=standard

# The application(s) to execute along with its input arguments and options:

/bin/hostname
sleep 60

How many nodes and processors you request will depend on the capability of your software and what it can do. There are four common scenarios.

Example: One Node, One Processor

This is the simplest case and is shown in the example above. The majority of software cannot use more than this. Some examples of software for which this would be the right configuration are SAS, Stata, R, many Python programs, most Perl programs.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s

Example: One Node, Multiple Processors

This is similar to what a modern desktop or laptop is likely to have. Software that can use more than one processor may be described as multicore, multiprocessor, or multithreaded. Some examples of software that can benefit from this are MATLAB and Stata/MP. You should read the documentation for your software to see if this is one of its capabilities.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s

Example: Multiple Nodes, One Process per CPU

This is the classic MPI approach: multiple machines are requested, and MPI starts one process per processor on each node. This is the way most MPI-enabled software is written to work.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s

Example: Multiple Nodes, Multiple CPUs per Process

This is often referred to as the “hybrid mode” MPI approach, where multiple machines are requested and each process is given multiple processors. MPI will start a parent process or processes on each node, and those in turn will be able to use more than one processor for threaded calculations.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun --cpu-bind=none hostname -s
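
To see how much each of the four scenarios asks for in total, multiply nodes by tasks per node by CPUs per task, then by memory per CPU. The sketch below does that arithmetic; job_totals is a hypothetical helper, and it assumes that any option a script leaves out (tasks per node or CPUs per task) counts as 1.

```shell
# job_totals NODES NTASKS_PER_NODE CPUS_PER_TASK MEM_PER_CPU_GB
# Prints the total CPU count and total memory (GB) that a request adds up to.
# Assumption: options not set in a script are passed here as 1.
job_totals() {
    nodes="$1"; tasks_per_node="$2"; cpus_per_task="$3"; mem_per_cpu_gb="$4"
    cpus=$(( nodes * tasks_per_node * cpus_per_task ))
    mem_gb=$(( cpus * mem_per_cpu_gb ))
    echo "cpus=$cpus mem_gb=$mem_gb"
}

job_totals 1 1 1 1   # one node, one processor
job_totals 1 1 4 1   # one node, multiple processors
job_totals 2 4 1 1   # multiple nodes, one process per CPU
job_totals 2 4 4 1   # multiple nodes, multiple CPUs per process
```

The hybrid example therefore requests 32 CPUs and 32 GB in total, which is worth comparing against your account's limits before submitting.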

Common Job Submission Options

Description | Slurm directive (#SBATCH option) | Armis2 Usage
Job name | --job-name=<name> | --job-name=armis2job1
Account | --account=<account> | --account=test
Queue | --partition=<partition_name> | --partition=standard (available partitions: standard, gpu (GPU jobs only), largemem (large memory jobs only))
Wall time limit | --time=<hh:mm:ss> | --time=02:00:00
Node count | --nodes=<count> | --nodes=2
Process count per node | --ntasks-per-node=<count> | --ntasks-per-node=1
Minimum memory per processor | --mem-per-cpu=<memory> | --mem-per-cpu=1000m
Request software license(s) | --licenses=<application>@slurmdb:<N> | --licenses=stata@slurmdb:1 (requests one license for Stata)
Request event notification | --mail-type=<events> (multiple mail-type requests may be specified in a comma-separated list: --mail-type=BEGIN,END,NONE,FAIL,REQUEUE) | --mail-type=BEGIN,END,FAIL

Please note that if your job uses more than one node, your code must be MPI-enabled in order to run across those nodes, and you must launch it with srun rather than mpirun or mpiexec. More advanced job submission options can be found in the Slurm User Guide for Armis2.
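
The options above can be collected into a small script generator, sketched here. make_job_script is a hypothetical helper, ./my_mpi_app is a placeholder application name, and the fixed memory, time, and mail settings are simply the example values from the table.

```shell
# make_job_script JOB_NAME ACCOUNT PARTITION NODES NTASKS_PER_NODE
# Hypothetical helper: prints a batch script to standard output.
# ./my_mpi_app is a placeholder for your own MPI-enabled program.
make_job_script() {
    job_name="$1"; account="$2"; partition="$3"; nodes="$4"; tasks_per_node="$5"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --account=$account
#SBATCH --partition=$partition
#SBATCH --nodes=$nodes
#SBATCH --ntasks-per-node=$tasks_per_node
#SBATCH --mem-per-cpu=1000m
#SBATCH --time=02:00:00
#SBATCH --mail-type=BEGIN,END,FAIL

# Launch with srun, not mpirun or mpiexec
srun ./my_mpi_app
EOF
}

make_job_script armis2job1 test standard 2 1
```

Redirect the output to a file (for example, armis2job1.sh) and submit that file with sbatch from a shared file system.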

Interactive Jobs

An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application. The srun command is used to submit an interactive job to Slurm. When the job starts, a command line prompt will appear on one of the compute nodes assigned to the job. From here commands can be executed using the resources allocated on the local node.

[user@login ~]$ srun --cpu-bind=none --pty /bin/bash 
srun: job 309 queued and waiting for resources 
srun: job 309 has been allocated resources 
[user@node0001 ~]$ hostname 
bn01.stage.arc-ts.umich.edu 
[user@node0001 ~]$

Jobs submitted with srun --cpu-bind=none --pty /bin/bash will be assigned the cluster default values of 1 CPU and 1024MB of memory. If additional resources are required, they can be requested as options to the srun command. The following example job is assigned 2 nodes with 4 CPUs and 4GB of memory each:

[user@login ~]$ srun --cpu-bind=none --nodes=2 --ntasks-per-node=4 --mem-per-cpu=1GB --pty /bin/bash
srun: job 894 queued and waiting for resources
srun: job 894 has been allocated resources
[user@node0001 ~]$ srun --cpu-bind=none hostname
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu

In the above example srun is used within the job from the first compute node to run a command once for every task in the job on the assigned resources. srun can be used to run on a subset of the resources assigned to the job. See the srun man page for more details.
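
To confirm that tasks landed where you expected, the hostname output can be tallied per node. A minimal sketch, where tasks_per_node is a hypothetical helper name and the sample input is the output shown in the session above:

```shell
# tasks_per_node reads hostnames on standard input (one line per task) and
# prints each distinct hostname followed by the number of tasks on it.
tasks_per_node() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample output copied from the interactive session above.
srun_output='node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu'

printf '%s\n' "$srun_output" | tasks_per_node
```

For the sample this prints a count of 4 for each of the two nodes, matching --ntasks-per-node=4. Inside a real job, the same tally would come from piping srun --cpu-bind=none hostname into tasks_per_node.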

GPU and Large Memory Jobs

Jobs can request GPUs with the job submission options --partition=gpu and a count option from the table below. All counts can be represented by gputype:number or just a number (default type will be used). Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both Batch and Interactive jobs.

Description | Slurm directive (#SBATCH or srun option) | Example
GPUs per node | --gpus-per-node=<gputype:number> | --gpus-per-node=2 or --gpus-per-node=v100:2
GPUs per job | --gpus=<gputype:number> | --gpus=2 or --gpus=v100:2
GPUs per socket | --gpus-per-socket=<gputype:number> | --gpus-per-socket=2 or --gpus-per-socket=v100:2
GPUs per task | --gpus-per-task=<gputype:number> | --gpus-per-task=2 or --gpus-per-task=v100:2
CPUs required per GPU | --cpus-per-gpu=<number> | --cpus-per-gpu=4
Memory per GPU | --mem-per-gpu=<number> | --mem-per-gpu=1000m
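
The two accepted forms of a GPU count (a bare number, or gputype:number) can be illustrated with a small helper. This is a sketch: gpu_per_node_flag is a hypothetical name, and v100 is used only because it appears in the examples above; check sinfo for the types actually available.

```shell
# gpu_per_node_flag COUNT [TYPE]
# Hypothetical helper: prints a --gpus-per-node option in either of the two
# forms described above. With no TYPE, the cluster's default type is used.
gpu_per_node_flag() {
    count="$1"
    gputype="${2:-}"
    if [ -n "$gputype" ]; then
        echo "--gpus-per-node=${gputype}:${count}"
    else
        echo "--gpus-per-node=${count}"
    fi
}

gpu_per_node_flag 2         # bare count
gpu_per_node_flag 2 v100    # explicit type
```

Either form is combined with --partition=gpu on the sbatch or srun command line, or placed in an #SBATCH directive.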

Jobs can request nodes with large amounts of RAM with --partition=largemem.

Job Status

Most of a job’s specifications can be seen by invoking scontrol show job <jobID>.  The job’s batch script can be written to a file by using scontrol write batch_script <jobID> output.txt. If no output file is specified, the script will be written to slurm-<jobID>.sh.

A job’s record remains in Slurm’s memory for 30 minutes after it completes.  scontrol show job will return “Invalid job id specified” for a job that completed more than 30 minutes ago.  At that point, one must invoke the sacct command to retrieve the job’s record from the Slurm database.
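
The two-step lookup described above can be wrapped in one function. This is a sketch under the assumption that scontrol exits with a non-zero status once the record has left Slurm's memory; job_info is a hypothetical name, and the sacct field list is just one reasonable selection.

```shell
# job_info JOBID
# Hypothetical wrapper: show the in-memory record if Slurm still has it,
# otherwise fall back to the accounting database.
# Assumption: scontrol returns non-zero for a job it no longer knows.
job_info() {
    jobid="$1"
    if scontrol show job "$jobid" 2>/dev/null; then
        return 0
    fi
    sacct -j "$jobid" --format=JobID,JobName,Partition,State,Elapsed,ExitCode
}
```

On a login node, job_info 309 would print the scontrol record for a recent job and the sacct summary for an older one.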

To view TRES (Trackable RESource) utilization by user or account, use the following commands, substituting your own values for the dates, account, users, and TRES type.

Shows TRES usage by all users on account during date range:

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy account=test --tres type

Shows TRES usage by specified user(s) on account during date range:

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy users=un1,un2 account=test --tres type

Lists users alphabetically along with TRES usage and total during date range:

sreport cluster AccountUtilizationByUser start=mm/dd/yy end=mm/dd/yy tree account=test --tres type

Possible TRES types:
cpu
mem
node
gres/gpu

For more reporting options, see the Slurm sreport documentation.

Software

The Armis2 cluster uses the Lmod modules system to provide access to centrally installed software. If you used a cluster at U-M previously, then you should review the documentation for the module system as we have changed the configuration to match that used at most national clusters and most other university clusters.

In particular, use the command module keyword to look for a module. Do not use module available to search for software, because module available only shows software for which all the dependencies (or prerequisites) are already loaded.

So, to search for the software package FFTW, use

$ module keyword fftw

That will show which versions are installed and provide a command to determine what is needed to load it.

Please see our page on using the Lmod modules system for more details and examples.

There are two main categories of software available on the system: software that is installed as part of the installation of the operating system and software that is installed separately. No special action is needed to use the software installed with the operating system. The separately installed software is set up so that you will use a module to use it. The module will set up the environment and make the software available. We do it this way to enable having multiple versions of the same package and to avoid having conflicts between software packages that have mutually exclusive system requirements.

Requesting software licenses

Many of the software packages that are licensed for use on ARC clusters are licensed for a limited number of concurrent uses. If you will use one of those packages, then you must request a license or licenses in your submission script. As an example, to request one Stata license, you would use

#SBATCH --licenses=stata@slurmdb:1

The list of software can be found from Armis2 by using the command

$ scontrol show licenses

Policies

Partition Policies

Slurm partitions represent collections of nodes for a computational purpose, and are equivalent to Torque queues. For more Armis2 hardware specifications, see the Configuration page.

Partitions:

  • debug: The goal of debug is to allow users to run jobs quickly for debugging purposes.
    • Max walltime: 4 hours
    • Max jobs per user: 1
    • Higher scheduling priority
  • standard: Standard compute nodes used for most work.
    • Max walltime: 14 days
    • Default partition if none specified
  • gpu: Allows use of NVIDIA Tesla V100 GPUs.
    • Max walltime: 14 days
  • largemem: Allows use of a compute node with 1.5 TB of RAM.
    • Max walltime: 14 days

Account/Association Limits

In order to facilitate fairness between accounts, we have set resource limits on each Armis2 root account which are described here.

Limits can be set on a Slurm association or on a Slurm account. This allows a PI to limit individual users or the collective set of users in an account as the PI sees fit. The following values can be used to limit either an account or user association, unless noted otherwise below:

Current Armis2 partition limits:

  • MaxJobs
    • Maximum number of jobs allowed to run at one time
    • Account example: testaccount can have 10 simultaneously running jobs (testuser1 has 8 running jobs and testuser2 has 2 running jobs for a total of 10 running jobs)
    • Association example: testuser can have 2 simultaneously running jobs
  • MaxWall
    • Maximum duration of a job
    • Account example: all users on testaccount can run jobs for up to 3 days
    • Association example: testuser’s jobs can run up to 3 days
  • MaxTRES (CPU, Memory, GPU or billing units)
    • Maximum number of TRES the running jobs can simultaneously use
    • NOTE: CPU, Memory, and GPU can also be limited on a user’s individual job
    • Account example: testaccount’s running jobs can collectively use up to 5 GPUs (testuser1’s jobs are using 3 GPUs and testuser2’s jobs are using 2 GPUs for a total of 5 GPUs)
    • Association example: testuser’s running jobs can collectively use up to 10 cores
    • Job example: testuser can run a single job using up to 10 cores
  • GrpTRESMins (billing units)
    • The total number of TRES minutes that can possibly be used by past, present and future jobs. This is primarily used for setting spending limits
    • Account example: all users on testaccount share a spending limit of $1000
    • Association example: testuser has a spending limit of $1000
  • GrpTRESRunMins
    • The total number of TRES minutes used by all running jobs. This takes into consideration the time limit of running jobs. If the limit is reached no new jobs are started until other jobs finish.
    • Account example: all users on testaccount share a pool of 1000 CPU minutes for running jobs (users have 10 serial jobs each with 100 minutes remaining to completion)
    • Association example: testuser can have up to 100 CPU minutes of running jobs (1 job with 100 CPU minutes remaining, 2 with 50 minutes remaining, etc.)
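
The GrpTRESRunMins examples above come down to summing CPUs times minutes remaining over all running jobs. A minimal sketch of that arithmetic, where running_cpu_minutes is a hypothetical helper and the job list is made-up input rather than real scheduler output:

```shell
# running_cpu_minutes reads one running job per line as
# "CPUS MINUTES_REMAINING" and prints the sum of CPUS * MINUTES_REMAINING.
running_cpu_minutes() {
    awk '{ total += $1 * $2 } END { print total + 0 }'
}

# Ten serial (1-CPU) jobs, each with 100 minutes remaining, as in the
# account example above.
i=0
while [ "$i" -lt 10 ]; do
    echo "1 100"
    i=$((i + 1))
done | running_cpu_minutes
```

This prints 1000, so an account limited to 1000 CPU minutes would start no further jobs until some of these make progress or finish.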

Periodic Spending Limits

The PI has the ability to set a monthly or yearly (fiscal year) spending limit on a Slurm account. Spending limits will be updated at the beginning of each month. As an example, if the testaccount account has a monthly spending limit of $1000 and this is used up on January 22nd, jobs will be unable to run until February 1st when the limit will reset with another $1000 to spend.

Please contact ARC if you would like to implement any of these limits.

Refund Policy

ARC operates our HPC clusters to the best of our abilities, but there can be events, both within and outside of our control, which may cause interruptions to your jobs. You are responsible for due diligence around your use of the ARC HPC resources and taking measures to maximize your research.  These actions may include:

  • Backing up data to permanent storage locations
  • Checkpointing your code to minimize impacts from job interruptions
  • Error checking in your scripts
  • Understanding the operation of the system and the user guide for the HPC cluster, including per job charges which may be greater than expected

Refunds, if any, are at the discretion of ARC and will only be issued for system-wide preventable issues.  This does not include hardware failure, power failures, job failures, or similar issues.

ARMIS2 TERMS OF USAGE

  1. This service is for sensitive data only. Be advised that you should not move sensitive data off of this system, unless it is to another service or machine that has been approved for hosting the same types of sensitive data.
  2. Limited data restoration. The data in your home directory can be restored from snapshots going back 3 days.  Anything beyond 3 days cannot be retrieved.  Data stored outside your home directory, such as on a group share, is subject to the data-lifetime policies that were set up when the respective Turbo NFS volume was purchased. You are responsible for mitigating your own risk. We suggest you store copies of hard-to-reproduce data in your home directory or on HIPAA-aligned storage you own or purchased from Turbo.
  3. System usage is tracked and is used for billing reports and capacity planning. Job metadata (example: walltime, resource utilization, software accessed) is stored and used to generate usage reports and to analyze patterns and trends. ARC may report this metadata, including your individual metadata, to your adviser, department head, dean, or other administrator or supervisor for billing or capacity planning purposes.
  4. Maintaining the overall stability of the system is paramount to us. While we make every effort to ensure that every job completes in the most efficient and accurate way possible, the good of the whole is more important to us than the good of an individual. This may affect you, but mostly we hope it benefits you. System availability is based on our best efforts. We are staffed to provide support during normal business hours. We try very hard to provide support as broadly as possible, but cannot guarantee support on a 24 hour per day basis. Additionally, we perform system maintenance on a periodic basis, driven by the availability of software updates, staffing availability, and input from the user community. We do our best to schedule around your needs, but there will be times when the system is unavailable. For scheduled outages, we will announce them at least one month in advance on the ARC home page; for unscheduled outages we will announce them as quickly as we can with as much detail as we have on that same page. You can also track ARC on Twitter at @umichARC.
  5. Armis2 is intended only for non-commercial, academic research and instruction. Commercial use of some of the software on Armis2 is prohibited by software licensing terms. Prohibited uses include product development or validation, software use supporting any service for which a fee is charged, and, in some cases, research involving proprietary data that will not be made available publicly, regardless of whether the research is published. Please contact arcts-support@umich.edu if you have any questions about this policy, or about whether your work may violate these terms.
  6. Data subject to export control and HIPAA regulations may be stored or processed on the cluster. The appropriate storage solution for export controlled information or PHI that can be accessed on the Armis2 cluster is the Turbo-NFSv4 with Kerberos offering (see the Sensitive Data Restrictions for Turbo-NFSv4 with Kerberos for further details). It is your responsibility, not ARC’s, to be aware of and comply with all applicable laws, regulations, and university policies (e.g., ITAR, EAR, HIPAA) as part of any research activity that may raise compliance issues under those laws. For assistance with export controlled research, contact the U-M Export Control Officer at exportcontrols@umich.edu. For assistance with HIPAA-related computational research, contact the ARC liaison to the Medical School at msis.help@umich.edu.

USER RESPONSIBILITIES

Users should make requests by email to arcts-support@umich.edu:

  • One day in advance, request users be added to Armis2 accounts you may administer. All users need approval to be added to an account on Armis2 before they can have a user login created on the cluster.

Users are responsible for security and compliance related to sensitive code and/or data. Security and compliance are shared responsibilities. If you process or store sensitive university data, software, or libraries on the cluster, you are responsible for understanding and adhering to any relevant legal, regulatory or contractual requirements.

Users are responsible for maintaining MCommunity groups used for MReport authorizations.

Users must manage PHI (protected health information) appropriately and can use the following locations:

  • /home (80 GB quota)
  • /scratch (more information below)
  • /tmp
  • Any appropriate PHI-compliant NFS volume mounted on Armis2

SCRATCH STORAGE POLICIES

Every user has a /scratch directory for every Slurm account they are a member of.  Additionally for that account, there is a shared data directory for collaboration with other members of that account.  The account directory group ownership is set using the Slurm account-based UNIX groups, so all files created in the /scratch directory are accessible by any group member, to facilitate collaboration.

Example:
/scratch/msbritt_root
/scratch/msbritt_root/msbritt
/scratch/msbritt_root/shared_data

There is a 10 TB quota on /scratch per root account (a PI or project account), which is shared between child accounts (individual users).

 If you are in need of more scratch space for your account please email us at arcts-support@umich.edu. Please note that these requests need to come from an administrator on the account and should include an explanation of why the increase is required. 

 

Users should keep in mind that scratch has an auto-purge policy on unaccessed files, which means that any unaccessed data will be automatically deleted by the system after 60 days. Scratch file systems are not backed up. Critical files should be backed up to another location.
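
To see which files are at risk before the purge runs, something like the following could be used. It is a sketch only: purge_candidates is a hypothetical helper, it relies on find's -atime test, and it makes no claim about how the system's own purge actually selects files.

```shell
# purge_candidates DIR [DAYS]
# Lists regular files under DIR whose last access time is more than DAYS
# days old (default 60, the threshold stated above).
# Note: find counts whole 24-hour periods, so "-atime +60" matches files
# last accessed at least 61 days ago.
purge_candidates() {
    dir="$1"
    days="${2:-60}"
    find "$dir" -type f -atime +"$days" -print
}
```

For example, purge_candidates /scratch/msbritt_root/msbritt 45 would list files in the example directory above that have gone unread for more than 45 days, leaving time to copy anything critical elsewhere.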

LOGIN NODE POLICIES

Appropriate uses for the login nodes:

  • Transferring small files to and from the cluster
  • Creating, modifying, and compiling code and submission scripts
  • Submitting and monitoring the status of jobs
  • Testing executables to ensure they will run on the cluster and its infrastructure. Processes are limited to a maximum of 15 minutes of CPU time to prevent runaway processes and overuse.

Any other uses of the login node may result in the termination of the process in violation. Any production processes (including post processing) should be submitted through the batch system to the cluster. If interactive use is required then you should submit an interactive job to the cluster.