What is Great Lakes?

Great Lakes is an ARC-TS managed HPC cluster available to faculty (PIs) and their students/researchers. All computational work is scheduled via the Slurm resource manager and task scheduler. For detailed hardware information, see the configuration page. Great Lakes is not suitable for HIPAA or other sensitive data.

What forms do I need to fill out?

  1. The Principal Investigator (PI) needs to request a Slurm account, specifying the users who can access the account, the people who can administer it, and payment details.
  2. Each user given access to the account must request a user login. Please refer to the Great Lakes User Guide for additional steps and usage information.

What is a UMRCP account and how can I get one on Great Lakes?

The University of Michigan Research Computing Package (UMRCP), provided by ITS, is an investment in the U-M research community that offers simple, dependable access to several ITS-provided high-performance computing clusters and data storage resources. For more information, including how to request an account, please visit our UMRCP page.

Will my Turbo storage be available on Great Lakes?

Because Turbo is a storage service independent of Great Lakes, users who used Turbo on Flux can still access their data on Great Lakes. The cost of Turbo will not change, and no data needs to be transferred. If you have trouble accessing Turbo, please contact arc-support@umich.edu.

How do I submit jobs using a web interface?

Great Lakes uses Open OnDemand to let users submit jobs from a web browser, manage the files in their home directory, view or delete active jobs, and open a web terminal session. Users can also run MATLAB, Jupyter Notebooks, and RStudio, or start a remote desktop.

You must be on campus or on the VPN to connect to Great Lakes OnDemand.  For more information, see the OnDemand section of the Great Lakes User Guide.

What should I do if I receive a Bad Request error when accessing Open OnDemand?

If you receive a “Bad Request: Your browser sent a request that this server could not understand. Size of request header field exceeds server limit.” error, please clear your browser’s cookies and try to access Great Lakes OnDemand again.

This error can occur when the request header sent by the browser becomes too large. This is typically caused by accumulated cookies and cached data for a specific site; in some cases, corrupted cookies can also contribute to the problem.

According to the vendor of Open OnDemand, the software does not currently clean up cookies or cached data. We recommend using your browser in incognito mode to avoid this potential issue.

Can I use SSH key-based authentication to access Great Lakes?

Unfortunately, this is not currently possible. We use Kerberos, a network authentication protocol which utilizes a different system based on tickets and symmetric-key cryptography. This system isn’t compatible with the public-private key pair used in SSH key-based authentication.

While we understand the convenience of SSH keys, it’s crucial that we prioritize our system’s security, which Kerberos affords us through centralized and time-sensitive ticketing control.

How do I view the resource usage on my account?

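To view TRES (Trackable RESource) utilization by user or account, use the following commands, substituting your own values for the placeholders (the mm/dd/yy dates, the account name test, the user names un1,un2, and the TRES type):

Shows TRES usage by all users on account during date range: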

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy account=test --tres type

Shows TRES usage by specified user(s) on account during date range:

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy users=un1,un2 account=test --tres type

Lists users alphabetically along with TRES usage and total during date range:

sreport cluster AccountUtilizationByUser start=mm/dd/yy end=mm/dd/yy tree account=test --tres type

Possible TRES types:
cpu
mem
node
gres/gpu
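As a worked illustration of filling in the placeholders, here is a minimal sketch of the first command above. The account name, dates, and TRES type are hypothetical values chosen for the example. The command is printed for review and only runs where sreport is installed.

```shell
# Hypothetical values; replace with your own account, date range, and TRES type.
account="test"
start="01/01/24"
end="01/31/24"
tres="cpu"

# Assemble the first sreport command shown above from those values.
cmd="sreport cluster UserUtilizationByAccount start=$start end=$end account=$account --tres $tres"

# Print the command for review, then run it only if sreport is available.
echo "$cmd"
if command -v sreport >/dev/null 2>&1; then
    $cmd
fi
```

To report on GPUs instead, set tres to gres/gpu.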

To view disk usage and availability by user, type:

home-quota -u uniqname

For more reporting options, see the Slurm sreport documentation.

What is the difference between GiB and GB, or MiB and MB?

A gibibyte (GiB) and a gigabyte (GB) are sometimes used as synonyms, though technically they do not describe the same amount of data. A gibibyte is a unit of data capacity based on powers of two. A gigabyte is also a unit of data capacity, but it is based on powers of 10. For example, 1 GiB equals 1,024 MiB, and 1 GB equals 1,000 MB. The same distinction applies to the other prefixes (KiB and KB, MiB and MB, and so on). With the expansion of data capacity, gigabytes have become a more outdated unit of measure compared to gibibytes. To be as accurate and consistent as possible, Slurm, Slurm mail, and our additional tooling use binary byte notation (GiB, MiB, etc.) to measure data capacity.
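The size of the gap can be checked with shell integer arithmetic. This is a minimal sketch, and the variable names are illustrative only.

```shell
# 1 GiB = 1024 * 1024 * 1024 bytes (2^30)
gib_bytes=$((1024 * 1024 * 1024))

# 1 GB = 1000 * 1000 * 1000 bytes (10^9)
gb_bytes=$((1000 * 1000 * 1000))

# Difference between the two units, in bytes
diff_bytes=$((gib_bytes - gb_bytes))

echo "1 GiB = $gib_bytes bytes"
echo "1 GB  = $gb_bytes bytes"
echo "Difference = $diff_bytes bytes"
```

One GiB is about 7% larger than one GB, so a memory or storage figure reported in GiB represents more bytes than the same number in GB.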

What is a “root (_root) account”?

Each PI or project has a collection of Slurm accounts that can be used for different purposes (e.g. different grants or research focuses) with different users. These Slurm accounts are contained within the PI/project’s root account (e.g. researcher_root). For example:

researcher_root
    researcher
        user1
        user2
    researcher1
        user2
        user3

These accounts can have different limits on them, and are also collectively limited for /scratch usage and overall cluster usage.

As a PI, how can I limit usage on my account?

Principal Investigators can request that CPU, GPU, memory, billing units, and walltime be limited per user or group of users on their account.  For more information, see the Great Lakes policy documentation.

Limits must be requested by emailing arcts-support@umich.edu.

As a PI, can I purchase my own nodes for Great Lakes?

PIs may purchase hardware for use on the Lighthouse cluster by emailing arcts-support@umich.edu to develop a hardware plan. Lighthouse utilizes the same Slurm job scheduler and infrastructure as Great Lakes, but purchased nodes can be used exclusively by the PI’s group.

What does my job status mean?

When listing your submitted jobs with squeue -u uniqname, the final column, titled “NODELIST(REASON)”, gives the reason that a job is not running yet. The possible reasons are:

Resources

This job is waiting for the resources (CPUs, Memory, GPUs) it requested to become available. Resources become available when currently running jobs complete. The job with Resources in the NODELIST(REASON) column is the top priority job and should be started next.

Priority

This job is not the top priority, so it must wait in the queue until it becomes the top priority job. Once it becomes the top priority job, the NODELIST(REASON) column will change to “Resources”. The priority of all pending jobs can be shown with the sprio command. A job’s priority is determined by two factors: fairshare and age. The fairshare factor in a job’s priority is influenced by the amount of resources that have been consumed by members of your Slurm account. More recent usage means a lower fairshare priority. The age factor is determined by the job’s queued time. The longer the job has been waiting in the queue, the higher the age priority.

AssocGrpCpuLimit

This job was submitted with a Slurm account that has a limit set on the number of CPUs that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running squeue --account=account_name

AssocGrpGRES

This job was submitted with a Slurm account that has a limit set on the number of GPUs that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running squeue --account=account_name

AssocGrpMem

This job was submitted with a Slurm account that has a limit set on the amount of memory that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running squeue --account=account_name

AssocGrpBillingMinutes

This job was submitted with a Slurm account that has a limit set on the amount of monetary charges that may be accrued. Jobs that are pending with this reason will not start until the limit has been raised or the monthly bill has been processed.

Dependency

This job has a dependency on another job. It will not start until that dependency is met. The most common dependency is waiting for another job to complete.
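A dependency of this kind can be set up at submission time. The following is a minimal sketch, assuming two hypothetical batch scripts named first_step.sh and second_step.sh; the helper function dependency_flag is also illustrative. It uses sbatch's --parsable and --dependency=afterok options, so the second job starts only after the first completes successfully.

```shell
# Hypothetical batch script names; replace with your own.
first_script="first_step.sh"
second_script="second_step.sh"

# Build the dependency option from a job ID. With --parsable, sbatch prints
# "jobid" or "jobid;cluster", so anything after ';' is stripped.
dependency_flag() {
    printf '%s\n' "--dependency=afterok:${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    # Submit the first job and capture its job ID.
    first_id=$(sbatch --parsable "$first_script")
    # Submit the second job so that it waits for the first to succeed.
    sbatch "$(dependency_flag "$first_id")" "$second_script"
else
    echo "sbatch not found; run this on a cluster login node."
fi
```

Until the first job finishes, the second job is listed by squeue with Dependency in the NODELIST(REASON) column. To start the second job regardless of whether the first succeeds, afterany can be used in place of afterok.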

QOSMinGRES

This job was submitted to the GPU partition, but did not request a GPU. This job will never start. Delete the job and resubmit it to a different partition or, if a GPU is needed, resubmit it to the GPU partition with a GPU request. A GPU can be requested by adding the following line to a batch script: #SBATCH --gres=gpu:1
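For context, here is a minimal sketch of a batch script that includes the GPU request line. The job name, walltime, and account (test) are hypothetical placeholders, and the partition name gpu is an assumption; check the Great Lakes User Guide for the actual partition names.

```shell
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=gpu-example
# Placeholder Slurm account; use your own
#SBATCH --account=test
# Assumed name of the GPU partition
#SBATCH --partition=gpu
# Request one GPU, as described above
#SBATCH --gres=gpu:1
# Hypothetical walltime limit
#SBATCH --time=00:10:00

# Slurm typically sets CUDA_VISIBLE_DEVICES for jobs that are allocated GPUs;
# outside of such a job the variable is unset and "none" is printed.
visible="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: $visible"
```

A script like this would be submitted with sbatch followed by the script's file name.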

How can I access on-campus restricted software?

From the Command Line

Use an SSH client to log in to an on-campus login node at gl-campus-login.arc-ts.umich.edu

From Open OnDemand

Open your browser (Firefox, Edge, or Chrome; an incognito tab is recommended) and navigate to greatlakes-oncampus.arc-ts.umich.edu.

What are the SSH public keys for Great Lakes?

If you wish to pre-populate your SSH client configuration with the publicly available keys for Great Lakes, they are as follows:

ECDSA:

greatlakes.arc-ts.umich.edu,greatlakes-oncampus.arc-ts.umich.edu,gl-login?.arc-ts.umich.edu,141.211.192.38,141.211.192.39,141.211.192.40,141.211.192.41 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBHWel/rAXqIJYxexVzMSlgy/fICWukn8DaOGMPpAomH1E5AhCjrH2zMMTJHtXYsRA+brm/sTbn21Zw+pgpgJSYA=

ED25519:

greatlakes.arc-ts.umich.edu,greatlakes-oncampus.arc-ts.umich.edu,gl-login?.arc-ts.umich.edu,141.211.192.38,141.211.192.39,141.211.192.40,141.211.192.41 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICwaAq9LI48vVO4qbt35Xfz1pi+RE1Krq1iIeJQqoFEw

RSA:

greatlakes.arc-ts.umich.edu,greatlakes-oncampus.arc-ts.umich.edu,gl-login?.arc-ts.umich.edu,141.211.192.38,141.211.192.39,141.211.192.40,141.211.192.41 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA16eDiBWF3SgPQXEeJsH8dsxO8x3o5KkdqWMg/lK57Kpwf4QGXJNvYy0jxSAuKTRim/ob6+nDRH8zIOwnl9tlyEw+8VN3WR8nqBqxX6Km2yzTOMO8Lh7fLuMTZHOdEz0uOn6tBP8LTMtHN9h/fANjKFVl8N+jsejMXrPf0w7jGjc=

 

On a Mac or Linux machine you’ll add the keys to your known_hosts file.

On a Mac this file is: /Users/<username>/.ssh/known_hosts.
On Linux: /home/<username>/.ssh/known_hosts.

The known_hosts file should have 644 (i.e. -rw-r--r--) permissions.
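The steps above can be sketched as follows. The key_line value is a placeholder and must be replaced with one complete key line (ECDSA, ED25519, or RSA) copied from the list above. The sketch assumes the default known_hosts location under your home directory.

```shell
# $HOME resolves to /Users/<username> on a Mac and /home/<username> on Linux.
known_hosts="$HOME/.ssh/known_hosts"

# Placeholder: replace with one full key line copied from the list above.
key_line="PASTE_ONE_FULL_KEY_LINE_HERE"

# Make sure the directory and file exist.
mkdir -p "$HOME/.ssh"
touch "$known_hosts"

# Append the line only if it is not already present.
grep -qxF -- "$key_line" "$known_hosts" || printf '%s\n' "$key_line" >> "$known_hosts"

# Set the permissions to 644 (-rw-r--r--).
chmod 644 "$known_hosts"
```

Repeat the append step for each key type you want to add. Checking for an existing line first keeps the file free of duplicates if the snippet is run more than once.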

If you are using an SSH client that is not part of your operating system (e.g. Windows using PuTTY), please see the client documentation referring to host key verification.

A good starting point for PuTTY users is section A.2.9 of the PuTTY documentation, “Is there an option to turn off the annoying host key prompts?”

Can I get a refund on a failed job?

Refunds, if any, are at the discretion of ARC and will only be issued for system-wide preventable issues. This does not include hardware failures, power failures, job failures, or similar issues. For more information, see the Great Lakes policies.

If you have a problem not listed here, please send an email to arcts-support@umich.edu.

Order Service

Billing for the Great Lakes service began on January 6, 2020. Existing, active Flux accounts and logins have been added to the Great Lakes Cluster. Complete this form to get a new Great Lakes cluster login.

If you would like to create a Great Lakes Cluster account or have any questions, contact arcts-support@umich.edu with lists of users, admins, and a shortcode. UMRCP accounts are also available to eligible researchers. For more information, please visit our UMRCP page.