Tag: Slurm

Research Computing on the Great Lakes cluster

This workshop will provide a brief overview of the new HPC environment and is intended for current Flux and Armis users. We will use the temporary Beta HPC cluster to demonstrate how jobs will be submitted and managed on the new Great Lakes, Armis2, and Lighthouse clusters, which will be available later this year.

There are many differences between the familiar Flux environment and that of the new HPC clusters, including a new batch scheduling system, a new interactive batch job environment, a new HPC web portal, a new module environment, and a new on-demand-only job accounting system.

We will cover these differences in the workshop and provide hands-on training in creating and running job submission scripts in the new HPC environment. Students are expected to be conversant with the Linux command line and have experience in creating, submitting, and troubleshooting PBS batch scripts.

Introduction to the Great Lakes cluster and batch computing with Slurm

OVERVIEW

This workshop will provide a brief overview of the components of the Great Lakes Cluster. The main body of the workshop will cover the resource manager and scheduler, creating submission scripts to run jobs and the options available in them, and hands-on experience. By the end of the workshop, every participant should have created a submission script, submitted a job, tracked its progress, and collected its output. Participants will have several working examples from which to build their own submission scripts in their own home directories.
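
As a preview of that workflow, here is a minimal sketch of a Slurm submission script and the commands used to submit it, track it, and collect its output. The job name, file names, and account shown are placeholders; the actual account to use is provided as part of the course preparation described below.

    #!/bin/bash
    # example.sbat -- a minimal Slurm submission script (placeholder values)
    #SBATCH --job-name=hello
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=00:05:00
    #SBATCH --account=workshop_account   # placeholder; use the account given in class
    echo "Hello from $(hostname)"

    sbatch example.sbat      # submit the job; sbatch prints the assigned job ID
    squeue -u $USER          # track the job's progress in the queue
    cat slurm-<jobid>.out    # collect the output (Slurm's default output file name)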

PRE-REQUISITES

This course assumes familiarity with the Linux command line, as might be gained from the CSCAR/ARC-TS workshop Introduction to the Linux Command Line. In particular, participants should understand how files and folders work, be able to create text files using the nano editor, be able to create and remove files and folders, and understand what input and output redirection are and how to use them.
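
For reference, the kinds of command-line operations assumed here look like the following (the file and folder names are arbitrary examples):

    mkdir workshop                   # create a folder
    cd workshop
    nano notes.txt                   # create and edit a text file in nano
    sort < notes.txt > sorted.txt    # input and output redirection
    rm notes.txt sorted.txt          # remove files
    cd .. && rmdir workshop          # remove the now-empty folder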

INSTRUCTORS

Dr. Charles J Antonelli
Research Computing Services
LSA Technology Services

Charles is a High Performance Computing Consultant in the Research Computing Services group of LSA TS at the University of Michigan, where he is responsible for high performance computing support and education, and was an Advocate to the Departments of History and Communications. Prior to this, he built a parallel data ingestion component of a novel earth science data assimilation system and a secure packet vault, and worked on the No. 5 ESS Switch at Bell Labs in the 1980s. He has taught courses in operating systems, distributed file systems, C++ programming, security, and database application design.

John Thiels
Research Computing Services
LSA Technology Services

Mark Champe
Research Computing Services
LSA Technology Services

MATERIALS

COURSE PREPARATION

In order to participate successfully in the workshop exercises, you must have a user login, a Slurm account, and be enrolled in Duo. The user login allows you to log in to the cluster, create, compile, and test applications, and prepare jobs for submission. The Slurm account allows you to submit those jobs, executing the applications in parallel on the cluster and charging their resource use to the account. Duo is required to help authenticate you to the cluster.


USER LOGIN

If you already have a Flux user login, you don’t need to do anything. Otherwise, go to the Flux user login application page at https://arc-ts.umich.edu/fluxform/.

Please note that obtaining a user account requires human processing, so be sure to do this at least two business days before class begins.


SLURM ACCOUNT

We create a Slurm account for the workshop so you can run jobs on the cluster during the workshop and for one day afterward, for those who would like additional practice. The workshop job account is quite limited and is intended only for running examples that help you cement the details of job submission and management. If you already have a Slurm account of your own, you may use that, though if there are any issues with it, we will ask you to switch to the workshop account.
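
For reference, a job is charged to a particular Slurm account with the --account option, either inside the submission script or on the sbatch command line. The account name below is a placeholder, not the actual workshop account name:

    #SBATCH --account=hpc_workshop                 # inside the submission script, or
    sbatch --account=hpc_workshop example.sbat     # on the command line at submission time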


DUO AUTHENTICATION

Duo two-factor authentication is required to log in to the cluster. When logging in, you will need to type your UMICH (AKA Level 1) password as well as authenticate through Duo in order to access Great Lakes.

If you need to enroll in Duo, follow the instructions at Enroll a Smartphone or Tablet in Duo.

Please enroll in Duo before you come to class.

LAPTOP PREPARATION

You do not need to bring your own laptop to class. The classroom contains Windows and Mac computers that have all necessary software pre-loaded; they require your uniqname and UMICH (AKA Level 1) password to log in.

If you want to use a laptop for the course, you are welcome to do so; please see our web page on Preparing your laptop to use Flux. However, if there are problems connecting your laptop, you will be asked to switch to the provided computer for the class. We cannot stop to debug connection issues with personal or departmental laptops during the class.

If you are unable to attend the presentation in person, we will offer a link to the live course via BlueJeans. Please register as if attending in person; this will put you on the wait list, but we will set up your account for remote attendance.

Great Lakes Update: March 2019

ARC-TS previously shared much of this information through the December 2018 ARC Newsletter and on the ARC-TS website. We have added some additional details about the Great Lakes timeline, as well as information for users who would like to participate in Early User testing.

What is Great Lakes?

The Great Lakes service is a next generation HPC platform for University of Michigan researchers, which will provide several performance advantages compared to Flux. Great Lakes is built around the latest Intel CPU architecture called Skylake and will have standard, large memory, visualization, and GPU-accelerated nodes.  For more information on the technical aspects of Great Lakes, please see the Great Lakes configuration page.

Key Features:

  • Approximately 13,000 Intel Skylake Gold processor cores with AVX-512 capability, providing over 1.5 TFLOPS of performance per node
  • 2 PB scratch storage system providing approximately 80 GB/s performance (compared to 8 GB/s on Flux)
  • New InfiniBand network with improved architecture and 100 Gb/s to each node
  • Each compute node will have significantly faster I/O via SSD-accelerated storage
  • Large Memory Nodes with 1.5 TB memory per node
  • GPU Nodes with NVIDIA Volta V100 GPUs (2 GPUs per node)
  • Visualization Nodes with Tesla P40 GPUs

Great Lakes will be using Slurm as the resource manager and scheduler, which will replace Torque and Moab on Flux. This will be the most immediate difference between the two clusters and will require some work on your part to transition from Flux to Great Lakes.

Another significant change is that we are making Great Lakes easier to use through a simplified accounting structure. Unlike Flux, where you need a separate account for each resource, on Great Lakes you can use the same account and simply request the resources you need, from GPUs to large memory.
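
As an illustration of what this might look like in practice, the same account could appear in every job while the node type is selected through the resource request itself. The partition names below are assumptions and may differ in the final Great Lakes configuration:

    #SBATCH --account=myresearch     # one account for all node types (placeholder name)
    #SBATCH --partition=gpu          # assumed GPU partition name
    #SBATCH --gres=gpu:2             # request two GPUs on a GPU node

    #SBATCH --partition=largemem     # assumed large-memory partition name
    #SBATCH --mem=1000g              # request a large-memory node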

There will be two primary ways to get access to compute time: 1) the on-demand model, which adds up the account’s job charges (reserved resources multiplied by the time used) and is billed monthly, similar to Flux On-Demand; and 2) node purchases. In the node purchase model, you will own the hardware, which will reside in Great Lakes through the life of the cluster. You will receive an equivalent credit which you can use anywhere on the cluster, including on GPU and large memory nodes. We believe this will be preferable to buying actual hardware in the FOE model, as your daily computational usage can increase and decrease as your research requires. Send us an email at arcts-support@umich.edu if you have any questions or are interested in purchasing hardware on Great Lakes.
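
As a purely illustrative example of the on-demand model (rates were not yet final when this was written): a job that reserves 4 cores for 10 hours accrues 4 × 10 = 40 core-hours of usage, and the account’s monthly bill is the sum of such charges multiplied by the published rate.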

When will Great Lakes be available?

The ARC-TS team will prepare the cluster in April 2019 for an Early User period beginning in May, which will continue for approximately four weeks to ensure sufficient time to address any issues. General availability of Great Lakes should occur in June 2019. A more detailed timeline for the Great Lakes project is available.

How does this impact me? Why Great Lakes?

After serving as the University’s primary HPC cluster for eight years, Flux will be retired in September 2019. Once Great Lakes becomes available to the University community, we will provide a few months to transition from Flux to Great Lakes. Flux will be retired after that period due to aging hardware as well as expiring service contracts and licenses. We highly recommend preparing to migrate as early as possible so your research will not be interrupted. Later in this post, we offer suggestions for making the migration process as easy as possible.

When Great Lakes becomes generally available to the University community, we will no longer be accepting new Flux accounts or allocations.  All new work should be focused on Great Lakes.

You can see the HPC timeline, including Great Lakes, Beta and Flux, here.

What is the current status of Great Lakes?

Today, the Great Lakes HPC compute hardware and high-performance storage system have been fully installed and configured. In parallel with this work, ARC-TS and unit support team members have been readying the new service with new software and modules, as well as developing training to support the transition onto Great Lakes. A key feature of the new Great Lakes service is the just-released HDR InfiniBand from Mellanox. Today, the hardware is installed, but the firmware is still in its final stages of testing with the supplier, with a target delivery of mid-April 2019. Given the delays, ARC-TS and the suppliers have discussed an adjusted plan that allows quicker access to the cluster while supporting a future update once the firmware becomes available.

We are working with ITS Finance to define rates for Great Lakes.  We will update the Great Lakes documentation when we have final rates and let everyone know in subsequent communications.

What should I do to transition to Great Lakes?

We hope the transition from Flux to Great Lakes will be relatively straightforward, but to minimize disruptions to your research, we recommend you do your testing early. In October 2018, we announced the availability of the HPC cluster Beta in order to help users with this migration. Primarily, it allows users to migrate their PBS/Torque job submission scripts to Slurm. You can and should also explore the new module environment, as it has changed from the current configuration on Flux. Beta uses the same generation of hardware as Flux, so your performance will be similar to that on Flux. You should continue to use Flux for your production work; Beta is intended only for testing your Slurm job scripts, not for production runs.
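
As a rough sketch of what that conversion involves, the lines below pair some common Torque directives with their approximate Slurm equivalents. The account and partition names are placeholders, and the Beta documentation is the authoritative reference for the options it expects.

    # Torque/PBS (Flux)              # Slurm (Beta / Great Lakes)
    #PBS -N myjob                    #SBATCH --job-name=myjob
    #PBS -A example_flux             #SBATCH --account=example
    #PBS -q flux                     #SBATCH --partition=standard
    #PBS -l nodes=1:ppn=4            #SBATCH --nodes=1 --ntasks-per-node=4
    #PBS -l walltime=02:00:00        #SBATCH --time=02:00:00
    #PBS -l pmem=4gb                 #SBATCH --mem-per-cpu=4g
    #PBS -m abe                      #SBATCH --mail-type=BEGIN,END,FAIL

The module commands themselves (module avail, module load, module list) should work as they do on Flux; what has changed is the set of module names and versions available, so check module avail on Beta before assuming a Flux module exists there.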

Every user on Flux has an account on Beta. You can log in to Beta at beta.arc-ts.umich.edu. You will have a new home directory on Beta, so you will need to migrate any scripts and data files you need to test your workloads into this new directory. Beta should not be used for PHI, HIPAA, export-controlled, or any other sensitive data! We highly recommend that you use this time to convert your Torque scripts to Slurm and test that everything works as you would expect.
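
For example, logging in and copying files over might look like the following, run from wherever your files currently live; the file names are placeholders, and uniqname stands for your own uniqname:

    ssh uniqname@beta.arc-ts.umich.edu                        # log in to Beta
    scp myjob.pbs input.dat uniqname@beta.arc-ts.umich.edu:   # copy files into your Beta home directory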

To learn how to use Slurm, we have provided documentation on our Beta website.  Additionally, ARC-TS and academic unit support teams will be offering training sessions around campus. We will have a schedule on the ARC-TS website as well as communicate new sessions through Twitter and email.

If you have compiled software for use on Flux, we highly recommend that you recompile it on Great Lakes once it becomes available. Great Lakes uses the latest CPUs from Intel, and by recompiling, your code may gain performance by taking advantage of the new CPUs’ capabilities.
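
A minimal sketch of such a rebuild, assuming a GCC toolchain (the module name and source file are placeholders; the -march flag shown targets the Skylake AVX-512 instruction set in GCC):

    module load gcc                                    # load a compiler module (name/version may differ)
    gcc -O3 -march=skylake-avx512 -o mycode mycode.c   # rebuild, targeting Skylake with AVX-512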

Questions? Need Assistance?

Contact arcts-support@umich.edu.

Beta cluster available for learning Slurm; new scheduler to be part of upcoming cluster updates

New HPC resources to replace Flux and updates to Armis are coming.  They will run a new scheduling system (Slurm). You will need to learn the commands in this system and update your batch files to successfully run jobs. Read on to learn the details and how to get training and adapt your files.

In anticipation of these changes, ARC-TS has created the test cluster “Beta,” which will provide a testing environment for the transition to Slurm. Slurm will be used on Great Lakes; on the Armis HIPAA-aligned cluster; and on a new cluster called “Lighthouse,” which will succeed the Flux Operating Environment in early 2019.

Currently, Flux and Armis use the Torque (PBS) resource manager and the Moab scheduling system; when completed, Great Lakes and Lighthouse will use the Slurm scheduler and resource manager, which will enhance the performance and reliability of the new resources. Armis will transition from Torque to Slurm in early 2019.

The Beta test cluster is available to all Flux users, who can log in via ssh to ‘beta.arc-ts.umich.edu’. Beta has its own /home directory, so users will need to create or transfer any files they need, via scp/sftp or Globus.

Slurm commands will be needed to submit jobs. For a comparison of Slurm and Torque commands, see our Torque to Slurm migration page. For more information, see the Beta home page.
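
As a quick reference, a few of the most common translations look like this (see the migration page for the complete list):

    qsub job.pbs      ->  sbatch job.sbat        # submit a batch job
    qstat -u $USER    ->  squeue -u $USER        # list your jobs in the queue
    qdel <jobid>      ->  scancel <jobid>        # cancel a job
    qhold <jobid>     ->  scontrol hold <jobid>  # place a job on hold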

Support staff from ARC-TS and individual academic units will conduct several in-person and online training sessions to help users become familiar with Slurm. We have been testing Slurm for several months, and believe the performance gains, user communications, and increased reliability will significantly improve the efficiency and effectiveness of the HPC environment at U-M.

The tentative time frame for replacing or transitioning current ARC-TS resources is:

  • Flux to Great Lakes, first half of 2019
  • Armis from Torque to Slurm, January 2019
  • Flux Operating Environment to Lighthouse, first half of 2019
  • Open OnDemand on Beta, which replaces ARC Connect for web-based job submissions, Jupyter Notebooks, Matlab, and additional software packages, fall 2018