What is Great Lakes?
The Great Lakes service is a next-generation HPC platform for University of Michigan researchers. Great Lakes will provide several performance advantages compared to Flux, primarily in storage and networking. It is built around Intel's latest CPU architecture, Skylake, and will have standard, large memory, visualization, and GPU-accelerated nodes. For more information on the technical aspects of Great Lakes, please see the Great Lakes configuration page.
- Approximately 13,000 Intel Skylake Gold processor cores with AVX-512 support, delivering over 1.5 TFLOPS of performance per node
- 2 PB scratch storage system providing approximately 80 GB/s performance (compared to 8 GB/s on Flux)
- New InfiniBand network with improved architecture and 100 Gb/s to each node
- Each compute node will have significantly faster I/O via SSD-accelerated storage
- Large Memory Nodes with 1.5 TB memory per node
- GPU Nodes with NVIDIA Volta V100 GPUs (2 GPUs per node)
- Visualization Nodes with Tesla P40 GPUs
Great Lakes will use Slurm as its resource manager and scheduler, replacing Torque and Moab on Flux. This will be the most immediate difference between the two clusters and will require some work on your part to transition from Flux to Great Lakes.
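For everyday use, the change mostly means learning Slurm's equivalents for the common Torque/Moab commands. A few of the standard mappings (these are the stock Slurm commands; Great Lakes may also provide site-specific wrappers):

```shell
# Submit a batch job
qsub job.pbs          # Torque
sbatch job.sh         # Slurm

# List your queued and running jobs
qstat -u $USER        # Torque
squeue -u $USER       # Slurm

# Cancel a job by ID
qdel 12345            # Torque
scancel 12345         # Slurm
```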
Another significant change is that Great Lakes will be easier to use thanks to a simplified accounting structure. Unlike Flux, where you need a separate account for each resource, on Great Lakes you can use a single account and simply request the resources you need, from GPUs to large memory.
There will be two primary ways to get access to compute time: 1) a pay-as-you-go model similar to Flux On-Demand, and 2) node purchases. A node purchase will give you computational time equal to four years of use multiplied by the number of nodes you buy. We believe this will be preferable to buying actual hardware under the FOE model, as your daily computational usage can increase and decrease as your research requires. Additionally, you will not be limited by hardware failures on specific nodes, since your jobs can run anywhere on Great Lakes. Send us an email at firstname.lastname@example.org if you have any questions or are interested in purchasing hardware on Great Lakes.
When will Great Lakes be available?
The ARC-TS team will prepare the cluster in February/March 2019 for an Early User period which will continue for several weeks to ensure sufficient time to address any issues. General availability of Great Lakes should occur in April.
How does this impact me? Why Great Lakes?
After being the primary HPC cluster for the University for 8 years, Flux will be retired in September 2019. Once Great Lakes becomes available to the University community, we will provide a few months to transition from Flux to Great Lakes. Flux will be retired after that period due to aging hardware as well as expiring service contracts and licenses. We highly recommend preparing to migrate as early as possible so your research will not be interrupted. Later in this email, we have suggestions for what you can do to make this migration process as easy as possible.
When Great Lakes becomes generally available to the University community, we will no longer be accepting new Flux accounts or allocations. All new work should be focused on Great Lakes.
What is the current status of Great Lakes?
Today, the Great Lakes HPC compute hardware is fully installed, and configuration of the high-performance storage system is in progress. In parallel with this work, ARC-TS and unit support team members have been readying the new service, installing software and modules and developing training to support the transition onto Great Lakes. A key feature of the new Great Lakes service is the newly released HDR InfiniBand from Mellanox. The hardware is available today, but the firmware is still in the final stages of testing with the supplier, with a target delivery date of March 2019. Given the delays, ARC-TS and the suppliers have agreed on an adjusted plan that allows quicker access to the cluster while supporting a future update once the firmware becomes available.
What should I do to transition to Great Lakes?
We hope the transition from Flux to Great Lakes will be relatively straightforward, but to minimize disruptions to your research, we recommend you do your testing early. In October, we announced availability of the HPC cluster Beta to help users with this migration. Primarily, it allows users to migrate their PBS/Torque job submission scripts to Slurm. You can also explore the new Modules environments, which have changed from their current configuration on Flux. Beta uses the same generation of hardware as Flux, so performance will be similar. You should continue to use Flux for your production work; Beta is intended only for testing your Slurm job scripts, not for any production work.
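As a rough sketch of what the conversion looks like, here is a simple Torque script alongside its Slurm counterpart (the account names, resource values, and program name are placeholders, not actual Great Lakes values):

```shell
#### Torque/PBS version (Flux) ####
# #PBS -N myjob
# #PBS -A example_flux               (placeholder account)
# #PBS -l nodes=1:ppn=4,mem=8gb
# #PBS -l walltime=02:00:00
# #PBS -m abe

#### Slurm version (Beta / Great Lakes) ####
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --account=example            # placeholder account
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=8g
#SBATCH --time=02:00:00
#SBATCH --mail-type=BEGIN,END,FAIL

srun ./my_program                    # placeholder executable
```

The general pattern is that `#PBS` directives become `#SBATCH` directives, with resource requests split into separate flags rather than a combined `-l` list.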
Every user on Flux has an account on Beta. You can log in to Beta at beta.arc-ts.umich.edu. You will have a new home directory on Beta, so you will need to migrate any scripts and data files needed to test your workloads into this new directory. Beta should not be used for any PHI, HIPAA, Export Controlled, or other sensitive data! We highly recommend that you use this time to convert your Torque scripts to Slurm and test that everything works as you expect.
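Logging in and copying files over works the same way as on other ARC-TS clusters; for example (the username and paths are illustrative):

```shell
# Log in to Beta (replace "uniqname" with your own uniqname)
ssh uniqname@beta.arc-ts.umich.edu

# From Flux or your workstation, copy job scripts and small test
# data sets into your new Beta home directory
scp -r ~/my_scripts uniqname@beta.arc-ts.umich.edu:~/
```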
To learn how to use Slurm, we have provided documentation on our Beta website. Additionally, ARC-TS and academic unit support teams will be offering training sessions around campus. We’ll have a schedule on the ARC-TS website as well as communicate new sessions through Twitter and email.
If you have compiled software for use on Flux, we highly recommend recompiling it on Great Lakes once the cluster becomes available. Great Lakes uses the latest Intel CPUs, and recompiling your code may yield performance gains by taking advantage of the new CPUs' capabilities.
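For example, with a GCC toolchain the Skylake instruction set can be targeted explicitly (the module name and source file are assumptions; check `module avail` on Great Lakes for the actual compiler modules):

```shell
# Load a compiler toolchain (exact module name/version may differ)
module load gcc

# Rebuild with optimizations for Skylake's AVX-512 units;
# -march=native also works when compiling on the target nodes
gcc -O3 -march=skylake-avx512 -o my_app my_app.c
```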
Questions? Need Assistance?