Gordon Bell Prize winning team also leverages ITS services


A U-M College of Engineering team led by Vikram Gavini was recently awarded the prestigious ACM Gordon Bell Prize. The honor recognizes their outstanding achievement in developing and demonstrating an approach that brings near-quantum-mechanical accuracy for large systems consisting of tens of thousands of atoms within reach of today’s supercomputers.

The ACM Gordon Bell Prize is awarded each year to recognize outstanding achievement in high-performance computing.

Dr. Gavini and his team carried out their largest calculation on the fastest known computer in the world, the U.S. Department of Energy’s Frontier supercomputer, sustaining 660 petaflops.

“The ACM Gordon Bell Prize is the most prestigious prize available for high-performance computing,” said Brock Palen, director of Advanced Research Computing (ARC), a division of Information and Technology Services (ITS).

“This is so exciting because it is more than 660 times faster than our entire Great Lakes cluster at perfect efficiency. Their calculation was a tenfold improvement over any prior density-functional theory (DFT) calculation.”

DFT is a method used in physics and chemistry to investigate the electronic structure of many-body systems. These systems can include atoms, molecules, or solids.  
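
For readers who want the textbook picture, standard Kohn-Sham DFT (the general framework, not the team’s specific formulation) replaces the interacting many-electron problem with a set of single-particle equations that must be solved self-consistently:

\left[ -\tfrac{1}{2}\nabla^{2} + v_{\mathrm{eff}}[n](\mathbf{r}) \right] \psi_{i}(\mathbf{r}) = \varepsilon_{i}\,\psi_{i}(\mathbf{r}), \qquad n(\mathbf{r}) = \sum_{i} \lvert \psi_{i}(\mathbf{r}) \rvert^{2}

Because the effective potential depends on the density it produces, the equations must be iterated to self-consistency, and the cost grows quickly with system size; this is why systems of tens of thousands of atoms have traditionally been out of reach.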

“Dr. Gavini is a major user of ARC’s Great Lakes HPC Cluster, and we are so proud of this astonishing achievement,” Palen added.

Gavini said that any development needs to be carefully tested. “We use ARC’s Great Lakes to test smaller scale systems before moving to larger-scale, production-ready calculations. This testing is critical, and we need the accessibility and speed of Great Lakes to run calibrations and debug our implementations.”


Gavini and his team use Turbo Research Storage for quick-access storage and the Data Den Research Archive for longer-term storage. “We generate a lot of data, and storage is important to our work.”

The U-M Research Computing Package also helped defray some of their HPC and storage costs.

“Thank you to ARC for their continual assistance to group members who have been in the trenches. This has been a decade-long effort, and ITS/ARC was crucial along the journey,” said Gavini.

Dr. Gavini is a professor of Mechanical Engineering and a professor of Materials Science and Engineering in the College of Engineering.

You’re invited: Parallel programming with MATLAB webinar on Dec. 4 


We invite you to join us for an engaging virtual session on Parallel Computing with MATLAB, scheduled for December 4 from 1-4 p.m. EST. Here’s a glimpse of what you can expect to learn during the session.

Parallel Computing Hands-On Workshop:

Join us for an immersive hands-on workshop where we will introduce you to the world of parallel computing using MATLAB®. This workshop aims to equip you with the skills to tackle computationally and data-intensive problems by harnessing the power of multicore processors, GPUs, and computer clusters. Through practical exercises and real-world examples, you will gain a comprehensive understanding of parallel computing and learn best practices for its implementation.

Highlights:

  • Explore a range of exercises and examples, varying in difficulty from fundamental parallel usage concepts to more advanced techniques.
  • Learn how to optimize MATLAB applications by leveraging parallel computing capabilities.
  • Discover the benefits of running multiple Simulink simulations in parallel and enhance your simulation efficiency.
  • Dive into the world of GPU computing and unlock the potential for accelerated computations.
  • Explore the concept of offloading computations and delve into the realm of cluster computing.
  • Master the art of working with large data sets and processing them efficiently using parallel computing techniques.

Don’t miss out on this opportunity to enhance your parallel computing skills with MATLAB. Join us for this exciting workshop and unlock the potential of parallel computing for your computational challenges.

Register soon to guarantee your spot and receive the Webex link before the workshop.

2023 HPC Emergency Maintenance: September 15


Due to a critical issue that requires an immediate update, we will be updating Slurm and the underlying libraries that allow parallel jobs to communicate. We will update the login nodes and the rest of the cluster on the fly, so you should experience only minimal impact when interacting with the clusters.

  • Jobs that are currently running will be allowed to finish. 
  • All new jobs will be allowed to run only on nodes that have been updated.
  • The login and Open OnDemand nodes will also be updated, which will require a brief interruption in service.

Queued jobs and maintenance reminders

Jobs will remain queued, and will automatically begin after the maintenance is completed. Any parallel job using MPI will fail; the software those jobs use may need to be recompiled, as described below. Jobs not using MPI will not be affected by this update.

Jobs will initially be slow to start, as compute nodes are drained of running jobs so they can be updated. We apologize for this inconvenience, and want to assure you that we would not be performing this maintenance during a semester unless it was absolutely necessary.

Software updates

Only one version of OpenMPI (version 4.1.6) will be available; all other versions will be removed. Modules for the removed versions of OpenMPI will warn you that the version is no longer available and prompt you to load openmpi/4.1.6.

When you use the following command, it will default to openmpi/4.1.6:
module load openmpi 

Any software packages you use that depend on OpenMPI (whether provided by ARC/LSA/COE/UMMS or built by you) will need to be updated to use openmpi/4.1.6. ARC will update the software packages it provides; code you compile yourself will need to be updated by you.

Note that openmpi/3.1.6 is being discontinued; its module will warn you to update your workflows to openmpi/4.1.6.
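
As a rough sketch, rebuilding a self-compiled MPI code against the new module might look like the following (the gcc module and my_mpi_code.c file names are illustrative placeholders, not ARC-specific instructions):

module purge
module load gcc openmpi/4.1.6
mpicc -O2 -o my_mpi_code my_mpi_code.c    # C source; use mpif90 for Fortran codes

After rebuilding, resubmit any affected MPI jobs so they run against the updated libraries.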

Status updates

 

System software changes

Great Lakes, Armis2 and Lighthouse

New versions:
  • Slurm 23.02.5, compiled with:
    • PMIx /opt/pmix/3.2.5 and /opt/pmix/4.2.6
    • hwloc 2.2.0-3 (OS provided)
    • ucx-1.15.0-1.59056 (OFED provided)
    • slurm-libpmi
    • slurm-contribs
  • PMIx LD config: /opt/pmix/3.2.5/lib
  • PMIx versions available in /opt: 3.2.5 and 4.2.6
  • OpenMPI: 4.1.6

Old versions:
  • Slurm 23.02.3, compiled with:
    • PMIx /opt/pmix/2.2.5, /opt/pmix/3.2.3, and /opt/pmix/4.2.3
    • hwloc 2.2.0-3 (OS provided)
    • ucx-1.15.0-1.59056 (OFED provided)
    • slurm-libpmi
    • slurm-contribs
  • PMIx LD config: /opt/pmix/2.2.5/lib
  • PMIx versions available in /opt: 2.2.5, 3.2.3, and 4.1.2
  • OpenMPI: 3.1.6 and others
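
If you want to confirm what your environment sees after the update, a few generic Slurm and module commands (standard tools, not ARC-specific utilities) can help:

sinfo --version        # reports the Slurm version the cluster is running
srun --mpi=list        # lists the MPI/PMIx plugin types Slurm supports
module avail openmpi   # shows which OpenMPI modules remain available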

 

How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

Summer 2023 Network Maintenance: HPC and storage unavailable August 21-22 


During the summer 2023 maintenance, a significant networking software bug was discovered, and ARC was unable to complete the HPC and storage network updates at the MACC Data Center.

ITS has been working with the vendor on a remediation, which will be implemented on August 21-22. This will require scheduled maintenance for the HPC clusters Great Lakes, Armis2, and Lighthouse, as well as the ARC storage systems Turbo, Locker, and Data Den. The dates were selected to minimize impact during the fall semester.

Maintenance dates:

The HPC clusters and their storage systems (/home and /scratch), as well as the ARC storage systems (Turbo, Locker, and Data Den), will be unavailable starting at 7:00 a.m. on August 21. The expected completion date is August 22.

Queued jobs and maintenance reminders

Jobs will remain queued, and will automatically begin after the maintenance is completed. The command “maxwalltime” will show the amount of time remaining until maintenance begins for each cluster, so you can size your jobs appropriately. The countdown to maintenance will also appear on the ARC homepage.

Status updates

How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.