HPC Emergency 2023 Maintenance: September 15

By | Data, HPC, News, Research, Systems and Services

Due to a critical issue that requires an immediate update, we will be updating Slurm and the underlying libraries that allow parallel jobs to communicate. We will update the login nodes and the rest of the cluster on the fly, and you should experience only minimal impact when interacting with the clusters.

  • Jobs that are currently running will be allowed to finish. 
  • All new jobs will only be allowed to run on nodes which have been updated. 
  • The login and Open OnDemand nodes will also be updated, which will require a brief interruption in service.

Queued jobs and maintenance reminders

Jobs will remain queued, and will automatically begin after the maintenance is completed. Any parallel jobs using MPI will fail after the update; those jobs may need to be recompiled, as described below. Jobs not using MPI will not be affected by this update.

Jobs will initially be slow to start, as compute nodes are drained of running jobs so they can be updated. We apologize for this inconvenience, and want to assure you that we would not be performing this maintenance during a semester unless it was absolutely necessary.

Software updates

Only one version of OpenMPI (version 4.1.6) will be available; all other versions will be removed. The modules for the removed versions of OpenMPI will warn you that they are no longer available and prompt you to load openmpi/4.1.6.

When you use the following command, it will default to openmpi/4.1.6:
module load openmpi 

Any software packages you use (whether provided by ARC/LSA/COE/UMMS or compiled yourself) will need to be updated to use openmpi/4.1.6. ARC will update the software packages it provides; code you compile yourself will need to be updated by you.
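For self-compiled code, the rebuild will typically look something like the following sketch (hello_mpi.c is a placeholder for your own source file; a C program built with the mpicc wrapper is assumed):

# Load the remaining OpenMPI build (now the default)
module load openmpi/4.1.6

# Recompile your program against the new libraries
mpicc -o hello_mpi hello_mpi.c

# Run it inside a Slurm job, for example with srun
srun ./hello_mpi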

Note that openmpi/3.1.6 is being discontinued; its module will warn you to update to openmpi/4.1.6.


System software changes

Great Lakes, Armis2, and Lighthouse

NEW version:
  • Slurm 23.02.5, compiled with:
    • PMIx /opt/pmix/3.2.5 and /opt/pmix/4.2.6
    • hwloc 2.2.0-3 (OS provided)
    • ucx-1.15.0-1.59056 (OFED provided)
    • slurm-libpmi
    • slurm-contribs
  • PMIx LD config: /opt/pmix/3.2.5/lib
  • PMIx versions available in /opt:
    • 3.2.5
    • 4.2.6
  • OpenMPI:
    • 4.1.6

OLD version:
  • Slurm 23.02.3, compiled with:
    • PMIx /opt/pmix/2.2.5, /opt/pmix/3.2.3, and /opt/pmix/4.2.3
    • hwloc 2.2.0-3 (OS provided)
    • ucx-1.15.0-1.59056 (OFED provided)
    • slurm-libpmi
    • slurm-contribs
  • PMIx LD config: /opt/pmix/2.2.5/lib
  • PMIx versions available in /opt:
    • 2.2.5
    • 3.2.3
    • 4.1.2
  • OpenMPI:
    • 3.1.6
    • others
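One way to confirm which PMI interfaces the updated Slurm exposes on a given cluster is to list its MPI plugin types from a login shell (a standard Slurm query; output will vary by cluster):

# List the MPI/PMI plugin types supported by the installed Slurm
srun --mpi=list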


How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

Summer 2023 Network Maintenance: HPC and storage unavailable August 21-22 

By | Data, HPC, News, Research, Systems and Services

During the 2023 summer maintenance, a significant networking software bug was discovered, and ARC was unable to complete the HPC and storage network updates at the MACC Data Center.

ITS has been working with the vendor on a remediation, and it will be implemented on August 21-22.  This will require scheduled maintenance for the HPC clusters Great Lakes, Armis2, and Lighthouse, as well as the ARC storage systems Turbo, Locker, and Data Den. The date was selected to minimize any impact during the fall semester. 

Maintenance dates:

The HPC clusters and their storage systems (/home and /scratch), as well as the ARC storage systems (Turbo, Locker, and Data Den), will be unavailable starting at 7:00 a.m. on August 21. The expected completion date is August 22.

Queued jobs and maintenance reminders

Jobs will remain queued, and will automatically begin after the maintenance is completed. The command “maxwalltime” will show the amount of time remaining until maintenance begins on each cluster, so you can size your jobs appropriately. The countdown to maintenance will also appear on the ARC homepage.
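As a sketch of how this fits together (my_job.sbat is a placeholder for your own batch script):

# Show the time remaining until maintenance begins on this cluster
maxwalltime

# Submit with a walltime that fits within the remaining window,
# e.g. 24 hours if maxwalltime reports more time than that
sbatch --time=24:00:00 my_job.sbat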

How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

Secure Enclave Service rate approved, shortcode needed by July 25 

By | Data, News, Systems and Services, Uncategorized

The Yottabyte Research Cloud (YBRC) migrated to the Secure Enclave Services (SES) in 2022. The new system provides improved performance for researcher workloads. Because of this transition, on July 1, 2023, ARC began billing researchers who consume more than 16 gigabytes (GB) of RAM (memory) per month.

The first 16 GB of RAM (memory) is covered by the U-M Research Computing Package (UMRCP). If you have not already requested or been granted the UMRCP, learn more and request it on the UMRCP service page on the ARC website.

Approved rate 

The approved rate for a Secure Enclave Services machine is $7.00 per GB of RAM (memory) per machine, per month. Visit the Rates page on the ARC website for information about billing for all ARC services. 

Action requested: Submit a shortcode 

A shortcode is needed to accommodate billing for any resources consumed that are not covered by the UMRCP. Please submit a shortcode no later than July 25, 2023. Access to your machine will be removed or reduced if a shortcode is not on file by July 25. Contact us at arc-support@umich.edu to submit your shortcode, or make any changes to the configuration or use of your machines. 

Some schools and colleges (including the U-M Medical School) are subsidizing the use of Secure Enclave Services beyond the 16 GB of RAM (memory). Talk to your unit’s IT staff or email ARC to learn more. 

Contact ARC (arc-support@umich.edu) if you would like to meet with the ARC storage manager to ask questions or get clarification.

Globus can now be used with Armis2 

By | Armis2, HPC, News, Uncategorized

Researchers who have an Armis2 High-Performance Computing account can now move data to and from other Protected Health Information (PHI)-approved systems using Globus File Transfer. (The endpoint is umich#armis2.) 

To learn more about your responsibilities and approved services, visit the Sensitive Data Guide and the Protected Health Information (PHI) webpage on the Safe Computing website. Send an email to ARC at arc-support@umich.edu to get started using Globus with PHI on your own system (this is not needed for researchers using ARC services, including Armis2, or Data Den and Turbo with sensitive data).

“With the addition of Globus on Armis2, researchers using ITS ARC services can use the same Globus tools and processes to securely and reliably move their data on all ARC systems and across the university and beyond,” said Matt Britt, ARC HPC systems manager.

Globus allows the transfer and collaborative access of data between different storage systems, lab computers, and personal desktops and laptops. Globus enables researchers to use a web browser to submit transfer and data synchronization requests between destinations. 

As a robust, cloud-based, file transfer service, Globus is designed to securely move your data, ranging from megabytes to petabytes. ARC is a Globus Subscription Provider for the U-M community, which allows U-M resources to serve as endpoints or collections for file transfers.
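For those who prefer a command line to the web browser, transfers can also be submitted with the Globus CLI. A minimal sketch, assuming the CLI is installed and you have run globus login (the endpoint UUIDs and paths are placeholders):

# Find the Armis2 collection and note its UUID
globus endpoint search "umich#armis2"

# Submit a transfer between two endpoints
globus transfer SRC_UUID:/path/to/source DEST_UUID:/path/to/destination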

“There are many interesting research collaborations happening at U-M, as well as nationally and internationally. Globus can facilitate all of those interactions securely,” said Brock Palen, ARC director. “Globus is the go-to tool we recommend for data transfer.”

How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

ARC Summer 2023 Maintenance happening in June

By | HPC, News, Systems and Services

Summer maintenance will be happening earlier this year (June instead of August). Updates will be made to software, hardware, and operating systems to improve the performance and stability of services. ARC works to complete these tasks quickly to minimize the impact of the maintenance on research.

The dates listed below indicate the weeks during which the work will occur; the exact dates will be refined as planning continues.

HPC clusters and storage systems (/scratch) will be unavailable:

  • June 5-9: Great Lakes, Armis2, and Lighthouse

Storage systems will be unavailable:

  • June 6-7: Turbo, Locker, and Data Den

Queued jobs and maintenance reminders

Jobs will remain queued, and will automatically begin after the maintenance is completed. The command “maxwalltime” will show the amount of time remaining until maintenance begins on each cluster, so you can size your jobs appropriately. The countdown to maintenance will also appear on the ARC homepage.

How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

Globus maintenance happening at 9 a.m. on March 11

By | Armis2, Data, General Interest, Great Lakes, HPC, News, Research, Uncategorized

Due to planned maintenance by the vendor, Globus services will be unavailable for up to two hours beginning at 9 a.m. U.S. Eastern Time (8 a.m. Central Time) on Saturday, March 11, 2023.

Customers will not be able to authenticate or initiate any transfers during that time. Any transfers in progress when the outage begins will be stalled and will automatically resume once maintenance is complete.

More details are available on the Globus blog.

For assistance or questions, please contact ARC at arc-support@umich.edu.

2023 Winter Maintenance & Globus File Transfer upgrade 

By | Feature, General Interest, Great Lakes, HPC, News, Systems and Services

Winter maintenance is coming up! See the details below. Reach out to arc-support@umich.edu with questions or if you need help. 

These services will be unavailable: 

  • Great Lakes – We will be updating Great Lakes on a rolling basis throughout December and the beginning of January. If the rolling updates are successful, there should be no downtime or impact, with the following exceptions:
    • Single-precision GPU (SPGPU) nodes will be down Jan. 4-5 for networking maintenance. Those nodes will return to production when maintenance has been completed and the nodes have been reloaded.
    • Customers will be notified via email of any changes to Great Lakes maintenance that will require downtime.
    • If the rolling updates are unsuccessful, Great Lakes maintenance will begin Jan. 4-5, starting at 8 a.m. In either case, we will email everyone with the updated maintenance status.
  • Globus on the storage transfer nodes: Jan. 17-18.

Maintenance notes:

  • No downtime for ARC storage systems maintenance (Turbo, Locker, and Data Den).
  • Open OnDemand (OOD) users will need to re-login. Any existing jobs will continue to run and can be reconnected in the OOD portal.
  • Login servers will be updated, and the maintenance should not have any effect on most users. Those who are affected will be contacted directly by ARC. 
  • Copy any data and files that may be needed during maintenance to your local drive using Globus File Transfer before maintenance begins. 
  • Slurm email will be improved, providing more detailed information about completed jobs.

Countdown to maintenance 

For Great Lakes HPC jobs, use the command “maxwalltime” to discover the amount of time remaining until maintenance begins. 

Jobs that request more walltime than remains until maintenance will automatically be queued and start once maintenance is complete. If the plan for Great Lakes maintenance is successful, any queued jobs will be able to run as usual (except for the SPGPU nodes as discussed above). Customers will be notified via email if downtime is required for Great Lakes.

How can we help you?

For assistance or questions, please contact ARC at arc-support@umich.edu.

New Resource Management Portal feature for Armis2 HPC Clusters

By | Armis2, HPC, News

Advanced Research Computing (ARC), a division of Information and Technology Services (ITS), has been developing a self-service tool called the Resource Management Portal (RMP) to give researchers and their delegates the ability to directly manage the IT research services they consume from ARC. 

Customers who use the Armis2 High-Performance Computing Cluster now have the ability to view their account information via the RMP, including the account name, resource limits (CPUs and GPUs), and the user access list.

“We are proud to be able to offer this tool for customers who use the HIPAA-certified Armis2 cluster,” said Brock Palen, ARC director. 

The RMP is a self-service-only user portal with tools and APIs for research managers, unit support staff, and delegates to manage their ARC IT resources. The RMP team is adding capabilities over time.

To get started or find help, contact arc-support@umich.edu.

No-cost research computing allocations now available

By | HPC, News, Research, Systems and Services, Uncategorized

Researchers on all university campuses can now sign up for the U-M Research Computing Package (UMRCP), a new package of no-cost supercomputing resources provided by Information and Technology Services.

As of Sept. 1, university researchers have access to a base allocation of 80,000 CPU hours of high-performance computing and research storage services at no cost. This includes 10 terabytes of high-speed storage and 100 terabytes of archival storage.

These base allocations will meet the needs of approximately 75 percent of current high-performance-computing users and 90 percent of current research storage users. Researchers must sign up on ITS’s Advanced Research Computing website to receive the allocation.

“With support from President (Mark) Schlissel and executive leadership, this initiative provides a unified set of resources, both on campus and in the cloud, that meet the needs of the rich diversity of disciplines. Our goal is to encourage the use, support and availability of high-performance computing resources for the entire research community,” said Ravi Pendse, vice president for information technology and chief information officer.

The computing package was developed to meet needs across a diversity of disciplines and to provide options for long-term data management, sharing and protecting sensitive data, and more competitive cost structures that give faculty and research teams more flexibility to procure resources on short notice.

“It is incredibly important that we provide our research community with the tools necessary so they can use their experience and expertise to solve problems and drive innovation,” said Rebecca Cunningham, vice president for research and the William G. Barsan Collegiate Professor of Emergency Medicine. “The no-cost supercomputing resources provided by ITS and Vice President Pendse will greatly benefit our university community and the countless individuals who are positively impacted by their research.”

Ph.D. students may qualify for their own UMRCP resources depending on who is overseeing their research and their adviser relationship. Students should consult with their Ph.D. program administrator to determine their eligibility. ITS will confirm this status when a UMRCP request is submitted.

Undergraduate and master’s students do not currently qualify for their own UMRCP, but they can be added as users or administrators of another person’s UMRCP. Students can also access other ITS programs, such as Great Lakes for Course Accounts and Student Teams.

“If you’re a researcher at Michigan, these resources are available to you without financial impact. We’re going to make sure you have what you need to do your research. We’re investing in you as a researcher because you are what makes Michigan Research successful,” said Brock Palen, Advanced Research Computing director.

Services that are needed beyond the base allocation provided by the UMRCP are available at reduced rates and are automatically available for all researchers on the Ann Arbor, Dearborn, Flint and Michigan Medicine campuses.

Access the sensitive data HPC cluster via web browser

By | Armis2, HPC, News

Researchers, data scientists, and students can now more easily analyze sensitive data on the Armis2 High-Performance Computing (HPC) cluster. No Linux knowledge is required: just a web browser, an account, and a login.

This is made possible by a web interface called Open OnDemand, and is provided by Advanced Research Computing (ARC). 

“It is now much easier to analyze sensitive data, without investing hours in training. This makes the Open OnDemand tool more accessible and user-friendly. I’m excited to see the research breakthroughs that happen now that a significant barrier has been removed,” said Matt Britt, ARC HPC manager. 

Open OnDemand offers easy file management, command-line access to the Armis2 HPC cluster, job management and monitoring, and graphical desktop environments and desktop interactive applications such as RStudio, MATLAB, and Jupyter Notebook.

Resource: Getting started (Web-based Open OnDemand) – section 1.2. For assistance or questions, please contact ARC at arc-support@umich.edu.

ARC is a division of Information and Technology Services (ITS).