HPC Updates (Great Lakes, Armis2, and Lighthouse)
Due to maintenance, the high-performance computing (HPC) clusters and their storage systems (/home and /scratch) will be unavailable:
- Great Lakes: Wednesday, January 4, 2023, 8am – Thursday, January 5, 2023, 5pm, for the following services:
  - Single-precision GPU (spgpu) nodes (Jan 4)
  - On-campus login node (Jan 4)
  - Encore node (Jan 5)
- Armis2, Lighthouse: Monday, January 9, 2023, 8am – Wednesday, January 11, 2023, 5pm.
Copy any files you might need during the maintenance window to your local drive using Globus File Transfer.
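Most users transfer files through the Globus web interface (globus.org). If you prefer the command line, the sketch below uses the Globus CLI; the endpoint UUIDs, paths, and label are placeholders, not actual ARC endpoint values.

```bash
# Sketch: back up a directory before the outage with the Globus CLI.
# Endpoint UUIDs and paths below are placeholders.
globus login                              # authenticate in a browser
globus endpoint search "umich"            # look up the collection UUIDs you need
globus transfer --recursive --label "pre-maintenance backup" \
  "CLUSTER_ENDPOINT_UUID:/scratch/example_root/example/uniqname/project" \
  "PERSONAL_ENDPOINT_UUID:/home/uniqname/backup/project"
```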
Run the command “maxwalltime” at the command line of any cluster login node to see how much time remains before maintenance begins. Jobs that request more walltime than remains until maintenance will automatically be queued and will start once maintenance is complete.
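For example, a submission like the one below will start before the window only if its time request fits in the remaining time; otherwise Slurm holds it until the clusters return. This is a sketch; the script name and the eight-hour request are illustrative.

```bash
# On a login node: check the time remaining until maintenance begins.
maxwalltime

# Request 8 hours of walltime; if less than 8 hours remain before the
# maintenance window opens, the job waits in the queue until maintenance ends.
sbatch --time=08:00:00 job.sbat
```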
Contact arc-support@umich.edu if you have any questions.
System hardware changes
Great Lakes
- Recabling the networking for the single-precision GPU (spgpu) nodes
Armis2 and Lighthouse
- Annual preventative maintenance on the Modular Data Center (power will be out)
- Upgrades to the Ethernet networking system
System software changes
NEW version | OLD version
Red Hat 8.4 | Red Hat 8.4
Mlnx-ofa_kernel-modules | Mlnx-ofa_kernel-modules
Slurm 21.08.8-2 compiles with: | Slurm 21.08.8-2 compiles with:
  PMIx LD config /opt/pmix/2.2.5/lib | PMIx LD config /opt/pmix/2.2.5/lib
  PMIx versions available in /opt | PMIx versions available in /opt
Singularity CE (Sylabs.io) |
NVIDIA driver 520.61.05 | NVIDIA driver 510.73.08
Open OnDemand 2.0.29 | Open OnDemand 2.0.23-1
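If you want to confirm the new versions once the clusters return, a few standard checks (a sketch; run whichever apply to your workflow):

```bash
cat /etc/redhat-release   # operating system release
sinfo --version           # Slurm version
nvidia-smi                # NVIDIA driver version (GPU nodes only)
singularity --version     # Singularity CE version
```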
Slurm mail changes
We’re upgrading the Slurm mail package to give more information upon job completion.
Examples:
Old:
Subject: Slurm Job_id=45997734 Name=job.sbat Ended, Run time 00:00:31, COMPLETED, ExitCode 0 <no body text>
New:
Subject: "Job 54 (job.sbat) Ended"

Job 54 Ended
Created Wed, 21 Dec 2022 16:10:26 EST
------------------------------
Job Name         : job.sbat
Job ID           : 54
User             : cgbriggs
Partition        : standard
Nodes Used       : gls0004
Cores            : 10
Job state        : COMPLETED
Exit Code        : 0
Submit           : 2022-12-21T16:09:54
Start            : 2022-12-21T16:09:55
End              : 2022-12-21T16:10:26
Res. Walltime    : 01:00:00
Used Walltime    : 00:00:31
Used CPU time    : 00:00.026
% User (Comp)    : 46.15%
% System (I/O)   : 53.85%
Memory Requested : 11 GB
Max Memory Used  : 541 kB
Max Disk Write   : 0 B
Max Disk Read    : 32 kB
------------------------------
- TIP: Please consider lowering the amount of requested memory in the future, your job has consumed less than half of the requested memory.
- TIP: Please consider lowering the amount of requested CPU cores in the future, your job has consumed less than half of the requested CPU cores.
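These emails are only sent for jobs that ask for them with the standard Slurm mail directives. A minimal batch-script sketch is shown below; the account, email address, and program name are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=job.sbat
#SBATCH --account=example_account        # placeholder account
#SBATCH --partition=standard
#SBATCH --time=01:00:00                  # requested walltime
#SBATCH --cpus-per-task=10
#SBATCH --mem=11g
#SBATCH --mail-type=END,FAIL             # email when the job ends or fails
#SBATCH --mail-user=uniqname@umich.edu   # placeholder address

srun ./my_program                        # placeholder workload
```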
User software changes
Storage Updates (Turbo, Locker, and Data Den)
- Globus will be updated January 17-18