Networking updates for HPC (Great Lakes, Armis2, and Lighthouse) and Storage (Turbo, Locker, and Data Den)
During the 2023 summer maintenance, a significant networking software bug was discovered, and we were unable to complete the ARC HPC and Storage network updates at the MACC data center. ITS has been working with the vendor, and we believe we will have a remediation that can be implemented on August 21-22. This update will require scheduled maintenance for the HPC clusters Great Lakes, Armis2, and Lighthouse, the ARC storage systems Turbo, Locker, and Data Den, as well as the Research Manager Portal (RMP). The date was selected to minimize any impact during the fall semester. We will endeavor to complete these tasks quickly, as we understand the impact of the maintenance on your research.
- Monday, August 21, 2023, 7am – Tuesday, August 22, 2023, 5pm
Copy any files you might need during the maintenance window to your local drive using Globus File Transfer.
At the command line of any cluster login node, run the command “maxwalltime” to see how much time remains until maintenance begins. Jobs that request more walltime than remains until maintenance will automatically be queued and will start once maintenance is complete.
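The walltime rule can be illustrated with a little date arithmetic. The following is a minimal sketch, not the scheduler's actual logic: it assumes the maintenance start given in this announcement (Monday, August 21, 2023, 7am, cluster local time), and the function names and the example "now" value are hypothetical.

```python
from datetime import datetime, timedelta

# Maintenance start from the announcement: Monday, August 21, 2023, 7am
# (assumed to be cluster local time).
MAINTENANCE_START = datetime(2023, 8, 21, 7, 0)


def time_until_maintenance(now):
    """Return the time left before maintenance begins (zero once it has started)."""
    return max(MAINTENANCE_START - now, timedelta(0))


def starts_before_maintenance(requested_walltime, now):
    """Return True if a job started at `now` would finish before maintenance begins."""
    return requested_walltime <= time_until_maintenance(now)


# Hypothetical example: exactly three days before the window opens.
now = datetime(2023, 8, 18, 7, 0)
print(time_until_maintenance(now))                           # 3 days, 0:00:00
print(starts_before_maintenance(timedelta(hours=48), now))   # True: can run now
print(starts_before_maintenance(timedelta(days=4), now))     # False: waits in the queue
```

In practice, “maxwalltime” reports the remaining time for you; the sketch only shows why a job with a shorter walltime request may still start before the window while a longer one waits.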
Contact firstname.lastname@example.org if you have any questions.
The networking team will remediate the software bug. Once everything has been validated as working, Great Lakes, Turbo, Locker, and Data Den will all be migrated to the new campus network.
HPC hardware changes
- Recabling the ethernet network
- Networking updates to the InfiniBand gateway
Armis2 and Lighthouse
- Hardware updates to the heaters near the high-voltage panels
- InfiniBand module emergency update
HPC software changes
Great Lakes, Armis2 and Lighthouse
|NEW version|OLD version|
|Slurm 23.02.3 compiles with:|Slurm 23.02.1 compiles with:|
Storage Updates (Turbo, Locker, and Data Den)
Turbo, Locker and Data Den will all be down as they are moved to the new network.