HPC maintenance scheduled for January 7 – 9


To accommodate upgrades to software and operating systems, Flux, Armis, and their storage systems (/home and /scratch) will be unavailable starting at 9am Saturday, January 7th, returning to service on Monday, January 9th. Additionally, external Turbo mounts will be unavailable from 11pm Saturday, January 7th, until 7am Sunday, January 8th.

During this time, the following updates are planned:

  • Minor operating system and software updates on Flux and Armis. These should not require any changes to user software or processes.
  • Resource manager and job scheduling software updates.
  • Operating system updates on Turbo.

For HPC jobs, you can use the command “maxwalltime” to see how much time remains before the maintenance begins. Jobs that cannot complete before the maintenance begins will not be lost; they will be able to start when the clusters are returned to service.
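
As a rough illustration of the arithmetic behind that check, the sketch below computes the longest walltime that still ends before a maintenance window opens. It is not the implementation of “maxwalltime”; the function name `walltime_before`, the 15-minute safety margin, and the year 2017 (which the announcement does not state) are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def walltime_before(maintenance_start: datetime,
                    now: datetime,
                    margin: timedelta = timedelta(minutes=15)) -> Optional[str]:
    """Return the longest HH:MM:SS walltime that ends before maintenance.

    Hypothetical helper: `margin` is an assumed safety buffer, not a site rule.
    Returns None when no time is left before the window opens.
    """
    remaining = int((maintenance_start - now - margin).total_seconds())
    if remaining <= 0:
        return None
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Maintenance start from the announcement: 9am Saturday, January 7th.
# The year is an assumption for illustration only.
MAINTENANCE_START = datetime(2017, 1, 7, 9, 0)

if __name__ == "__main__":
    print(walltime_before(MAINTENANCE_START, datetime.now()))
```

On a Torque/PBS cluster, a string in this format could be passed as the walltime request (for example `qsub -l walltime=HH:MM:SS`); a job that asks for no more than this amount would be able to finish before the outage.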

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) and send an email to all HPC users when the outage is over.

ARC-TS HPC Maintenance


Flux, Armis, Flux Hadoop, and their storage systems (/home, /scratch, and HDFS on Flux Hadoop) will be unavailable starting Saturday, July 16 at 2 p.m., with a return to service targeted for mid-day July 22nd. During this time, ARC-TS will update several key systems. Among other improvements, the updates will provide access to more current versions of popular software and libraries, enable new features and more consistent runtimes for job scheduling, and migrate two-factor authentication for the login servers to a new system.

NOTE: With the University migrating to Duo from RSA for multifactor authentication in July, ARC-TS will switch to Duo for access to our login nodes during this maintenance period. (Units will be leading the switch to Duo with their faculty, staff, and students who currently use MTokens. Questions about this change should be directed to IT or administrative leaders in units. More information can be found here: http://www.itcs.umich.edu/identity/2factor/)

The updates will consist of:

  • OS and supporting software updates for the cluster. This will be a major update, moving from the currently installed Red Hat version (RHEL 6.6) to CentOS 7.1. It will provide newer versions of commonly used software and libraries, and will help us deliver more user-facing features in the coming months.
  • Cluster management software will be updated and reconfigured. This will include Torque 6, which has a new set of resource options. The new Torque version will provide a better language for defining tasks, more consistent runtimes, and a platform for new features.
  • The Flux Hadoop environment will be updated to Cloudera 5.7, which now includes Hive-On-Spark.
  • /scratch on Flux will be updated and serviced.
  • The modules environment will transition from the current Environment Modules to a system called Lmod. Many commands are the same, and we will document any significant differences. The Lmod User Guide can be found here: https://www.tacc.utexas.edu/research-development/tacc-projects/lmod/user-guide
  • The paths in which many software packages are installed will also change; e.g., folders like /home/software and /usr/cac will have new locations. This will also be documented.
  • Many default software versions will change, and some older software packages and/or versions will be retired. In particular, OpenMPI and the compilers will all be updated to new versions.
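
Because the install paths named above will move, job scripts that hard-code them may need attention after the maintenance. The sketch below shows one possible way to flag such lines. The function name `find_old_paths` is hypothetical, the example script content is invented for illustration, and the new locations are not known here, so the code only reports matches and does not rewrite anything.

```python
import re
from typing import List, Tuple

# Path prefixes the announcement names as moving. The new locations are
# not stated in the announcement, so this sketch only reports matches.
OLD_PREFIXES = ("/home/software", "/usr/cac")

# Match a prefix only when it is a whole path component, so that a path
# such as /usr/cache is not mistaken for /usr/cac.
_OLD_PATH = re.compile(
    "(?:" + "|".join(re.escape(p) for p in OLD_PREFIXES) + r")(?![\w.-])"
)


def find_old_paths(script_text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for lines using an old prefix."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        if _OLD_PATH.search(line):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical job script content, invented for this example.
    example = "#!/bin/bash\nexport PATH=/home/software/bin:$PATH\n"
    for lineno, line in find_old_paths(example):
        print(f"line {lineno}: {line}")
```

Running this over existing PBS scripts before the outage would give a short list of lines to revisit once the new paths are documented.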

Status updates will be posted on the ARC-TS Twitter feed (https://twitter.com/arcts_um).