To accommodate updates to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) were scheduled to be unavailable starting at 7 a.m. on Tuesday, January 2nd, and were expected to return on January 5th. Flux, Armis, and ConFlux returned on January 4th, and Flux Hadoop is expected to return to service on Friday, January 5th or Monday, January 8th. These updates will improve the performance and stability of ARC services. We try to consolidate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.

During this time, the following maintenance tasks are planned:

  • In-rack Uninterruptible Power Supply (UPS) replacements for all racks in the Modular Data Center (MDC) (Flux/Armis/Flux Hadoop)
  • Campus network hardware and software updates (Flux/Armis/Flux Hadoop)
  • InfiniBand networking updates (firmware and software) (Flux/Armis/ConFlux)
  • Operating system, compiler, and software updates (All clusters)
  • Resource manager and job scheduling software updates (All clusters)
  • Lmod default software version changes (Flux/Armis/ConFlux)
  • Hadoop ecosystem updates, including migration from Cloudera 5.7 to Hortonworks Data Platform (HDP 2.6), Kerberos, WebHDFS, Apache Spark 2.x, SparkR, Apache NiFi, Apache Zeppelin notebooks, and support for RStudio integration (Flux Hadoop)
  • Migration of NFS volumes, including /home and software volumes, from MiStorage to Turbo (Flux/Flux Hadoop) for more consistent performance
    • To check your quota on /home, use flux-userquota -s
  • Update firmware and software of the Lustre file systems that provide /scratch (Flux)
  • Perform consistency checks on the Lustre file systems that provide /scratch (Flux)
  • Update Elastic Storage Server to 5.2 (ConFlux)
  • Update OS to mitigate the Meltdown vulnerability (Flux/Armis)

For Flux HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
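The walltime rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how maxwalltime or the scheduler is implemented: the function names are hypothetical, and because this notice does not state the year, 2018 is assumed purely for the example.

```python
from datetime import datetime, timedelta

# Assumption: the notice gives no year; 2018 is used here only for illustration.
MAINTENANCE_START = datetime(2018, 1, 2, 7, 0)  # 7 a.m. Tuesday, January 2nd


def time_until_maintenance(now, start=MAINTENANCE_START):
    """Return the time remaining before the maintenance begins (never negative)."""
    return max(start - now, timedelta(0))


def fits_before_maintenance(requested_walltime, now, start=MAINTENANCE_START):
    """Return True if a job starting at `now` would finish before the maintenance."""
    return requested_walltime <= time_until_maintenance(now, start)


if __name__ == "__main__":
    now = datetime(2018, 1, 1, 7, 0)  # hypothetical submission time, 24 hours ahead
    print(time_until_maintenance(now))
    print(fits_before_maintenance(timedelta(hours=12), now))  # True
    print(fits_before_maintenance(timedelta(hours=48), now))  # False
```

In this sketch, a job requesting 12 hours could start right away, while a job requesting 48 hours would wait in the queue until the maintenance is completed.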

For Flux Hadoop, persistent data that is stored in the Hadoop Distributed Filesystem (HDFS) will be destroyed during the upgrade. If you have persistent data that must be preserved and need assistance making arrangements to find a storage location for this data, please contact hpc-support@umich.edu as soon as possible. If you have not used Flux Hadoop in the last six months, you will need to request access to use the updated cluster.

The UPS replacement project has been delayed and is postponed until the spring. We will share more details about that later.

All Flux, Armis, ConFlux, and Flux Hadoop filesystems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time from Flux prior to the start of the maintenance.

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) throughout the course of the maintenance and send an email to all HPC and Hadoop users when the maintenance has been completed. Please contact hpc-support@umich.edu if you have any questions.