The Cavium/ThunderX Hadoop Cluster is a next-generation Hadoop cluster available to U-M researchers. It is an on-campus resource that provides 3 PB of storage for researchers working on data science problems. The cluster consists of 40 servers, each containing 96 ARMv8 cores and 512 GB of RAM. It is made possible through a partnership with Marvell. For more information or questions, contact arc-support@umich.edu.


Cavium/ThunderX Hadoop Cluster to shut down on August 1, 2022

On Monday, August 1, 2022, ARC will shut down the Cavium/ThunderX Hadoop Cluster, and all user access will be removed.

Researchers should plan to transition their work from the Cavium/ThunderX Hadoop Cluster to the Great Lakes High-Performance Computing Cluster (or another service suitable for Spark and pySpark analysis) by Sunday, July 31, 2022. An archive of the data will be available.

Classes that are currently using the Cavium/ThunderX Hadoop Cluster for educational purposes should plan to use the Great Lakes High-Performance Computing Cluster for the Fall 2022 and Winter 2023 terms.

Actions that customers should take

  • If you are no longer actively using the Cavium/ThunderX Hadoop Cluster for coursework or big data, no further action is required.
  • If a customer would like to continue using ARC services for Spark or pySpark data analysis for research, including the Twitter Decahose, review the data to determine what should migrate to the Great Lakes High-Performance Computing Cluster. Delete anything that does not need to migrate.
  • Want to migrate or download your data? Reach out to ARC for assistance at arc-support@umich.edu.
  • Data migration, if needed, should be completed by Sunday, July 31, 2022.
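
When reviewing data to decide what should migrate, a per-directory size summary can help identify what is worth keeping. The following is a minimal sketch, not an ARC-provided tool: the function names `summarize_sizes` and `format_report` are hypothetical, and it assumes the data is reachable as an ordinary mounted directory (data held only in HDFS would first need to be listed or copied out with the Hadoop tools).

```python
import os
from pathlib import Path


def summarize_sizes(root):
    """Return {top-level entry name: total bytes} for everything under root.

    Symbolic links are skipped so that linked data is not counted twice.
    """
    totals = {}
    for entry in Path(root).iterdir():
        if entry.is_symlink():
            continue
        if entry.is_file():
            totals[entry.name] = entry.stat().st_size
        elif entry.is_dir():
            total = 0
            # os.walk does not follow directory symlinks by default.
            for dirpath, _dirnames, filenames in os.walk(entry):
                for name in filenames:
                    path = Path(dirpath) / name
                    if not path.is_symlink():
                        total += path.stat().st_size
            totals[entry.name] = total
    return totals


def format_report(totals):
    """Return report lines, largest entry first, with sizes in GB (10^9 bytes)."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{size / 1e9:10.3f} GB  {name}" for name, size in ordered]
```

Printing the lines returned by `format_report(summarize_sizes(...))` for a project directory lists its largest entries first, which can be compared against the storage available at the destination before anything is copied or deleted.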

Sign up for the no-cost U-M Research Computing Package

Sign up for the U-M Research Computing Package to get no-cost allocations of 80,000 CPU hours on the Great Lakes High-Performance Computing Cluster, 10 TB of replicated Turbo Research Storage, and 100 TB of Data Den Research Archive storage. PhD students may qualify for their own UMRCP resources depending on who is overseeing their research and their advisor relationship. Students should consult with their PhD program administrator to determine their eligibility.

Use the Great Lakes HPC Cluster for teaching

Class accounts are available for teaching at no cost using the Great Lakes High-Performance Computing Cluster.

Why is this happening?

The service has reached the end of its useful life and is costly to replace. The Cavium/ThunderX Hadoop Cluster is over five years old, and replacement parts are costly and difficult to acquire. Please note that using Spark on the Great Lakes High-Performance Computing Cluster is faster than on the Cavium/ThunderX Hadoop Cluster for nearly all use cases.

Next steps

Researchers should plan to move their data off the Cavium/ThunderX Hadoop Cluster by July 31, 2022.

As part of this project, ARC will partner with current customers to assist them in the migration to the Great Lakes High-Performance Computing Cluster. Units and customers should expect to migrate by July 31, 2022, and the Cavium/ThunderX Hadoop Cluster will be shut down on August 1, 2022.

Twitter Decahose data will be made available on the Great Lakes High-Performance Computing Cluster in the same Locker Large-File Storage and Turbo Research Storage locations where it resides today.

Need help?

For assistance or questions, please contact ARC at arc-support@umich.edu, or visit Virtual Drop-in Office Hours (CoderSpaces) for hands-on help, available 9:30-11 a.m. and 2-3:30 p.m. on Tuesdays; 1:30-3 p.m. on Wednesdays; and 2-3:30 p.m. on Thursdays.

For other topics, contact the ITS Service Center.

FAQs

  • I use the Cavium/ThunderX Hadoop Cluster to analyze Twitter data using the MIDAS Twitter Decahose. What should I do now?
    • Researchers who need to analyze Twitter data can continue to do so using the Great Lakes High-Performance Computing Cluster. The complete Twitter datasets are available on the Great Lakes High-Performance Computing Cluster in the same directory locations as they were on the Cavium/ThunderX Hadoop Cluster.
  • Which ARC service replaces the Cavium/ThunderX Hadoop Cluster?
    • The suggested replacement platform is the Great Lakes High-Performance Computing Cluster, which supports Spark and pySpark analyses.
  • Why is ARC retiring the Cavium/ThunderX Hadoop Cluster?
    • The service has reached the end of its useful life and is costly to replace. The Cavium/ThunderX Hadoop Cluster is over five years old, and replacement parts are costly and difficult to acquire. The suggested replacement platform, the Great Lakes High-Performance Computing Cluster, can perform Spark and pySpark analyses faster than the Cavium/ThunderX Hadoop Cluster.
  • When is the service being shut down?
    • The Cavium/ThunderX Hadoop Cluster will shut down on Monday, August 1, 2022. Units should expect to migrate to a new service by July 31, 2022.
  • I need help migrating my data.
    • Reach out to ARC for assistance at arc-support@umich.edu.
  • Will an archive of my data be available?
    • No. ARC will not provide an archive of your data after the machine has been decommissioned. However, if you forgot part of your dataset, there will be a period of six months before the Cavium/ThunderX Hadoop cluster is disassembled and sent to U-M Property Disposition. After that, the data will be removed and gone forever.
  • I used the Cavium/ThunderX Hadoop Cluster for a class but no longer need access. What should I do?
    • No further action is required.

Order Service

New requests are not being accepted at this time because the Cavium/ThunderX Hadoop Cluster will shut down on August 1, 2022. If you need assistance, contact ARC at arc-support@umich.edu.