The Cavium/ThunderX Hadoop Cluster has been shut down.
The Cavium/ThunderX Hadoop Cluster was a next-generation Hadoop cluster available to U-M researchers. It was an on-campus resource that provided 3 PB of storage for researchers working on data science problems. The cluster consisted of 40 servers, each with 96 ARMv8 cores and 512 GB of RAM. It was made possible through a partnership with Marvell.
What should I use instead of the Cavium/ThunderX Hadoop Cluster?
Cavium/ThunderX users are encouraged to transition to the Great Lakes High Performance Computing Cluster. Visit the Great Lakes page for details and account creation instructions.
If you are not familiar with the U-M Research Computing Package (UMRCP), check whether your work is eligible for the allocations of compute resources that ITS provides. On the UMRCP form, be sure to select the services you want (HPC, storage, sensitive, and non-sensitive), and add at least one user (lab manager, etc.).
If you used Spark or PySpark on Cavium/ThunderX, try the web-based Jupyter Notebook application on Great Lakes, which provides Spark integration. Spark is available under the menu “Interactive Apps” > “Jupyter + Spark Basic”.
If you prefer running Spark batch jobs from the terminal rather than from a Jupyter Notebook, ARC's Spark on HPC project demonstrates how to run a Spark cluster from a Slurm job.
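For readers moving a batch workload off Hadoop, the following is a minimal sketch of a self-contained PySpark script of the kind that could be handed to spark-submit once a Spark cluster is running inside a Slurm job. It is not taken from the Spark on HPC project; the file name wordcount.py, the function names, and the input and output paths are all hypothetical, and the launch steps on Great Lakes should be taken from ARC's documentation.

```python
import re
import sys


def tokenize(line):
    """Split one line of text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", line.lower())


def main(input_path, output_path):
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    # The master URL is deliberately not set in code; it is assumed to be
    # supplied by spark-submit or by the environment that starts the job.
    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

    lines = spark.sparkContext.textFile(input_path)
    counts = (
        lines.flatMap(tokenize)               # one record per word
        .map(lambda word: (word, 1))          # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)      # sum the counts per word
    )
    counts.saveAsTextFile(output_path)        # output directory must not exist yet

    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: spark-submit wordcount.py <input_path> <output_path>")
```

Under these assumptions the script would be run as spark-submit wordcount.py followed by an input path and an output path. Leaving the master URL out of the code keeps the same script usable whether Spark runs locally or on nodes allocated by Slurm.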
The Twitter datasets have been made available on Great Lakes. Additional details about the Twitter datasets are in the getting started guide.
For assistance or questions, please contact ARC at firstname.lastname@example.org.