Consulting Services


Advanced Research Computing (ARC), a division of ITS, is pleased to offer a pilot called Scientific Computing and Research Consulting Services to help researchers implement data analytics and workflows within their research projects. This includes navigating technical resources like high-performance computing and storage.

The ARC Scientific Computing and Research Consulting Services team will be your guide to navigating the complex technical landscape: from implementing data-intensive projects, to teaching you how the technical systems work, to identifying the proper tools, to advising you on how to hire a programmer.

Areas of Expertise

  • Data Science
    • Data Workflows
    • Data Analytics
    • Machine Learning
    • Programming
  • Grant Proposals
    • Compute Technologies
    • Data Storage and Management
    • Budgeting costs for computing and storage
  • Scientific Computing/Programming
    • Getting started with advanced computing
    • Code optimization
    • Parallel computing
    • GPU/Accelerator Programming
  • Additional Resources
    • Facilitating Collaborations/User Communities
    • Workshops and Training

Who can use this service?

  • All researchers and their collaborators from any of the university’s three campuses, including faculty, staff, and students
  • Units that want help including technical information when preparing grants
  • Anyone who needs HPC services and help navigating resources

How much does it cost?

  • Initial consultation, grant pre-work, and short-term general guidance/feedback on methods and code are available at no cost.
  • For longer engagements, research teams will be asked to contribute to the cost of providing the service.


The ARC Scientific Computing and Research Consulting Services team works in partnership with the Consulting for Statistics, Computing, and Analytics Research team (CSCAR), Biomedical Research Core Facilities, and others. ARC may refer or engage complementary groups as required by the project.

Get Started

Send an email to with the following information:

  • Research topic and goal
  • What you would like ARC to help you with
  • Any current or future data types and sources
  • Current technical resources
  • Current tools (programs, software)
  • Timeline – when do you need the help or information?

Check out current or past projects

Get Help

If you have any questions or wish to set up a consultation, please contact us at . Be sure to include as much information as possible from the “Get Started” section noted above.

If you have more general questions about ARC services or software, please contact us at

Virtual office hours are also available on Tuesdays, Wednesdays, and Thursdays. Get help with machine learning, algorithms, modeling, coding, computing on a cluster, survey sampling, using records across multiple surveys, and more. Anyone doing any type of research at any skill level is welcome!

Data Science Consulting


Data Workflows

We are available to assist researchers along the entire lifecycle of the data workflow, from the conceptual stage to ingest, preprocessing, cleansing, and storage solutions. We can advise in the following areas:

  • Establishing and troubleshooting dataflows between systems
  • Selecting the appropriate systems for short-term and long-term storage
  • Transformation of raw data into structured formats
  • Data deduplication and cleansing
  • Conversion of data between different formats to aid in analysis
  • Automation of dataflow tasks
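As a small illustration of the deduplication and format-conversion steps above, here is a sketch in pure Python; the `id` key column and the input layout are hypothetical, chosen only to show the pattern:

```python
import csv
import io
import json

def dedupe_and_convert(csv_text):
    """Deduplicate CSV rows by a key column and emit JSON records.

    A minimal sketch of two common workflow steps: removing duplicate
    records, then converting the result to another format for analysis.
    """
    seen = set()
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["id"]      # hypothetical key column
        if key in seen:
            continue         # drop duplicate records
        seen.add(key)
        records.append(row)
    return json.dumps(records)

raw = "id,value\n1,a\n2,b\n1,a\n"
print(dedupe_and_convert(raw))
# → [{"id": "1", "value": "a"}, {"id": "2", "value": "b"}]
```

In a real workflow the key choice and cleansing rules depend on the data source, which is exactly the kind of decision the consulting team can help with.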


Data Analytics

The data science consulting team can assist with data analytics to support research:

  • Choosing the appropriate tools and techniques for performing analysis
  • Development of data analytics in a variety of frameworks
  • Cloud-based (Hadoop) analytic development

Machine Learning

Machine learning is an application of artificial intelligence (AI) that focuses on developing computer programs that learn from data.

We are available to consult on the following, whether that means a general overview of concepts, a discussion of which tools and architectures best fit your needs, or technical support during implementation.

  Language   Tools/Architectures                         Models
  Python     Python data tools (scikit-learn, NumPy)     Neural networks
  C++        TensorFlow                                  Decision trees
  Java       Jupyter notebooks                           Support vector machines
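As a toy illustration of the decision-tree family listed above, the following pure-Python sketch fits a one-feature decision stump; the data and function names are illustrative only and not part of any ARC tooling:

```python
def fit_stump(xs, ys):
    """Fit a one-feature decision stump: find the threshold that best
    separates two classes (0 and 1). A toy sketch of the simplest
    decision tree, using no external libraries."""
    best_threshold, best_acc = None, -1.0
    for t in sorted(set(xs)):
        preds = [1 if x >= t else 0 for x in xs]
        acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
        if acc > best_acc:
            best_threshold, best_acc = t, acc
    return best_threshold

# Two well-separated clusters: the stump learns the boundary from data.
xs = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
ys = [0, 0, 0, 1, 1, 1]
print(fit_stump(xs, ys))  # → 8.0
```

Production work would typically use a library such as scikit-learn rather than hand-rolled code; this sketch only shows what "learning from data" means at its smallest scale.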


We also provide consulting on programming in a variety of programming languages (including but not limited to: C++, Java, and Python) to support your data science needs. We can assist in algorithm design and implementation, as well as optimizing and parallelizing code to efficiently utilize high performance computing (HPC) resources where possible/necessary. We can help identify available commercial and open-source software packages to simplify your data analysis.
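As a small example of the kind of parallelization we help with, the sketch below fans independent tasks out across CPU cores using Python's standard library; the workload and function names are hypothetical:

```python
from multiprocessing import Pool

def simulate(seed):
    """Stand-in for a CPU-bound analysis step (hypothetical workload)."""
    total = 0
    for i in range(10_000):
        total += (seed * i) % 7
    return total

def run_parallel(seeds, workers=4):
    """Run independent tasks across multiple CPU cores: often the first
    step when adapting serial analysis code to an HPC node."""
    with Pool(processes=workers) as pool:
        return pool.map(simulate, seeds)

if __name__ == "__main__":
    results = run_parallel(range(8))
    print(len(results))
```

On a cluster, the same pattern scales further with job arrays or MPI, a choice the consulting team can help evaluate.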


If you have any questions or wish to set up a consultation, please contact us at

The ThunderX Cluster


The Cavium/ThunderX Hadoop Cluster has been shut down. 

The Cavium/ThunderX Hadoop Cluster was a next-generation Hadoop cluster available to U-M researchers. It was an on-campus resource with 3 PB of storage for researchers analyzing data science problems. The cluster consisted of 40 servers, each containing 96 ARMv8 cores and 512 GB of RAM. It was made possible through a partnership with Marvell.

What should I use instead of the Cavium/ThunderX Hadoop Cluster?

Cavium/ThunderX users are recommended to transition to the Great Lakes High Performance Computing Cluster. Visit the Great Lakes page for details and account creation instructions.

If you are not familiar with the U-M Research Computing Package, please check to see if your work is eligible for allocations of compute resources that are provided by ITS. On the UMRCP form, be sure to select the services you want (HPC, storage, sensitive, and non-sensitive), and add at least one user (lab manager, etc.).

If you used Spark or PySpark on Cavium/ThunderX, try the web-based Jupyter Notebook application on Great Lakes that provides Spark integration. Spark is available under the menu “Interactive Apps” > “Jupyter + Spark Basic.”

If you prefer running Spark batch jobs from the terminal rather than from a Jupyter Notebook, ARC has the Spark on HPC project, which demonstrates how to run a Spark cluster from within a Slurm job.
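As a rough idea of what such a batch job can look like, here is a hedged sketch of a Slurm submission script; the account, module, and helper-script names are assumptions and should be checked against the Spark on HPC project documentation:

```shell
#!/bin/bash
# Sketch of a Slurm batch script for a standalone Spark job.
# Account, partition, and module names below are hypothetical.
#SBATCH --job-name=spark-batch
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --mem=180g
#SBATCH --time=02:00:00
#SBATCH --account=example_project    # hypothetical account name

module load spark                    # assumed module name
start-spark-cluster.sh               # hypothetical helper from the Spark on HPC project
spark-submit --master "$SPARK_MASTER_URL" my_analysis.py
```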

The Twitter datasets have been made available on Great Lakes. Additional details about Twitter datasets are in the getting started guide.

Need help?

For assistance or questions, please contact ARC at

Order Service

New requests are not being accepted at this time because the Cavium/ThunderX Hadoop Cluster was shut down on August 1, 2022. Contact ARC if you need assistance.

Yottabyte Research Cloud powered by


Yottabyte Research Cloud (YBRC) is no longer available. The YBRC service has been migrated to the Secure Enclave Service.

YBRC was a partnership between Information and Technology Services (ITS) and Yottabyte/ from 2016 to 2022 that provided U-M researchers with high-performance, secure, and flexible computing environments enabling the analysis of sensitive data sets restricted by federal privacy laws, proprietary access agreements, or confidentiality requirements. 

Why migrate? 

  • Researcher workloads have evolved and need consistently tuned performance across the subsystems of the enclave, which became more difficult as the YBRC hardware aged
  • The initial funding for YBRC was provided, in part, by a grant from the company behind the Yottabyte software, and that grant was coming to an end
  • ITS currently offers a virtual server environment called MiServer for non-sensitive data

How can we help you? 

Contact ARC at for assistance.


ConFlux

ConFlux is a cluster that seamlessly combines the computing power of HPC with the analytical power of data science. The next generation of computational physics requires HPC applications (running on external clusters) to interconnect with large data sets at run time. ConFlux provides low-latency communications for in- and out-of-core data, cross-platform storage, as well as high-throughput interconnects and massive memory allocations. The file system and scheduler natively handle extreme-scale machine learning and traditional HPC modules in a tightly integrated workflow, rather than in segregated operations, leading to significantly lower latencies, fewer algorithmic barriers, and less data movement.

The ConFlux cluster is built with ~58 two-socket IBM Power8 “Firestone” S822LC compute nodes, each providing 20 cores. Seventeen two-socket Power8 “Garrison” S822LC compute nodes each provide an additional 20 cores and host four NVIDIA Pascal GPUs connected via NVIDIA’s NVLink technology to the Power8 system bus. Each GPU-based node has local high-speed NVMe flash memory for random access.

All compute and storage are connected via a 100 Gb/s InfiniBand fabric. The NVLink connectivity, combined with IBM CAPI technology, provides the data transfer throughput required for the data-driven computational physics research that will be conducted.

ConFlux is funded by a National Science Foundation grant; the Principal Investigator is Karthik Duraisamy, Assistant Professor of Aerospace Engineering and Director of the Center for Data-Driven Computational Physics (CDDCP). ConFlux and the CDDCP are under the auspices of the Michigan Institute for Computational Discovery and Engineering.

Order Service

A portion of the cycles on ConFlux will be available through a competitive application process. More information will be posted as it becomes available.