Consulting Services


Advanced Research Computing (ARC), a division of ITS, is pleased to offer a pilot called Scientific Computing and Research Consulting Services to help researchers implement data analytics and workflows within their research projects. This includes navigating technical resources like high-performance computing and storage.

The ARC Scientific Computing and Research Consulting Services team will be your guide to navigating the complex technical landscape: from implementing data-intensive projects, to teaching you how the technical systems work, to identifying the proper tools, to advising you on how to hire a programmer.

Areas of Expertise

  • Data Science
    • Data Workflows
    • Data Analytics
    • Machine Learning
    • Programming
  • Grant Proposals
    • Compute Technologies
    • Data Storage and Management
    • Budgeting costs for computing and storage
  • Scientific Computing/Programming
    • Getting started with advanced computing
    • Code optimization
    • Parallel computing
    • GPU/Accelerator Programming
  • Additional Resources
    • Facilitating Collaborations/User Communities
    • Workshops and Training

Who can use this service?

  • All researchers and their collaborators from any of the university’s three campuses, including faculty, staff, and students
  • Units that want help including technical information when preparing grants
  • Anyone who needs HPC services and help navigating the available resources

How much does it cost?

  • Initial consultation, grant pre-work, and short-term general guidance/feedback on methods and code are available at no cost.
  • For longer engagements, research teams will be asked to contribute to the cost of providing the service.

Partnership

The ARC Scientific Computing and Research Consulting Services team works in partnership with the Consulting for Statistics, Computing, and Analytics Research team (CSCAR), Biomedical Research Core Facilities, and others. ARC may refer you to or engage complementary groups as the project requires.

Get Started

Send an email to arc-consulting@umich.edu with the following information:

  • Research topic and goal
  • What you would like ARC to help you with
  • Any current or future data types and sources
  • Current technical resources
  • Current tools (programs, software)
  • Timeline – when do you need the help or information?

Get Help

If you have any questions or wish to set up a consultation, please contact us at arc-consulting@umich.edu. Be sure to include as much information as possible from the “Get Started” section noted above.

If you have more general questions about ARC services or software, please contact us at arc-support@umich.edu.

Virtual office hours are also available on Tuesdays, Wednesdays, and Thursdays. Get help with machine learning, algorithms, modeling, coding, computing on a cluster, survey sampling, using records across multiple surveys, and more. Anyone doing any type of research at any skill level is welcome!

Data Science Consulting


Data Workflows

We are available to assist researchers along the entire lifecycle of the data workflow, from the conceptual stage through ingestion, preprocessing, cleansing, and storage. We can advise in the following areas (a brief illustrative example follows the list):

  • Establishing and troubleshooting dataflows between systems
  • Selecting the appropriate systems for short-term and long-term storage
  • Transformation of raw data into structured formats
  • Data deduplication and cleansing
  • Conversion of data between different formats to aid in analysis
  • Automation of dataflow tasks
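
As a minimal sketch of the kind of cleansing and format-conversion step we can help set up, the example below deduplicates a raw CSV export with pandas and writes it to Parquet. The file and column names are placeholders for illustration only, not part of any ARC-provided workflow.

    # Illustrative sketch only: deduplicate a raw CSV export and convert it to
    # Parquet for downstream analysis. File and column names are placeholders.
    import pandas as pd

    raw = pd.read_csv("raw_export.csv")                      # ingest the raw export

    clean = raw.drop_duplicates()                            # drop exact duplicate rows
    clean["site"] = clean["site"].str.strip().str.lower()    # normalize a text column

    # Write a structured, analysis-friendly copy (requires pyarrow or fastparquet)
    clean.to_parquet("clean_export.parquet", index=False)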

Analytics

The data science consulting team can assist with data analytics to support research:

  • Choosing the appropriate tools and techniques for performing analysis
  • Development of data analytics in a variety of frameworks
  • Cloud-based (Hadoop) analytic development

Machine Learning

Machine learning is an application of artificial intelligence (AI) that focuses on the development of computer programs to learn information from data.

We are available to consult on the following, whether that means a general overview of concepts, a discussion of which tools and architectures best fit your needs, or technical support during implementation (a brief example follows the list below).

  • Languages: Python, C++, Java, MATLAB
  • Tools/Architectures: Python data tools (scikit-learn, NumPy, etc.), TensorFlow, Jupyter notebooks
  • Models: Neural networks, decision trees, support vector machines
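
As a small illustration of the kind of workflow these tools support, the sketch below trains a decision tree classifier with scikit-learn on its built-in iris dataset. It is a generic example that assumes scikit-learn is installed; it is not an ARC-specific pipeline.

    # Generic scikit-learn sketch: train and evaluate a decision tree classifier.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X_train, y_train)

    print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))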

Programming

We also provide consulting on programming in a variety of programming languages (including but not limited to: C++, Java, and Python) to support your data science needs. We can assist in algorithm design and implementation, as well as optimizing and parallelizing code to efficiently utilize high performance computing (HPC) resources where possible/necessary. We can help identify available commercial and open-source software packages to simplify your data analysis.
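
As one minimal, hedged illustration of parallelizing code, the sketch below uses Python's standard multiprocessing module to spread independent tasks across local CPU cores. The simulate function is a placeholder for your own workload, and on an HPC cluster the worker count would normally come from your job's allocation rather than the whole node.

    # Minimal parallelization sketch using only the Python standard library.
    # "simulate" is a placeholder for an expensive, independent unit of work.
    import os
    import random
    from multiprocessing import Pool

    def simulate(seed: int) -> float:
        rng = random.Random(seed)
        return sum(rng.random() for _ in range(100_000))

    if __name__ == "__main__":
        n_workers = os.cpu_count() or 1   # on HPC, use the cores your job was allocated
        with Pool(processes=n_workers) as pool:
            results = pool.map(simulate, range(32))
        print(f"ran {len(results)} tasks on {n_workers} workers")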

 

If you have any questions or wish to set up a consultation, please contact us at arcts-consulting@umich.edu.

The ThunderX Cluster


The Cavium/ThunderX Hadoop Cluster is a next-generation Hadoop cluster available to U-M researchers. It is an on-campus resource with 3 PB of storage for researchers tackling data science problems. The cluster consists of 40 servers, each containing 96 ARMv8 cores and 512 GB of RAM. It is made possible through a partnership with Marvell. For more information or questions, contact arc-support@umich.edu.


Cavium/ThunderX Hadoop Cluster to shut down on August 1, 2022

On Monday, August 1, 2022, ARC will shut down the Cavium/ThunderX Hadoop Cluster, and all user access will be removed.

Researchers should plan to transition their work from the Cavium/ThunderX Hadoop Cluster to the Great Lakes High-Performance Computing Cluster (or another service suitable for Spark and pySpark analysis) by Sunday, July 31, 2022. Note that ARC will not archive the data after the cluster is decommissioned (see the FAQs below), so plan to migrate or download anything you need.
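
For reference, a minimal pySpark analysis of the kind being migrated might look like the sketch below. It assumes pySpark is installed in your own Python environment on the cluster you are using, and the input path and column name are placeholders, not real datasets.

    # Minimal pySpark sketch (assumes pyspark is installed in your environment);
    # "input.csv" and "category" are placeholders for your own data.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("migration-example")
        .master("local[*]")      # local mode: uses the cores available on the node
        .getOrCreate()
    )

    df = spark.read.csv("input.csv", header=True, inferSchema=True)
    df.groupBy("category").count().show()

    spark.stop()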

Classes that are currently using the Cavium/ThunderX Hadoop Cluster for educational purposes should plan to use the Great Lakes High-Performance Computing Cluster for the Fall 2022 and Winter 2023 terms.

Actions that customers should take

  • If you are not actively using the Cavium/ThunderX Hadoop Cluster for coursework or big data analysis, no further action is required.
  • If you would like to continue using ARC services for Spark or pySpark data analysis for research, including the Twitter Decahose, review your data to determine what should migrate to the Great Lakes High-Performance Computing Cluster, and delete anything that does not need to migrate.
  • Want to migrate or download your data? Reach out to ARC for assistance, arc-support@umich.edu.
  • Data migration, if needed, should be completed by Sunday, July 31, 2022.

Sign up for the no-cost U-M Research Computing Package

Sign up for the U-M Research Computing Package to get no-cost allocations of 80,000 CPU hours on the Great Lakes High-Performance Computing Cluster, 10 TB of replicated Turbo Research Storage, and 100 TB of Data Den Research Archive storage. PhD students may qualify for their own UMRCP resources, depending on their advisor relationship and who oversees their research. Students should consult with their PhD program administrator to determine their eligibility.

Use the Great Lakes HPC Cluster for teaching

Class accounts are available for teaching at no cost using the Great Lakes High-Performance Computing Cluster.

Why is this happening?

The service has reached the end of its useful life and is costly to replace. The Cavium/ThunderX Hadoop Cluster is over five years old, and replacement parts are costly and difficult to acquire. Please note that using Spark on the Great Lakes High-Performance Computing Cluster is faster than on the Cavium/ThunderX Hadoop Cluster for nearly all use cases.

Next steps

Researchers should plan to move their data off the Cavium/ThunderX Hadoop Cluster by July 31, 2022.

As part of this project, ARC will partner with current customers to assist them in the migration to the Great Lakes High-Performance Computing Cluster. Units and customers should expect to migrate by July 31, 2022, and the Cavium/ThunderX Hadoop Cluster will be shut down on August 1, 2022.

Twitter Decahose data will be made available on the Great Lakes High-Performance Computing Cluster in the same Locker Large-File Storage and Turbo Research Storage locations where it resides today.

Need help?

For assistance or questions, please contact ARC at arc-support@umich.edu, or visit Virtual Drop-in Office Hours (CoderSpaces) for hands-on help, available 9:30-11 a.m. and 2-3:30 p.m. on Tuesdays; 1:30-3 p.m. on Wednesdays; and 2-3:30 p.m. on Thursdays.

For other topics, contact the ITS Service Center.

FAQs

  • I use the Cavium/ThunderX Hadoop Cluster to analyze Twitter data using the MIDAS Twitter Decahose. What should I do now?
    • Researchers who need to analyze Twitter data can continue to do so using the Great Lakes High-Performance Computing Cluster. The complete Twitter datasets are available on the Great Lakes High-Performance Computing Cluster in the same directory locations as they were on the Cavium/ThunderX Hadoop Cluster.
  • Which ARC service replaces the Cavium/ThunderX Hadoop Cluster?
    • The Great Lakes High-Performance Computing Cluster is the suggested replacement platform for Spark and pySpark analyses.
  • Why is ARC retiring the Cavium/ThunderX Hadoop Cluster?
    • The service has reached the end of its useful life and is costly to replace. The Cavium/ThunderX Hadoop Cluster is over five years old, and replacement parts are costly and difficult to acquire. The suggested replacement platform, the Great Lakes High-Performance Computing Cluster, can perform Spark and pySpark analyses faster than the Cavium/ThunderX Hadoop Cluster.
  • When is the service being shut down?
    • The Cavium/ThunderX Hadoop Cluster will shut down on Monday, August 1, 2022. Units should expect to migrate to a new service by July 31, 2022.
  • I need help migrating my data.
    • Contact ARC at arc-support@umich.edu for assistance migrating or downloading your data.
  • Will an archive of my data be available?
    • No. ARC will not provide an archive of your data after the machine has been decommissioned. However, if you forget to migrate part of your dataset, there will be a period of six months before the Cavium/ThunderX Hadoop Cluster is disassembled and sent to U-M Property Disposition. After that, the data will be permanently deleted.
  • I used the Cavium/ThunderX Hadoop Cluster for a class but no longer need access. What should I do?
    • No further action is required.

Order Service

New requests are not being accepted at this time because the Cavium/ThunderX Hadoop Cluster will shut down on August 1, 2022. Contact ARC if you need assistance, arc-support@umich.edu.

Yottabyte Research Cloud powered by Verge.io


The Yottabyte Research Cloud, powered by Verge.io, is a partnership between ITS and Yottabyte that provides U-M researchers with high-performance, secure, and flexible computing environments, enabling the analysis of sensitive data sets restricted by federal privacy laws, proprietary access agreements, or confidentiality requirements.

Upcoming maintenance

Yottabyte (Blue) to retire April 2022

Yottabyte (Blue) will retire on April 4, 2022. Yottabyte (Maize) for sensitive data will continue to be offered as a service. You can tell whether your virtual server is hosted in ‘Blue’ by checking whether its hostname contains ‘blue’, such as ‘yb-hostname.blue.ybrc.umich.edu’.

A member of the ARC Yottabyte team or someone from your unit will reach out to you to determine your needs and help you develop migration plans. 

What do I need to do

  • Review your data and projects that are currently utilizing Yottabyte (Blue). Delete anything you do not need to migrate.  
  • Make plans for an alternate hosting platform to which you would like to move your virtual machine, such as the ITS options listed under “Where can I move my machine?” below.
  • If you are no longer using your Yottabyte (Blue) virtual server or you are opting to shut it down without moving, please send an email to arc-support@umich.edu, and let us know that it can be decommissioned.

What is retiring? 

ARC’s Yottabyte (Blue) provides U-M researchers with high-performance, secure, and flexible computing environments through the following services:

  • Research Database Hosting, an environment that can house research-focused data stored in a number of different database engines.
  • Virtual desktops for research. This service is similar to Glovebox but is suitable for data that is not classified as sensitive.
  • Docker Container Service. This service can take any research application that can be containerized for deployment.

When is YBRC ‘Blue’ retiring?

Yottabyte (Blue) will retire on April 4, 2022. Yottabyte (Maize) for sensitive data will continue to be offered as a service. 

Check your virtual server name to confirm that your virtual server is hosted in YBRC ‘Blue’ (the hostname has ‘blue’ in its name, such as ‘yb-hostname.blue.ybrc.umich.edu’).  

Why is YBRC ‘Blue’ retiring?

The YBRC ‘Blue’ services are underutilized, making it difficult to offer an affordable service with the current and projected user base.

Where can I move my machine?

Researchers who are using these YBRC ‘Blue’ services for their work can leverage other services provided by Information & Technology Services. The suggested replacements are listed below:

  • If you’re currently using YBRC (Blue) Research Database Hosting, try MiDatabase.
  • If you’re currently using the Docker Container Service, try the MCloud Kubernetes Container Service.
  • If you’re currently using Virtual Desktops for Research, try MiServer.

Please note: The above-mentioned services provided by ITS (MiDatabase, MCloud Kubernetes Service, and MiServer) all have costs associated with use. 

Not using your virtual machine? Let us know!

If you are not currently using, or not planning to use, your virtual machine, please let ARC know that the machine can be decommissioned by sending us an email at arc-support@umich.edu. Please remove your data as soon as feasible to avoid a rush leading up to Monday, April 4, 2022.

Still using your virtual machine? We’ll reach out!

If you are currently using your machine, a member of your unit IT group or a member of the ARC staff will reach out to you regarding the decommissioning of your system and work with you on alternate hosting options.

More about Yottabyte Research Cloud

The system is built on Yottabyte’s composable, software-defined infrastructure platform, Cloud Composer, and represents U-M’s first use of software-defined infrastructure for research, allowing on-the-fly, personalized configuration of computing resources at any scale.

Cloud Composer software inventories the physical CPU, RAM and storage components of Cloud Blox appliances into definable and configurable virtual resource groups that may be used to build multi-tenant, multi-site infrastructure as a service.

See the September 2016 press release for more information.

The YBRC platform can accommodate sensitive institutional data classified up to High — including CUI — as identified in the Sensitive Data Guide.

Capabilities

The Yottabyte Research Cloud supports the following platforms for researchers at the University of Michigan:

  • Secure Enclave Services, an environment for the analysis of restricted data from HIPAA through Controlled Unclassified Information (CUI). This service was formerly known as Glovebox.

The following services are deprecated in the current implementation of YBRC and will not be replicated in future versions of the ARC private cloud environment. We are not accepting new customers for any of these services.

  • Data Pipeline Tools, which include databases, message buses, and data processing and storage solutions. This platform is suitable for sensitive institutional data classified up to High (including CUI), as well as data that is not classified as sensitive.
  • Research Database Hosting, an environment that can house research-focused data stored in a number of different database engines.
  • Virtual desktops for research. This service is similar to Glovebox but is suitable for data that is not classified as sensitive.
  • Docker Container Service. This service can take any research application that can be containerized for deployment.

Researchers who need to use Hadoop or Spark for data-intensive work should explore ARC’s separate Hadoop cluster.

Contact arc-support@umich.edu for more information.

Hardware

The system deploys:

  • 40 high-performance, hyper-converged YottaBlox compute nodes (H2400i-E5), each with two Intel Xeon E5-2680 v4 CPUs (1,120 cores total), 512 GB DDR4-2400 RAM (20,480 GB total), a dual-port 40GbE network adapter (80 ports total), and two 800 GB NVMe DC P3700 SSDs (64 TB total)
  • 20 YottaBlox storage nodes (S2400i-E5-HDD), each with two Intel Xeon E5-2620 v4 CPUs (320 cores total), 128 GB DDR4-2133 RAM (2,560 GB total), a quad-port 10GbE network adapter (80 ports total), two 800 GB DC S3610 SSDs (32 TB total), and twelve 6 TB 7,200 RPM hard drives (1,440 TB total)

Access

These tools are offered to all researchers at the University of Michigan free of charge, provided that certain usage restrictions are not exceeded. Large-scale users who outgrow the no-cost allotment may purchase additional YBRC resources. All interested parties should contact arcts-support@umich.edu.

Sensitive Data

The U-M Research Ethics and Compliance webpage on Controlled Unclassified Information provides details on handling this type of data. The U-M Sensitive Data Guide to IT Services is a comprehensive guide to sensitive data.

Order Service

The Yottabyte Research Cloud (Maize) is a pilot program available to all U-M researchers.

Access to Yottabyte Research Cloud (Maize) resources involves a single email to us at arc-support@umich.edu. Please include:

  • Your name or your advisor’s name
  • Your unit
  • The project the VM will be used for and the data that will be stored there
  • Whether you plan to use restricted data.

Someone from your unit IT staff or an ARC IT staff member will reach out to you to work out the details and determine the best way to accommodate your request within the Yottabyte Cloud environment.

General Questions

What is the Yottabyte Research Cloud?

The Yottabyte Research Cloud (YBRC) is the University’s private cloud environment for research. It’s a collection of processors, memory, storage, and networking that can be subdivided into smaller units and allocated to research projects on an as-needed basis to be accessed by virtual machines and containers.

How do I get access to Yottabyte Research Cloud Resources?

Access to Yottabyte Research Cloud resources involves a single email to us at arcts-support@umich.edu. Please include:

  • Your name or your advisor’s name
  • Your unit
  • What you would like to use YBRC for
  • Whether you plan to use restricted data.

Someone from your unit IT staff or an ARC staff member will reach out to you to work out the details and determine the best way to accommodate your request within the Yottabyte Cloud environment.

What class of problems is Yottabyte Research Cloud designed to solve?

Yottabyte Research Cloud resources are aimed squarely at research and the teaching and training of students involved in research. Primarily, Yottabyte resources are for sponsored research. Yottabyte Research Cloud is not for administrative or clinical use (business of the university or the hospital). Clinical research is acceptable as long as it is sponsored research.

How large is the Yottabyte Research Cloud?

In total, the Yottabyte Research Cloud (YBRC) provides 960 processing cores, 7.5 TB of memory, and roughly 330 TB of scratch storage in each of the Maize and Blue clusters.

What do Maize Yottabyte Research Cloud and Blue Yottabyte Research Cloud stand for?

Yottabyte resources are divided up between two clusters of computing and storage. Maize YBRC is for restricted data analyses and storage, and Blue YBRC is for unrestricted data analyses and storage.

What can I do with the Yottabyte Research Cloud?

The initial offering of YBRC is focused on a few different types of use cases:

  1. Database hosting and data ingestion of streaming data from an external source into a database. We can host many types of databases within Yottabyte, including most structured and unstructured databases; examples include MariaDB, PostgreSQL, and MongoDB. (A rough connection sketch follows this list.)
  2. Hosting for applications that you cannot host locally in your lab or that you would like to connect to our HPC and data science clusters, such as Materials Studio, Galaxy, and SAS Studio.
  3. Hosting of virtual desktops and servers for restricted data use cases, such as statistical analysis of health data or an analytical project involving Controlled Unclassified Information (CUI). Most people in this case need a powerful workstation for SAS, Stata, or R analyses, for example, or some other application.
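
As a rough sketch of use case 1, analysis code connecting to a hosted PostgreSQL database might look like the following. The hostname, database name, credentials, and table are all placeholders rather than real YBRC endpoints, and the psycopg2 driver is assumed to be installed.

    # Rough sketch: connect analysis code to a hosted PostgreSQL database.
    # Hostname, database, credentials, and table names are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="your-db-host.example.umich.edu",
        dbname="research_db",
        user="researcher",
        password="********",
    )

    with conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM observations;")
        print("rows:", cur.fetchone()[0])

    conn.close()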

Are these the only things I can do with resources in the Yottabyte Research Cloud?

No! Contact us at arcts-support@umich.edu if you want to learn whether or not your idea can be done within YBRC!

How do I get help if I have an issue with something in Yottabyte?

The best way to get help is to send an email to arcts-support@umich.edu with a brief description of the issues you are seeing.

What are the support hours for the Yottabyte Research Cloud?

Yottabyte is supported between the hours of 9 a.m. and 5 p.m., Monday through Friday. Response times for support outside of these hours will be longer.

Usage Questions

What’s the biggest machine I can build within Yottabyte Research Cloud?

Because of the way that YBRC divides up resources, the largest Virtual Machine within the cluster is 12 processing cores, and 96 GB of RAM.

How many Yottabyte Research Cloud resources am I able to access at no cost?

ARC policy is to limit no-cost individual allocations to 100 cores, so that access is always open to multiple research groups.

What if I need more than the no-cost maximum?

If you need to use more than 100 cores of YBRC, we recommend that you purchase YBRC physical infrastructure of your own and add it to the cluster. Physical infrastructure can be purchased in chunks of 96 physical cores, which can be oversubscribed as memory allows. For every block purchased, the researcher also receives four years of hardware and OS support for that block in case of failure. For a cost estimate of buying your own blocks of infrastructure and adding them to the cluster, please email arcts-support@umich.edu.

What is ‘scratch’ storage?

Scratch storage for the Yottabyte Research Cloud is the storage area network that holds OS storage and active data for the local virtual machines; it is not backed up or replicated to separate infrastructure. As with scratch storage on Flux, we do not recommend storing any data solely on the local disk of any machine. Make sure that you have backups on other services, such as Turbo, Locker, or another storage service.

HIPAA Compliance Questions

What can I do inside of a HIPAA network enclave?

For researchers with restricted data with a HIPAA classification, we provide a small menu of Linux and Windows workstations to be installed within your enclave.  We do not delegate administrative rights for those workstations to researchers or research staff.  We may delegate administrative rights for workstations and services in your enclaves to IT staff in your unit who have successfully completed the HIPAA IT training coursework given by ITS or HITS, and are familiar with desktop and virtual machine environments.

Machines in the HIPAA network enclaves are protected by a deny-first firewall that prevents most traffic from entering the enclaves. Researchers can still visit external-to-campus websites from within a HIPAA network enclave. Researchers within a HIPAA network enclave can use storage services such as Turbo and MiStorage Silver (via CIFS) to host data for longer-term storage.

What are a researcher’s and research group’s responsibilities when they have HIPAA data within YBRC?

All researchers, staff, and students that use YBRC when analyzing restricted data have a shared responsibility in keeping their restricted data secure.

  • Researchers need to be aware of the personnel in their labs who have access to the data in their enclaves.
    • Each lab should have a process for adding and removing users from enclaves that includes removing departed lab members from access to restricted data as soon as possible after they have left the lab.
    • Each lab should review who has access to their data and enclaves twice a year by checking the memberships of their M-Community and Active Directory groups to ensure that people have been removed as requested.
  • Each lab user must store their restricted data in a specific directory, as discussed during their introductory meeting with YBRC staff.  They must keep the data only in this directory over the life of the data on the system.

CUI Compliance Questions

What can I do inside of a Secure Enclave Service CUI enclave?

Staff will work with researchers using CUI-classified data to determine the types of analysis that can be conducted on YBRC resources that comply with relevant regulations.

What are a researcher’s and research group’s responsibilities when they have CUI data within YBRC?

All researchers, staff, and students that use YBRC when analyzing restricted data have a shared responsibility in keeping their restricted data secure.

  • Researchers need to be aware of the personnel in their labs who have access to the data in their enclaves.
    • Each lab should have a process for adding and removing users from enclaves that includes removing departed lab members from access to restricted data as soon as possible after they have left the lab.
    • Each lab should review who has access to their data and enclaves twice a year by checking the memberships of their M-Community and Active Directory groups to ensure that people have been removed as requested.
  • Each lab user must store their restricted data in a specific directory, as discussed during their introductory meeting with YBRC staff.  They must keep the data only in this directory over the life of the data on the system.

ConFlux


ConFlux is a cluster that seamlessly combines the computing power of HPC with the analytical power of data science. The next generation of computational physics requires HPC applications (running on external clusters) to interconnect with large data sets at run time. ConFlux provides low-latency communications for in- and out-of-core data, cross-platform storage, as well as high-throughput interconnects and massive memory allocations. The file system and scheduler natively handle extreme-scale machine learning and traditional HPC modules in a tightly integrated workflow, rather than in segregated operations, leading to significantly lower latencies, fewer algorithmic barriers, and less data movement.

The ConFlux cluster is built with approximately 58 two-socket IBM Power8 “Firestone” S822LC compute nodes providing 20 cores each. Seventeen two-socket Power8 “Garrison” S822LC compute nodes provide an additional 20 cores each and host four NVIDIA Pascal GPUs connected to the Power8 system bus via NVIDIA’s NVLink technology. Each GPU-based node has local high-speed NVMe flash storage for random access.

All compute and storage are connected via a 100 Gb/s InfiniBand fabric. The InfiniBand and NVLink connectivity, combined with IBM CAPI technology, provides the data-transfer throughput required for the data-driven computational physics that researchers will be conducting.

ConFlux is funded by a National Science Foundation grant; the Principal Investigator is Karthik Duraisamy, Assistant Professor of Aerospace Engineering and Director of the Center for Data-Driven Computational Physics (CDDCP). ConFlux and the CDDCP are under the auspices of the Michigan Institute for Computational Discovery and Engineering.

Order Service

A portion of the cycles on ConFlux will be available through a competitive application process. More information will be posted as it becomes available.