The Yottabyte Research Cloud (YBRC) environment can host many components of scalable, on-demand data pipelines for research labs at U-M. These capabilities allow researchers to build custom data pipelines that ingest data from a variety of sources, process it using a message bus service, and store it in a variety of databases for later analysis. Researchers can use any or all of our pipeline services, depending on their workflows. For example, researchers have used our data pipeline services to stream remote sensor data into a Redis service hosted in YBRC.
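
As a rough illustration of the ingest, process, and store stages described above, here is a minimal Python sketch. It is not YBRC code: the sample readings, the function names, and the table layout are all hypothetical. To keep it self-contained, an in-process `queue.Queue` stands in for a message bus such as Redis, Kafka, or RabbitMQ, and an in-memory SQLite database stands in for a hosted database such as MySQL/MariaDB or Postgres.

```python
import json
import queue
import sqlite3
import threading

# Hypothetical sensor readings; in practice these would arrive from remote devices.
SAMPLE_READINGS = [
    {"sensor_id": "s1", "temp_c": 21.5},
    {"sensor_id": "s2", "temp_c": 19.0},
    {"sensor_id": "s1", "temp_c": 22.1},
]

_STOP = object()  # sentinel telling the consumer that no more messages will arrive


def ingest(readings, bus):
    """Ingestion stage: serialize each reading and publish it to the bus."""
    for reading in readings:
        bus.put(json.dumps(reading))
    bus.put(_STOP)


def process_and_store(bus, conn):
    """Processing and storage stages: consume, transform, and persist messages."""
    while True:
        message = bus.get()
        if message is _STOP:
            break
        reading = json.loads(message)
        temp_f = reading["temp_c"] * 9 / 5 + 32  # example transformation
        conn.execute(
            "INSERT INTO readings (sensor_id, temp_c, temp_f) VALUES (?, ?, ?)",
            (reading["sensor_id"], reading["temp_c"], temp_f),
        )
    conn.commit()


def run_pipeline(readings):
    """Run all three stages and return the stored rows in insertion order."""
    bus = queue.Queue()  # assumption: stand-in for Redis, Kafka, or RabbitMQ
    conn = sqlite3.connect(":memory:")  # assumption: stand-in for a hosted database
    conn.execute("CREATE TABLE readings (sensor_id TEXT, temp_c REAL, temp_f REAL)")
    producer = threading.Thread(target=ingest, args=(readings, bus))
    producer.start()
    process_and_store(bus, conn)  # the consumer runs in the main thread
    producer.join()
    rows = conn.execute(
        "SELECT sensor_id, temp_c, temp_f FROM readings ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_pipeline(SAMPLE_READINGS):
        print(row)
```

In a real deployment each stand-in would be replaced by a client for the chosen service, and the producer and consumer would typically run as separate processes or on separate hosts rather than as threads in one program.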

We can utilize most common software tools at each step, or we can work with you to configure a tool of your choice. To explore options, contact arcts-support@umich.edu.

Capabilities available include:

  • Data ingestion components: Redis, Kafka, and RabbitMQ.
  • Data processing engines: Apache Flink, Apache Storm, and Apache NiFi.
  • A variety of data stores and databases:
    • Structured databases: MySQL/MariaDB and Postgres
    • NoSQL databases: Cassandra, InfluxDB, and Elasticsearch, with Grafana available for dashboards and visualization

This diagram shows how data pipeline functions fit into the ARC data science offerings.

[Diagram: ybrc-pipeline]