U-M partners with Cavium on Big Data computing platform

By | Feature, General Interest, Happenings, HPC, News

A new partnership between the University of Michigan and Cavium Inc., a San Jose-based provider of semiconductor products, will create a powerful new Big Data computing cluster available to all U-M researchers.

The $3.5 million ThunderX computing cluster will enable U-M researchers to, for example, process massive amounts of data generated by remote sensors in distributed manufacturing environments, or by test fleets of automated and connected vehicles.

The cluster will run the Hortonworks Data Platform, which provides Spark, Hadoop MapReduce and other tools for large-scale data processing.

“U-M scientists are conducting groundbreaking research in Big Data already, in areas like connected and automated transportation, learning analytics, precision medicine and social science. This partnership with Cavium will accelerate the pace of data-driven research and open up new avenues of inquiry,” said Eric Michielssen, U-M associate vice president for advanced research computing and the Louise Ganiard Johnson Professor of Engineering in the Department of Electrical Engineering and Computer Science.

“I know from experience that U-M researchers are capable of amazing discoveries. Cavium is honored to help break new ground in Big Data research at one of the top universities in the world,” said Cavium founder and CEO Syed Ali, who received a master of science in electrical engineering from U-M in 1981.

Cavium Inc. is a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, wired and wireless networking. The new U-M system will use dual-socket servers powered by Cavium’s ThunderX ARMv8-A workload-optimized processors.

The ThunderX product family is Cavium’s line of 64-bit ARMv8-A server processors for next-generation data center and cloud applications. It features high-performance custom cores, single- and dual-socket configurations, high memory bandwidth and large memory capacity.

Alec Gallimore, the Robert J. Vlasic Dean of Engineering at U-M, said the Cavium partnership represents a milestone in the development of the College of Engineering and the university.

“It is clear that the ability to rapidly gain insights into vast amounts of data is key to the next wave of engineering and science breakthroughs. Without a doubt, the Cavium platform will allow our faculty and researchers to harness the power of Big Data, both in the classroom and in their research,” said Gallimore, who is also the Richard F. and Eleanor A. Towner Professor, an Arthur F. Thurnau Professor, and a professor both of aerospace engineering and of applied physics.

Along with applications in fields like manufacturing and transportation, the platform will enable researchers in the social, health and information sciences to more easily mine large structured and unstructured datasets. This will eventually allow researchers to, for example, correlate health outcomes and disease outbreaks with information derived from socioeconomic, geospatial and environmental data streams.

U-M and Cavium chose to run the cluster on Hortonworks Data Platform, which is based on open source Apache Hadoop. The ThunderX cluster will deliver high-performance computing services for Hadoop analytics and, ultimately, a total of three petabytes of storage space.

“Hortonworks is excited to be a part of forward-leading research at the University of Michigan exploring low-powered, high-performance computing,” said Nadeem Asghar, vice president and global head of technical alliances at Hortonworks. “We see this as a great opportunity to further expand the platform and segment enablement for Hortonworks and the ARM community.”

Workshop co-chaired by MIDAS co-director Prof. Hero releases proceedings on inference in big data

By | Al Hero, Educational, General Interest, Research

The National Academies Committee on Applied and Theoretical Statistics has released proceedings from its June 2016 workshop titled “Refining the Concept of Scientific Inference When Working with Big Data,” co-chaired by Alfred Hero, MIDAS co-director and the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science.

The report can be downloaded from the National Academies website.

The workshop explored four key issues in scientific inference:

  • Inference about causal discoveries driven by large observational data
  • Inference about discoveries from data on large networks
  • Inference about discoveries based on integration of diverse datasets
  • Inference when regularization is used to simplify fitting of high-dimensional models.

The workshop brought together statisticians, data scientists and domain researchers from different biomedical disciplines in order to identify new methodological developments that hold significant promise, and to highlight potential research areas for the future. It was partially funded by the National Institutes of Health Big Data to Knowledge Program, and the National Science Foundation Division of Mathematical Sciences.

Big Data: Improving the Scope, Quality and Accessibility of Financial Data

The Office of Financial Research and the University of Michigan will host a joint conference, “Big Data: Improving the Scope, Quality, and Accessibility of Financial Data,” in Ann Arbor, Michigan. The conference will bring together a wide range of scholars, regulators, policymakers, and practitioners to explore how Big Data can be used to enhance financial stability and address other challenges in financial markets.

U-M telecast of XSEDE Big Data workshop

XSEDE and the Pittsburgh Supercomputing Center are presenting a one-day Big Data workshop covering topics such as Hadoop and Spark. U-M is one of several sites around the country that will host a telecast of the session. Registration is required, as space is limited.

Schedule:

11:00 Welcome
11:25 Intro to Big Data
11:45 Hadoop
12:15 Hadoop (continued)
1:00 Lunch break
2:00 Exercises
2:45 Spark
3:45 Exercises
4:15 A Big Big Data Platform
5:00 Adjourn

Dept. of Statistics sponsoring data mining competition for undergraduates

By | Educational, Events

The Department of Statistics is sponsoring a data mining competition open to all U-M undergraduates. Cash prizes will be awarded: $500 for first place, $300 for second place and $200 for third place.

Teams and individuals are welcome to participate. A dataset will be made available on Monday, March 21st, and submissions must be made by 9 a.m. on Monday, April 11th.

Each team will produce a written analysis of the provided dataset, focusing on a specific question that will be announced when the dataset is released. A variety of tools and techniques will be suitable for this task; students from various academic backgrounds focusing on data analytics and computing will be well equipped for the competition. More information will be posted at kshedden.github.io/data_mining_2016 on March 21st.