Global research uses computing services to advance parenting and child development


Andrew Grogan-Kaylor, professor of Social Work, has spent the past 15 years studying the impact of physical discipline on children within the United States. 

Working with a team of other researchers at the School of Social Work, co-led by professors Shawna Lee and Julie Ma, he recently expanded his research to include children from all over the world, rather than exclusively the U.S. Current data for 62 low- and middle-income countries has been provided by UNICEF, a United Nations agency responsible for providing humanitarian and developmental aid to children worldwide. This data provides a unique opportunity to study the positive things that parents do around the world.


(Image by Eduardo Davad from Pixabay)

“We want to push research on parenting and child development in new directions. We want to do globally-based, diversity-based work, and we can’t do that without ARC services,” said Grogan-Kaylor. “I needed a bigger ‘hammer’ than my laptop provided.” 

The “hammer” he’s referring to is the Great Lakes HPC cluster, which handles the large data set easily. When Grogan-Kaylor first heard about ARC, it sounded like a promising way to grow his science, in particular by running the more complicated statistical models that were overwhelming his laptop and departmental desktop computers. 

He took a workshop led by Bennet Fauber, ARC senior applications programmer/analyst, who proved sensible and friendly and made HPC resources feel within reach to a newcomer. Learning this type of resource, Grogan-Kaylor says, is akin to learning a new language; determination, persistence, and finding the right people are key to getting the most from ARC services. Fauber has explained error messages, how to upload data, and how to schedule jobs on Great Lakes. Grogan-Kaylor also found a friendly and important resource in the ARC Help Desk, staffed by James Cannon, and departmental IT director Ryan Bankston has been of enormous help in learning about the cluster.

“We’re here to help researchers do what they do best. We can handle the technology, so they can solve the world’s problems,” said Brock Palen, ARC director. 

“Working with ARC has been a positive, growthful experience, and has helped me contribute significantly to the discussion around child development and physical punishment,” said Grogan-Kaylor. “I have a vision of where I’d like our research to go, and I’m pleased to have found friendly, dedicated people to help me with the pragmatic details.” 


ARC, LSA support groundbreaking global energy tracking


How can technology services like high-performance computing and storage help a political scientist contribute to more equal access to electricity around the world? 

Brian Min, associate professor of political science and research associate professor with the Center for Political Studies, and lead researcher Zachary O’Keeffe have been using nightly satellite imagery to generate new indicators of electricity access and reliability across the world as part of the High-Resolution Electricity Access (HREA) project. 

The collection of satellite imagery is unique in its temporal and spatial coverage. For more than three decades, images have captured nighttime light output over every corner of the globe, every single night. By studying small variations in light output over time, the researchers identify patterns and anomalies that reveal whether an area is electrified, when it became electrified, and when the power is out. This work yields the highest-resolution estimates of energy access and reliability anywhere in the world.
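The project's actual classification is model-based and uses statistically recalibrated imagery; purely as a toy illustration of the underlying idea, with made-up brightness values and a hypothetical noise floor, one might flag a location as electrified when its typical nightly light output clears the noise floor, and flag unusually dark nights as possible outages:

```python
# Toy illustration only: the HREA project uses statistically recalibrated
# imagery and model-based classification; this sketch just shows the idea
# of reading electrification and outages from nightly light output.
NOISE_FLOOR = 2.0  # hypothetical sensor noise level, arbitrary units

def classify(nightly_output):
    """Return (electrified?, indices of likely outage nights)."""
    med = sorted(nightly_output)[len(nightly_output) // 2]  # median brightness
    electrified = med > NOISE_FLOOR
    # Nights far below this location's typical brightness look like outages.
    outages = [i for i, x in enumerate(nightly_output)
               if electrified and x < 0.5 * med]
    return electrified, outages

village = [8.1, 8.3, 7.9, 8.2, 0.4, 8.0, 8.1, 8.2]  # night 4 is a dip
print(classify(village))  # (True, [4])
```

Using the median rather than the mean keeps a single outage night from skewing the location's baseline brightness.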


This image of Kenya from 2017 shows a model-based classification of electrification status based on statistically recalibrated all-night 2017 VIIRS light output. (Image courtesy Dr. Min. Sources: NOAA, VIIRS DNB, Facebook/CIESIN HRSL.)

LSA Technology Services and ARC both worked closely with Min’s team to relieve pain points and design highly-optimized, automated workflows. Mark Champe, application programmer/analyst senior, LSA Technology Services, explained that, “a big part of the story here is finding useful information in datasets that were created and collected for other purposes. Dr. Min is able to ask these questions because the images were previously captured, and then it becomes the very large task of finding a tiny signal in a huge dataset.”

There are more than 250 terabytes of satellite imagery and data, across more than 3 million files. And with each passing night, the collection continues to grow. Previously, the images were not easily accessible because they were archived in deep storage in multiple locations. ARC provides processing and storage at a single place, an important feature for cohesive and timely research. 

The research team created computational models that run on the Great Lakes High-Performance Computing Cluster and that can be easily replicated and validated. They archive the files on the Locker Large-File Storage service.

One challenge Min and O’Keeffe chronically face is data management. Images can be hundreds of megabytes each, so just moving files from the storage service to the high-performance computing cluster can be challenging, let alone finding the right storage service. Using Turbo Research Storage and Globus File Transfer, Min and O’Keeffe found secure, fast, and reliable solutions to easily manage their large, high-resolution files.

Brock Palen, director of ARC, said that when moving files from Great Lakes to Turbo, the team reached top speeds of 1,400 megabytes per second. 

Min and team used Globus extensively in acquiring historical data from the National Oceanic and Atmospheric Administration (NOAA). Champe worked with the research team to set up a Globus connection to ARC storage services. The team at NOAA was then able to push the data to U-M quickly and efficiently. Rather than uploading the data to later be downloaded by Min’s team, Globus streamlined and sped up the data transfer process. 

Champe noted, “Over 100TB of data was being unarchived from tape and transferred between institutions. Globus made that possible and much less painful to manage.”

“The support we’ve gotten from ARC and LSA Technology has been incredible. They have made our lives easier by removing bottlenecks and helping us see new ways to draw insights from this unique data,” said Min. 

Palen added, “We are proud to partner with LSA Technology Services and ITS Infrastructure networking services to provide support to Dr. Min’s and O’Keeffe’s work. Their work has the potential to have a big impact in communities around the world.” 

“We should celebrate work such as this because it is a great example of impactful research done at U-M that many people helped to support,” Champe continued.

Min expressed his gratitude to the project’s partners. “We have been grateful to work with the World Bank and NOAA to generate new insights on energy access that will hopefully improve lives around the world.”

These images are now available via open access (free and available to all).

This is made possible by a partnership between the University of Michigan, the World Bank, Amazon Web Services, and NOAA.

Using machine learning and the Great Lakes HPC Cluster for COVID-19 research


A researcher in the College of Literature, Science, and the Arts (LSA) is pioneering two separate, ongoing efforts for measuring and forecasting COVID-19: pandemic modeling and a risk tracking site.

The projects are led by Sabrina Corsetti, a senior undergraduate student pursuing dual degrees in honors physics and mathematical sciences, and supervised by Thomas Schwarz, Ph.D., associate professor of physics. 

The modeling uses a machine learning algorithm that can forecast future COVID-19 cases and deaths. The weekly predictions are made using the ARC-TS Great Lakes High-Performance Computing Cluster, which provides the speed and dexterity to run the modeling algorithms and data analysis needed for data-informed decisions that affect public health. 

Each week, 51 processes (one for each state and one for the U.S.) are run in parallel. “Running all 51 analyses on our own computers would take an extremely long time. The analysis places heavy demands on the hardware running the computations, which makes crashes somewhat likely on a typical laptop. We get all 51 done in the time it would take to do 1,” said Corsetti. “It is our goal to provide accurate data that helps our country.”
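The article doesn't describe how the jobs are launched; as a minimal standard-library sketch of the fan-out (with a placeholder standing in for the real machine learning forecast, and hypothetical region labels), the structure of 51 independent analyses looks like this:

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical region labels; the real runs cover the 50 states plus the U.S.
REGIONS = [f"state_{i:02d}" for i in range(1, 51)] + ["US"]

def run_forecast(region):
    # Placeholder for the real machine learning fit and forecast step.
    return region, f"forecast for {region}"

def run_all(regions):
    # One worker process per analysis, mirroring 51 parallel cluster jobs.
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(run_forecast, regions))

if __name__ == "__main__":
    results = run_all(REGIONS)
    print(len(results))  # 51
```

On the cluster itself this pattern is typically expressed as a scheduler job array rather than an in-process pool, but the shape of the computation, many independent analyses completing in the time of one, is the same.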

The predictions for the U.S. at the national and state levels are fed into the COVID-19 Forecasting Hub, which is led by the UMass-Amherst Influenza Forecasting Center of Excellence based at the Reich Lab. The weekly predictions generated by the hub are then read by the Centers for Disease Control and Prevention (CDC) for its weekly forecast updates.

The second project, a risk tracking site, involves daily COVID-19 data acquisition from a Johns Hopkins University repository and the Michigan Safe Start Map. The process runs quickly, taking only about five minutes, but the impact is great. The data populates the COVID-19 risk tracking site for the State of Michigan, which shows, by county, the total number of COVID-19 cases, the average number of new cases in the past week, and the risk level.

“Maintaining the risk tracking site requires us to reliably update its data every day. We have been working on implementing these daily updates using Great Lakes so that we can ensure that they happen at the same time each day. These updates consist of data pulls from the Michigan Safe Start Map (for risk assessments) and the Johns Hopkins COVID-19 data repository (for case counts),” remarked Corsetti.
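The article doesn't publish the site's exact formulas; as a standard-library sketch of the daily summary step, with made-up cumulative counts and hypothetical risk cutoffs in place of the real Michigan Safe Start levels, the per-county computation might look like:

```python
from statistics import mean

# Made-up cumulative case counts per county for the last 8 days.
SAMPLE = {
    "Washtenaw": [100, 110, 118, 130, 139, 151, 160, 172],
    "Wayne": [500, 540, 575, 620, 660, 700, 745, 790],
}

def weekly_average(cumulative):
    """Average daily new cases over the past week from cumulative totals."""
    new = [b - a for a, b in zip(cumulative, cumulative[1:])]
    return mean(new[-7:])

def risk_level(avg_new):
    # Hypothetical cutoffs; the real site follows Michigan Safe Start levels.
    for label, cutoff in (("low", 7), ("medium", 20), ("high", 40)):
        if avg_new < cutoff:
            return label
    return "very high"

for county, series in SAMPLE.items():
    avg = weekly_average(series)
    print(county, round(avg, 1), risk_level(avg))
```

Working from cumulative totals, as the Johns Hopkins repository does, means the daily new-case series is recovered by differencing before averaging.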

“We are proud to support this type of impactful research during the global pandemic,” said Brock Palen, director of Advanced Research Computing – Technology Services. “Great Lakes provides quicker answers and optimized support for simulation, machine learning, and more. It is designed to meet the demands of the University of Michigan’s most intensive research.”

ARC-TS is a division of Information and Technology Services (ITS). 


Beta tool helps researchers manage IT services


Since August 2019, ARC-TS has been developing a tool that gives researchers and their delegates the ability to directly manage the IT research services they consume from ARC-TS, such as user access and usage stats.

The ARC-TS Resource Management Portal (RMP) beta tool is now available for U-M researchers.

The RMP is a self-service-only user portal with tools and APIs for research managers, unit support staff, and delegates to manage their ARC-TS IT resources. Common activities, such as managing user access (adding and removing users), viewing historical usage to make informed decisions about lab resource needs, and checking volume capacity at a glance, are just some of the capabilities the ARC-TS RMP provides.

The portal currently provides tools for use with Turbo Research Storage, a high-capacity, reliable, secure, and fast storage solution. Longer-term, RMP will scale to include the other storage and computing services offered by ARC-TS. The portal is currently view-only.

To get started or find help, contact arcts-support@umich.edu.

Modular Data Center Electrical Work


[Update 2019-05-17] The MDC electrical work was completed successfully, and Flux has been returned to full production.

 

The Modular Data Center (MDC), which houses Flux, Flux Hadoop, and other HPC resources, has an electrical issue that requires us to bring power usage below 50% for some racks. To do this, we have placed reservations on some nodes to reduce the power draw so that ITS Data Centers can fix the issue. Once we hit the target power level and the issue is resolved, we will remove the reservations and return Flux and Flux Hadoop to full production.

The 2018 MICDE Symposium: Summary by Bradley Dice, Ph.D. student in Physics and Computational Science


This piece was first published on LinkedIn by Bradley Dice, U-M Ph.D. student in Physics and Computational Science.

MICDE Symposium 2018: Computation, A Pillar of Science and a Lens to the Future

High-performance computing (HPC) is becoming an increasingly powerful tool in the hands of scientists, driving new discoveries in physical sciences, life sciences, and social sciences. The development of new (frequently domain-specific) approaches to machine learning and faster, smarter processing of big data sets allows us to explore questions that were previously impossible to study. Yesterday, I presented a poster at the Michigan Institute for Computational Discovery & Engineering (MICDE) annual Symposium and attended a number of talks by researchers working at the intersection of high-performance computing and their domain science. The theme for the symposium was “Computation: A Pillar of Science and a Lens to the Future.”

Collaborative Computational Science with signac

My scientific work, and the work of my colleagues in the Glotzer lab, has been made vastly more efficient through the use of tools for collaborative science, particularly the signac framework. I presented a poster about how the signac framework (composed of the open-source Python packages signac, signac-flow, and signac-dashboard) enables scientists to rapidly simulate, model, and analyze data. The name comes from painter Paul Signac, who, along with Georges Seurat, founded the style of pointillism. This neo-impressionist style uses tiny dots of color instead of long brushstrokes, which collectively form a beautiful image when the viewer steps back. This metaphor fits the way that a lot of science works: given only points of data, scientists aim to see the whole picture and tell its story. Since our lab studies materials, our “points” of data fit into a multidimensional parameter space, where quantities like pressure and temperature, or even particles’ shapes, may vary. Using this data, our lab computationally designs novel materials from nanoparticles and studies the physics of complex crystalline structures.

The core signac package, which acts as a database on top of the file system, helps organize and manage scientific data and metadata. Its companion tool signac-flow enables users to quickly define “workflows” that run on supercomputing clusters, determining what operations to perform and submitting the jobs to the cluster for processing. Finally, signac-dashboard (which I develop) provides a web-based data visualization interface that allows users to quickly scan for interesting results and answer scientific questions. These tools include tutorials and documentation, to help users acquaint themselves and get on to doing science as quickly as possible. Importantly, the tools are not specific to materials science. Many scientific fields have similar questions, and the toolkit can easily be applied in fields where exploration or optimization within parameter spaces are common, ranging from fluid mechanics to machine learning.
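The heart of this data model is mapping each point in parameter space to its own keyed directory with the parameters saved alongside the data. A standard-library-only sketch of that pattern (illustrative only; signac's actual workspace layout and hashing scheme differ):

```python
import hashlib
import json
import tempfile
from pathlib import Path

def job_dir(workspace, statepoint):
    """Map a statepoint dict to a stable per-job directory with saved metadata."""
    key = json.dumps(statepoint, sort_keys=True)  # canonical form for hashing
    digest = hashlib.sha1(key.encode()).hexdigest()[:12]
    d = Path(workspace) / digest
    d.mkdir(parents=True, exist_ok=True)
    (d / "statepoint.json").write_text(key)
    return d

workspace = tempfile.mkdtemp()
# Sweep a toy 2-D parameter space: pressure x temperature.
for p in (1.0, 2.0):
    for T in (100, 200):
        d = job_dir(workspace, {"pressure": p, "temperature": T})
        (d / "result.txt").write_text(f"simulated at p={p}, T={T}\n")

print(len(list(Path(workspace).iterdir())))  # 4 job directories
```

Because the directory name is derived from a canonical serialization of the parameters, the same statepoint always resolves to the same job directory, which is what lets workflow tools find, resume, and aggregate results across a sweep.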

During the symposium, I learned a lot about how others are using scientific computing in their own work. The symposium speakers came from a wide range of fields, including biology, mathematics, and fluid dynamics. Some of my favorite talks are described below.

The Past: Phylogeny and Uncovering Life’s Origins

High-performance computing is enabling scientists to look in all sorts of directions, including into the past. Stephen Smith, Assistant Professor of Ecology and Evolutionary Biology at the University of Michigan, talked about his lab’s research in detecting evolutionary patterns using genomic data. From the wealth of genetic data that scientists have collected, the Smith lab aims to improve our understanding of the “tree of life”: the overarching phylogenetic tree that can explain the progress of speciation over time. Projects like Open Tree of Life and PHLAWD, an open-source C++ project to process data from the National Center for Biotechnology Information’s GenBank data source, are just two of the ways that open science and big data are informing our understanding of life itself.

The Present: From Algebra to Autonomy

Cleve Moler, the original author of the MATLAB language and chief mathematician, chairman, and cofounder of MathWorks, spoke about his career and how the tools MATLAB has provided for numerical linear algebra (and many other computational tasks) have been important for the development of science and engineering over the last 34 years. MATLAB is taught to STEM students in many undergraduate curricula, and is used widely across industry to simulate and model the behavior of real systems. Features like the Automated Driving System Toolbox are poised to play a role in autonomous vehicles and the difficult computational tasks inherent in their operation.

The Future: Parallel-in-Time Predictions and Meteorology

A significant challenge in weather and climate modeling is that supercomputer architectures are highly parallel, while many simulations of fluids are inherently serial: each timestep must be computed before the next timestep can begin. Beth Wingate, Professor of Mathematics at the University of Exeter and published poet, is developing a powerful approach that may change the way that such models work. Called “parallel-in-time,” it separates the effects of slow dynamics and fast dynamics, enabling parallel architectures to take advantage of longer timesteps and separate the work across many processors.

Conclusions

Computational science is growing rapidly, improving our ability to address the most pressing questions and the mysteries of our world. As new supercomputing resources come online, such as Oak Ridge National Laboratory’s Summit, the promise of exascale computing is coming ever closer to reality. I look forward to what the next year of HPC will bring to our world.

Data science institutes at University of Michigan and University College London sign academic cooperation agreement


From left, Al Hero, U-M; Patrick Wolfe, UCL; and Brian Athey, U-M signed an agreement for research and educational cooperation between the University of Michigan and University College London.

ANN ARBOR, MI and LONDON — The Michigan Institute for Data Science (MIDAS) at the University of Michigan and the Centre for Data Science and Big Data Institute at UCL (University College London) have signed a five-year agreement of scientific and academic cooperation.

The agreement sets the stage for collaborative research projects between faculty of both institutions; student exchange opportunities; and visiting scholar arrangements, among other potential partnerships.

“There is a lot of common ground in what we do,” said Patrick Wolfe, Executive Director of UCL’s Centre for Data Science and Big Data Institute. “Both MIDAS and UCL cover the full spectrum of data science domains, from smart cities to healthcare to transportation to financial services, and both promote cross-cutting collaboration between scientific disciplines.”

Alfred Hero, co-director of MIDAS and professor of Electrical Engineering and Computer Science at U-M, said that one of the original goals of the institute when it was founded in 2015 under U-M’s $100 million Data Science Initiative was to reach out to U.S. and international partners.

“It seemed very natural that this would be the next step,” Hero said, adding that it would complement MIDAS’s recent partnership with the Shenzhen Research Institute of Big Data in China. “UCL epitomizes the collaboration, multi-disciplinarity and multi-institutional involvement that we’re trying to establish in our international partnerships.”

Wolfe visited Ann Arbor in early January to sign a memorandum of understanding along with Hero and Brian Athey, professor of bioinformatics and the other MIDAS co-director.

The agreement lists several potential areas of cooperation, including:

  • joint research projects
  • exchange of academic publications and reports
  • sharing of teaching methods and course design
  • joint symposia, workshops and conferences
  • faculty development and exchange
  • student exchange
  • exchange of visiting research scholars.

Links:

MIDAS at U-M

UCL Big Data Institute

Follow UCL’s data science activities @uclbdi

Follow MIDAS at @ARC_UM

HPC training workshops begin Tuesday, Jan. 31


A series of training workshops in high-performance computing will be held Jan. 31 through Feb. 22, 2017, presented by CSCAR in conjunction with Advanced Research Computing – Technology Services (ARC-TS). All sessions are held in East Hall, Room B254, 530 Church St.

Introduction to the Linux command line
This course will familiarize the student with the basics of accessing and interacting with Linux computers using the GNU/Linux operating system’s Bash shell, also known as the “command line.”
Dates: (Please sign up for only one)
• Tuesday, Jan. 31, 12:30 – 3:30 p.m. (full description | registration)
• Thursday, Feb. 2, 9 a.m. – noon (full description | registration)
• Tuesday, Feb. 7, 9 a.m. – noon (full description | registration)

Introduction to the Flux cluster and batch computing
This workshop will provide a brief overview of the components of the Flux cluster, including the resource manager and scheduler, and will offer students hands-on experience.
Dates: (Please sign up for only one)
• Thursday, Feb. 9, 1 – 4:30 p.m. (full description | registration)
• Monday, Feb. 13, 1 – 4:30 p.m. (full description | registration)

Advanced batch computing on the Flux cluster
This course will cover advanced areas of cluster computing on the Flux cluster, including common parallel programming models, dependent and array scheduling, and a brief introduction to scientific computing with Python, among other topics.
Dates: (Please sign up for only one)
• Wednesday, Feb. 22, 9 a.m. – noon (full description | registration)
• Friday, Feb. 24, 9 a.m. – noon (full description | registration)

MIDAS announces second round of Data Science Challenge Initiative awards, in health and social science


Five research projects — three in health and two in social science — have been awarded funding in the second round of the Michigan Institute for Data Science Challenge Initiative program.

The projects will receive funding from MIDAS as part of the Data Science Initiative announced in fall 2015.

The goal of the multiyear MIDAS Challenge Initiatives program is to foster data science projects that have the potential to prompt new partnerships between U-M, federal research agencies and industry. The challenges are focused on four areas: transportation, learning analytics, social science and health science. For more information, visit midas.umich.edu/challenges.

The projects, determined by a competitive submission process, are:

  • Title: Michigan Center for Single-Cell Genomic Data Analysis
    Description: The center will establish methodologies to analyze sparse data collected from single-cell genome sequencing technologies. The center will bring together experts in mathematics, statistics and computer science with biomedical researchers.
    Lead researchers: Jun Li, Department of Human Genetics; Anna Gilbert, Mathematics
    Research team: Laura Balzano, Electrical Engineering and Computer Science; Justin Colacino, Environmental Health Sciences; Johann Gagnon-Bartsch, Statistics; Yuanfang Guan, Computational Medicine and Bioinformatics; Sue Hammoud, Human Genetics; Gil Omenn, Computational Medicine and Bioinformatics; Clay Scott, Electrical Engineering and Computer Science; Roman Vershynin, Mathematics; Max Wicha, Oncology.
  • Title: From Big Data to Vital Insights: Michigan Center for Health Analytics and Medical Prediction (M-CHAMP)
    Description: The center will house a multidisciplinary team that will confront a core methodological problem that currently limits health research — exploiting temporal patterns in longitudinal data for novel discovery and prediction.
    Lead researchers: Brahmajee Nallamothu, Internal Medicine; Ji Zhu, Statistics; Jenna Wiens, Electrical Engineering and Computer Science; Marcelline Harris, Nursing.
    Research team: T. Jack Iwashyna, Internal Medicine; Jeffrey McCullough, Health Management and Policy (SPH); Kayvan Najarian, Computational Medicine and Bioinformatics; Hallie Prescott, Internal Medicine; Andrew Ryan, Health Management and Policy (SPH); Michael Sjoding, Internal Medicine; Karandeep Singh, Learning Health Sciences (Medical School); Kerby Shedden, Statistics; Jeremy Sussman, Internal Medicine; Vinod Vydiswaran, Learning Health Sciences (Medical School); Akbar Waljee, Internal Medicine.
  • Title: Identifying Real-Time Data Predictors of Stress and Depression Using Mobile Technology
    Description: Using an app platform that integrates signals from both mobile phones and wearable sensors, the project will collect data from over 1,000 medical interns to identify the dynamic relationships between mood, sleep and circadian rhythms. These relationships will be utilized to inform the type and timing of personalized data feedback for a mobile micro-randomized intervention trial for depression under stress.
    Lead researchers: Srijan Sen, Psychiatry; Margit Burmeister, Molecular and Behavioral Neuroscience.
    Research team: Lawrence An, Internal Medicine; Amy Cochran, Mathematics; Elena Frank, Molecular and Behavioral Neuroscience; Daniel Forger, Mathematics; Thomas Insel (Verily Life Sciences); Susan Murphy, Statistics; Maureen Walton, Psychiatry; Zhou Zhao, Molecular and Behavioral Neuroscience.
  • Title: Computational Approaches for the Construction of Novel Macroeconomic Data
    Description: This project will develop an economic dataset construction system that takes as input economic expertise as well as social media data; will deploy a data construction service that hosts this construction tool; and will use this tool and service to build an “economic datapedia,” a compendium of user-curated economic datasets that are collectively published online.
    Lead researcher: Matthew Shapiro, Department of Economics
    Research team: Michael Cafarella, Computer Science and Engineering; Jia Deng, Electrical Engineering and Computer Science; Margaret Levenstein, Inter-university Consortium for Political and Social Research.
  • Title: A Social Science Collaboration for Research on Communication and Learning based upon Big Data
    Description: This project is a multidisciplinary collaboration meant to introduce social scientists, computer scientists and statisticians to the methods and theories of engaging observational data and the results of structured data collections in two pilot projects in the area of political communication and one investigating parenting issues. The projects involve the integration of geospatial, social media and longitudinal data.
    Lead researchers: Michael Traugott, Center for Political Studies, ISR; Trivellore Raghunathan, Biostatistics
    Research team: Leticia Bode, Communications, Georgetown University; Ceren Budak, U-M School of Information; Pamela Davis-Keane, U-M Psychology, ISR; Jonathan Ladd, Public Policy, Georgetown; Zeina Mneimneh, U-M Survey Research Center; Josh Pasek, U-M Communications; Rebecca Ryan, Public Policy, Georgetown; Lisa Singh, Public Policy, Georgetown; Stuart Soroka, U-M Communications.

For more details, see the press releases on the social science and health science projects.

MIDAS to host faculty meeting on NSF BIGDATA solicitation


The Michigan Institute for Data Science (MIDAS) will hold a faculty meeting at noon on Thursday, January 19 (Suite 7625, School of Public Health I, 1415 Washington Heights) for the NSF 17-534 “Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA)” solicitation.

The meeting will include an overview of the NSF solicitation, U-M Data Science Resources (MIDAS, CSCAR, ARC-TS) available to faculty responding to the NSF call, and an opportunity to network with other faculty.

MIDAS has also arranged for Sylvia Spengler, NSF CISE Program Director, to be available at 1:30 pm to answer questions regarding the BIGDATA solicitation.

We invite you to participate in the faculty meeting to share your ideas and interest in responding to this BIGDATA solicitation as well as interact with other faculty looking to respond to this funding mechanism.

For those unable to participate in person, you can join virtually using GoToMeeting:

A box lunch will be provided at the faculty meeting. Your RSVP (https://goo.gl/forms/OYAuB8mWCOlx3fw73) is appreciated.