ARC, LSA support groundbreaking global energy tracking

By | General Interest, Great Lakes, HPC, News, Research, Uncategorized

How can technology services like high-performance computing and storage help a political scientist contribute to more equal access to electricity around the world? 

Brian Min, associate professor of political science and research associate professor with the Center for Political Studies, and lead researcher Zachary O’Keeffe have been using nightly satellite imagery to generate new indicators of electricity access and reliability across the world as part of the High-Resolution Electricity Access (HREA) project. 

The collection of satellite imagery is unique in its temporal and spatial coverage. For more than three decades, images have captured nighttime light output over every corner of the globe, every single night. By studying small variations in light output over time, the team aims to identify patterns and anomalies that reveal whether an area is electrified, when it was electrified, and when the power is out. This work yields the highest resolution estimates of energy access and reliability anywhere in the world.

This image of Kenya from 2017 shows a model-based classification of electrification status based upon all night statistically recalibrated 2017 VIIRS light output. (Image courtesy Dr. Min. Sources: NOAA, VIIRS DNB, Facebook/CIESIN HRSL).

LSA Technology Services and ARC both worked closely with Min’s team to relieve pain points and design highly optimized, automated workflows. Mark Champe, application programmer/analyst senior, LSA Technology Services, explained that “a big part of the story here is finding useful information in datasets that were created and collected for other purposes. Dr. Min is able to ask these questions because the images were previously captured, and then it becomes the very large task of finding a tiny signal in a huge dataset.”

There are more than 250 terabytes of satellite imagery and data, across more than 3 million files. And with each passing night, the collection continues to grow. Previously, the images were not easily accessible because they were archived in deep storage in multiple locations. ARC provides processing and storage in a single place, an important feature for cohesive and timely research.

The research team created computational models that run on the Great Lakes High-Performance Computing Cluster and that can be easily replicated and validated. They archive the files on the Locker Large-File Storage service.

One challenge Min and O’Keeffe chronically face is data management. Images can be hundreds of megabytes each, so just moving files from the storage service to the high-performance computing cluster can be challenging, let alone finding the right storage service. Using Turbo Research Storage and Globus File Transfer, Min and O’Keeffe found secure, fast, and reliable solutions to easily manage their large, high-resolution files.

Brock Palen, director of ARC, said that file transfers from Great Lakes to Turbo reached top speeds of 1,400 megabytes per second.
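To put that rate in context, here is a rough back-of-the-envelope sketch in Python of how long a transfer would take at a sustained 1,400 megabytes per second. The function name, the decimal units, and the constant-rate assumption are illustrative only. Real transfers rarely hold peak speed, and the 250-terabyte figure is simply the collection size quoted above, not a transfer the team is reported to have made at this rate.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at a constant rate in MB/s.

    Assumes decimal units (1 TB = 1,000,000 MB) and no protocol overhead.
    """
    size_mb = size_tb * 1_000_000
    seconds = size_mb / rate_mb_per_s
    return seconds / 3600


# Hypothetical example: the whole 250 TB collection at the quoted peak rate.
print(round(transfer_hours(250, 1400), 1))  # 49.6
```

Under these assumptions the full collection would take roughly two days to move, which is why sustained throughput matters for datasets of this size.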

Min and team used Globus extensively in acquiring historical data from the National Oceanic and Atmospheric Administration (NOAA). Champe worked with the research team to set up a Globus connection to ARC storage services. The team at NOAA was then able to push the data to U-M quickly and efficiently. Because NOAA did not have to upload the data for Min’s team to download later, Globus streamlined and sped up the data transfer process.

Champe noted, “Over 100TB of data was being unarchived from tape and transferred between institutions. Globus made that possible and much less painful to manage.”

“The support we’ve gotten from ARC and LSA Technology has been incredible. They have made our lives easier by removing bottlenecks and helping us see new ways to draw insights from this unique data,” said Min. 

Palen added, “We are proud to partner with LSA Technology Services and ITS Infrastructure networking services to provide support to Dr. Min’s and O’Keeffe’s work. Their work has the potential to have a big impact in communities around the world.” 

“We should celebrate work such as this because it is a great example of impactful research done at U-M that many people helped to support,” Champe continued.

Min expressed his gratitude to the project’s partners. “We have been grateful to work with the World Bank and NOAA to generate new insights on energy access that will hopefully improve lives around the world.”

These images are now available via open access (free and available to all).

This is made possible by a partnership between the University of Michigan, the World Bank, Amazon Web Services, and NOAA.

DNA sequencing productivity increases with ARC-TS services

By | HPC, News, Research, Systems and Services

The Advanced Genomics Core’s Illumina NovaSeq 6000 sequencing platform. It’s about the size of a large laser printer.

On the cutting edge of research at U-M is the Advanced Genomics Core’s Illumina NovaSeq 6000 sequencing platform. The AGC is one of the first academic core facilities to optimize this exciting and powerful instrument, which is about the size of a large laser printer.

The Advanced Genomics Core (AGC), part of the Biomedical Research Core Facilities within the Medical School Office of Research, provides high-quality, low-cost next-generation sequencing analysis for research clients on a recharge basis.

One NovaSeq run can generate as much as 4TB of raw data. So how is the AGC able to generate, process, analyze, and transfer so much data for researchers? They have partnered with Advanced Research Computing – Technology Services (ARC-TS) to leverage the speed and power of the Great Lakes High-Performance Computing Cluster.

With Great Lakes, AGC can process the data, and then store the output on other ARC-TS services: Turbo Research Storage and Data Den Research Archive, and share with clients using Globus File Transfer. All three services work together. Turbo offers the capacity and speed to match the computational performance of Great Lakes, Data Den provides an archive of raw data in case of catastrophic failure, and Globus has the performance needed for the transfer of big data. 

“Thanks to Great Lakes, we were able to process dozens of large projects simultaneously, instead of being limited to just a couple at a time with our in-house system,” said Olivia Koues, Ph.D., AGC managing director. 

“In calendar year 2020, the AGC delivered nearly a half petabyte of data to our research community. We rely on the speed of Turbo for storage, the robustness of Data Den for archiving, and the ease of Globus for big data file transfers. Working with ARC-TS has enabled incredible research such as making patients resilient to COVID-19. We are proudly working together to help patients.”

“Our services process more than 180,000GB of raw data per year for the AGC. That’s the same as streaming the three original Star Wars movies and the three prequels more than 6,000 times,” said Brock Palen, ARC-TS director. “We enjoy working with AGC to assist them into the next step of their big data journey.”
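The comparison in the quote above can be unpacked with a little arithmetic. The short Python sketch below is only an illustration: the helper name is hypothetical, and the per-film size it derives is implied by the quoted figures rather than stated anywhere in the article.

```python
def gb_per_unit(total_gb: float, units: float) -> float:
    """Split a total data volume evenly across a number of units."""
    return total_gb / units


raw_gb_per_year = 180_000   # quoted annual raw data volume
marathons = 6_000           # quoted number of six-film viewings
films_per_marathon = 6      # three originals plus three prequels

gb_per_marathon = gb_per_unit(raw_gb_per_year, marathons)
gb_per_film = gb_per_unit(gb_per_marathon, films_per_marathon)

print(gb_per_marathon)  # 30.0
print(gb_per_film)      # 5.0
```

In other words, the comparison implies about 30GB per six-film viewing, or roughly 5GB per film.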

ARC-TS is a division of Information and Technology Services (ITS). The Advanced Genomics Core (AGC) is part of the Biomedical Research Core Facilities (BRCF) within the Medical School Office of Research.

Turbo High Performance Research Storage grows by 2PB and increases speed

By | General Interest, Happenings, HPC, News

Turbo Research Storage, the high-performance research storage option available to researchers anywhere on campus, was recently expanded with 2PB of new encrypted capacity. This new capacity allows Turbo to keep up with the growth of research data while also increasing performance with expanded caches and more network connectivity.

The work also increased Turbo’s performance to campus and ARC-TS resources by 50 percent, to 60Gbps. A plan was also approved that allows Turbo to grow to 160Gbps, with room to reach 320Gbps, between Turbo and the newly announced HPC system Great Lakes.
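As a quick sanity check on these figures, the Python sketch below works backward from the stated 50 percent increase to the implied earlier rate and converts gigabits per second to gigabytes per second. Both helper names are hypothetical, and the earlier rate is inferred from the numbers above rather than reported directly.

```python
def previous_rate(new_rate_gbps: float, percent_increase: float) -> float:
    """Rate before an increase of percent_increase percent."""
    return new_rate_gbps / (1 + percent_increase / 100)


def gbps_to_gigabytes_per_s(rate_gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return rate_gbps / 8


print(previous_rate(60, 50))          # 40.0
print(gbps_to_gigabytes_per_s(60))    # 7.5
print(gbps_to_gigabytes_per_s(160))   # 20.0
```

So a 50 percent increase to 60Gbps implies an earlier rate of about 40Gbps, and 60Gbps corresponds to a theoretical ceiling of 7.5 gigabytes per second before any protocol overhead.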

HPC Maintenance

To accommodate upgrades to software and operating systems, Flux, Armis, and their storage systems (/home and /scratch) will be unavailable starting at 9am Saturday, January 7th, returning to service on Monday, January 9th. Additionally, external Turbo mounts will be unavailable from 11pm Saturday, January 7th, until 7am Sunday, January 8th.

During this time, the following updates are planned:

  • Minor operating system and software updates on Flux and Armis. These should not require any changes to user software or processes.
  • Resource manager and job scheduling software updates.
  • Operating system updates on Turbo.

For HPC jobs, you can use the command “maxwalltime” to discover the amount of time before the beginning of the maintenance. Jobs that cannot complete prior to the beginning of the maintenance will be able to start when the clusters are returned to service.
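The idea behind that check can be illustrated with a short Python sketch. This is not the implementation of the “maxwalltime” command, whose internals are not described here; the function names are hypothetical, and the year in the example is an assumption, since the notice gives only the day and month.

```python
from datetime import datetime, timedelta


def max_walltime(now: datetime, maintenance_start: datetime) -> timedelta:
    """Longest walltime a job starting at `now` could use before maintenance."""
    remaining = maintenance_start - now
    # Never report a negative duration once maintenance has begun.
    return max(remaining, timedelta(0))


def fits_before_maintenance(now: datetime, maintenance_start: datetime,
                            requested: timedelta) -> bool:
    """True if a job requesting `requested` walltime can finish in time."""
    return requested <= max_walltime(now, maintenance_start)


# Assumed year for illustration only; the notice states 9am Saturday, January 7th.
start = datetime(2017, 1, 7, 9, 0)
now = datetime(2017, 1, 5, 9, 0)

print(max_walltime(now, start))                                  # 2 days, 0:00:00
print(fits_before_maintenance(now, start, timedelta(hours=72)))  # False
```

A job that requests more walltime than remains before the window would simply wait in the queue until the clusters return to service.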

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) and send an email to all HPC users when the outage has been completed.