Advanced ML topics: Algorithms, writing ML code, comparing implementations


OVERVIEW

This workshop is designed as a follow-up to the basic introduction to machine learning earlier in this series. We will cover several examples in Python and compare different implementations. We will also look at advanced topics in machine learning, such as GPU optimization, parallel processing, and deep learning. A basic understanding of Python is required.

INSTRUCTORS

Meghan Richey
Machine Learning Specialist
Information and Technology Services – Advanced Research Computing – Technology Services

Meghan Richey is a machine learning specialist in the Advanced Research Computing – Technology Services department at the University of Michigan. She consults on several faculty and student machine learning applications and research studies, specializing in natural language processing and convolutional neural networks. Before her position at the university, Ms. Richey worked for a defense contractor as a software engineer to design and implement software solutions for DoD-funded artificial intelligence efforts.

A Zoom link will be provided to the participants the day before the class. Registration is required.

The instructor will be available on the same Zoom link from 9 to 10 AM for computer setup assistance.

Please note, this session will be recorded.  

Register here

If you have questions about this workshop, please send an email to the instructor at richeym@umich.edu

ARC, LSA support groundbreaking global energy tracking


How can technology services like high-performance computing and storage help a political scientist contribute to more equal access to electricity around the world? 

Brian Min, associate professor of political science and research associate professor with the Center for Political Studies, and lead researcher Zachary O’Keeffe have been using nightly satellite imagery to generate new indicators of electricity access and reliability across the world as part of the High-Resolution Electricity Access (HREA) project. 

The collection of satellite imagery is unique in its temporal and spatial coverage. For more than three decades, images have captured nighttime light output over every corner of the globe, every single night. By studying small variations in light output over time, the team aims to identify patterns and anomalies that show whether an area is electrified, when it was electrified, and when the power is out. This work yields the highest resolution estimates of energy access and reliability anywhere in the world.


This image of Kenya from 2017 shows a model-based classification of electrification status based upon all night statistically recalibrated 2017 VIIRS light output. (Image courtesy Dr. Min. Sources: NOAA, VIIRS DNB, Facebook/CIESIN HRSL).

LSA Technology Services and ARC both worked closely with Min’s team to relieve pain points and design highly optimized, automated workflows. Mark Champe, application programmer/analyst senior, LSA Technology Services, explained that “a big part of the story here is finding useful information in datasets that were created and collected for other purposes. Dr. Min is able to ask these questions because the images were previously captured, and then it becomes the very large task of finding a tiny signal in a huge dataset.”

There are more than 250 terabytes of satellite imagery and data, across more than 3 million files. And with each passing night, the collection continues to grow. Previously, the images were not easily accessible because they were archived in deep storage in multiple locations. ARC provides processing and storage at a single place, an important feature for cohesive and timely research. 

The research team created computational models that run on the Great Lakes High-Performance Computing Cluster and that can be easily replicated and validated. They archive the files on the Locker Large-File Storage service.

One challenge Min and O’Keeffe chronically face is data management. Images can be hundreds of megabytes each, so just moving files from the storage service to the high-performance computing cluster can be challenging, let alone finding the right storage service. Using Turbo Research Storage and Globus File Transfer, Min and O’Keeffe found secure, fast, and reliable solutions to easily manage their large, high-resolution files.

Brock Palen, director of ARC, said that transfers from Great Lakes to Turbo reached top speeds of 1,400 megabytes per second.

Min and his team used Globus extensively in acquiring historical data from the National Oceanic and Atmospheric Administration (NOAA). Champe worked with the research team to set up a Globus connection to ARC storage services. The team at NOAA was then able to push the data to U-M quickly and efficiently. Because NOAA did not have to upload the data somewhere for Min’s team to download later, Globus streamlined and sped up the data transfer process.

Champe noted, “Over 100TB of data was being unarchived from tape and transferred between institutions. Globus made that possible and much less painful to manage.”
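To give a sense of scale for these figures, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a 100 TB transfer could be sustained at the 1,400 megabytes per second peak quoted above for Great Lakes to Turbo; the article does not state the rate of the NOAA transfer, and the function name is hypothetical.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_per_s megabytes per second."""
    # Decimal units are assumed: 1 TB = 1,000,000 MB.
    megabytes = size_tb * 1_000_000
    seconds = megabytes / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative only: sustained peak rate is an assumption, not a reported result.
    print(f"{transfer_hours(100, 1400):.1f} hours")  # prints "19.8 hours"
```

Real transfers rarely hold a peak rate for their whole duration, so this is best read as a lower bound.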

“The support we’ve gotten from ARC and LSA Technology has been incredible. They have made our lives easier by removing bottlenecks and helping us see new ways to draw insights from this unique data,” said Min. 

Palen added, “We are proud to partner with LSA Technology Services and ITS Infrastructure networking services to provide support to Dr. Min’s and O’Keeffe’s work. Their work has the potential to have a big impact in communities around the world.” 

“We should celebrate work such as this because it is a great example of impactful research done at U-M that many people helped to support,” Champe continued.

Min expressed his gratitude to the project’s partners. “We have been grateful to work with the World Bank and NOAA to generate new insights on energy access that will hopefully improve lives around the world.”

These images are now available via open access (free and available to all).

This is made possible by a partnership between the University of Michigan, the World Bank, Amazon Web Services, and NOAA.

Improving HPC IO with IME


OVERVIEW

Supercomputers allow researchers to bring much more computational performance to their projects. One challenge of this additional performance is the increased rate at which data may need to be read or written while workloads are running.
This session will teach users how to use the IME burst buffer on Great Lakes with their existing applications to reach IO performance potentially 10x greater than scratch or Turbo, at over 80 GB/s.
Requirements: basic command-line skills and a Great Lakes user account.

INSTRUCTOR

Brock Palen
Director for Advanced Research Computing – Technology Services (ARC-TS)
Information and Technology Services – Advanced Research Computing – Technology Services

Brock is the director for Advanced Research Computing – Technology Services at the University of Michigan, where he is responsible for implementing the overall strategy of research computing infrastructure, including high-performance computing (HPC), high-throughput computing, research storage, big data (Hadoop, Spark), private/public cloud services, and consulting for researchers.

MATERIALS

A Zoom link will be provided to the participants the day before the class. Registration is required.

Please note, this session will be recorded.  

If you have questions about this workshop, please send an email to the instructor at brockp@umich.edu

 

Session Details

Location: Remote (your desktop), off campus
Session level: All
Sponsor(s): Advanced Research Computing – Technology Services
Presenter(s): Brock Palen

Choosing Machine Learning Tools and Training


OVERVIEW

With many languages, packages, and applications available to researchers, many wonder how to choose the correct path forward when starting a machine learning project. In this workshop, we will compare several machine learning packages and applications and discuss which are best suited to each available language. A review of basic machine learning concepts, as well as a basic grasp of Python, is recommended.

INSTRUCTORS

Meghan Richey
Machine Learning Specialist
Information and Technology Services – Advanced Research Computing – Technology Services

Meghan Richey is a machine learning specialist in the Advanced Research Computing – Technology Services department at the University of Michigan. She consults on several faculty and student machine learning applications and research studies, specializing in natural language processing and convolutional neural networks. Before her position at the university, Ms. Richey worked for a defense contractor as a software engineer to design and implement software solutions for DoD-funded artificial intelligence efforts.

A Zoom link will be provided to the participants the day before the class. Registration is required.

The instructor will be available on the same Zoom link from 9 to 10 AM for computer setup assistance.

Please note, this session will be recorded.  

Register here

If you have questions about this workshop, please send an email to the instructor at richeym@umich.edu

Data Sharing and Archiving


OVERVIEW

As data volumes grow, how we manage data becomes more important. This session will cover the basics of managing data in a research environment such as those at ARC and at the national level. Attendees will be introduced to recommended tools for data sharing and transfer on campus, off campus, and in the cloud. They will learn how to prepare data for archiving, including special high-performance versions of tar and compression tools that offer significant performance benefits over the standard versions.
Lastly, we will cover how to select appropriate general-purpose storage, and the properties to look for, when data requires long-term preservation and active archiving, so that the largest data volumes can be supported while controlling cost and easing management.
Requirements: basic command-line skills.
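As a small illustration of what preparing data for archiving involves, the following Python sketch packs a directory into a single gzip-compressed tar file using only the standard library. It is not the high-performance tooling the session covers; the function name, directory name, and file contents are hypothetical and chosen only for the example.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_directory(src_dir: Path, archive_path: Path) -> list:
    """Pack src_dir into a gzip-compressed tar file and return its member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the full path
        tar.add(src_dir, arcname=src_dir.name)
    # Reopen the archive to confirm what was written
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "project_data"  # hypothetical directory
        data.mkdir()
        (data / "results.csv").write_text("x,y\n1,2\n")
        print(archive_directory(data, Path(tmp) / "project_data.tar.gz"))
```

Bundling many small files into one archive before moving them to long-term storage generally reduces per-file overhead, which is one reason tar is a common first step.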

INSTRUCTOR

Brock Palen
Director for Advanced Research Computing – Technology Services (ARC-TS)
Information and Technology Services – Advanced Research Computing – Technology Services

Brock is the director for Advanced Research Computing – Technology Services at the University of Michigan, where he is responsible for implementing the overall strategy of research computing infrastructure, including high-performance computing (HPC), high-throughput computing, research storage, big data (Hadoop, Spark), private/public cloud services, and consulting for researchers.

MATERIALS

A Zoom link will be provided to the participants the day before the class. Registration is required.

Please note, this session will be recorded.  

If you have questions about this workshop, please send an email to the instructor at brockp@umich.edu

Please register at https://ttc.iss.lsa.umich.edu/ttc/sessions/data-sharing-and-archiving-2/register/

Session Details

Location: Remote (your desktop), off campus
Session level: All
Sponsor(s): Advanced Research Computing – Technology Services
Presenter(s): Brock Palen

Introduction to Machine Learning


OVERVIEW

Machine learning is becoming an increasingly popular tool in several fields, including data science, medicine, engineering, and business. This workshop will cover basic concepts related to machine learning, including definitions of basic terms, sample applications, and methods for deciding whether your project is a good fit for machine learning. No prior knowledge or coding experience is required.

INSTRUCTORS

Meghan Richey
Machine Learning Specialist
Information and Technology Services – Advanced Research Computing – Technology Services

Meghan Richey is a machine learning specialist in the Advanced Research Computing – Technology Services department at the University of Michigan. She consults on several faculty and student machine learning applications and research studies, specializing in natural language processing and convolutional neural networks. Before her position at the university, Ms. Richey worked for a defense contractor as a software engineer to design and implement software solutions for DoD-funded artificial intelligence efforts.

MATERIALS

A Zoom link will be provided to the participants the day before the class. Registration is required.

The instructor will be available on the same Zoom link from 9 to 10 AM for computer setup assistance.

Please note, this session will be recorded.  

Register here

If you have questions about this workshop, please send an email to the instructor at richeym@umich.edu

Data Sharing and Archiving


OVERVIEW

As data volumes grow, how we manage data becomes more important. This session will cover the basics of managing data in a research environment such as those at ARC and at the national level. Attendees will be introduced to recommended tools for data sharing and transfer on campus, off campus, and in the cloud. They will learn how to prepare data for archiving, including special high-performance versions of tar and compression tools that offer significant performance benefits over the standard versions.
Lastly, we will cover how to select appropriate general-purpose storage, and the properties to look for, when data requires long-term preservation and active archiving, so that the largest data volumes can be supported while controlling cost and easing management.
Requirements: basic command-line skills.

INSTRUCTOR

Brock Palen
Director for Advanced Research Computing – Technology Services (ARC-TS)
Information and Technology Services – Advanced Research Computing – Technology Services

Brock is the director for Advanced Research Computing – Technology Services at the University of Michigan, where he is responsible for implementing the overall strategy of research computing infrastructure, including high-performance computing (HPC), high-throughput computing, research storage, big data (Hadoop, Spark), private/public cloud services, and consulting for researchers.

MATERIALS

A Zoom link will be provided to the participants the day before the class. Registration is required.

Please note, this session will be recorded.  

If you have questions about this workshop, please send an email to the instructor at brockp@umich.edu

Please register at https://ttc.iss.lsa.umich.edu/ttc/sessions/data-sharing-and-archiving/register/

Session Details

Location: Remote (your desktop), off campus
Session level: All
Sponsor(s): Advanced Research Computing – Technology Services
Presenter(s): Brock Palen

Advanced research computing on the Great Lakes Cluster


OVERVIEW

This workshop will cover some more advanced topics in computing on the U-M Great Lakes Cluster. Topics to be covered include a review of common parallel programming models and basic use of Great Lakes; dependent and array scheduling; workflow scripting using bash; high-throughput computing using launcher; parallel processing in one or more of Python, R, and MATLAB; and profiling of parallel code using Allinea Performance Reports and Allinea MAP.
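As a taste of array scheduling combined with Python, the sketch below shows one common way a script can decide which inputs a given array task should handle. It assumes the job is submitted as a Slurm array with indices starting at 0, and it reads the SLURM_ARRAY_TASK_ID and SLURM_ARRAY_TASK_COUNT environment variables that Slurm sets for array jobs. The function name and input file names are hypothetical, and this is not taken from the workshop materials.

```python
import os


def chunk_for_task(items, task_id, num_tasks):
    """Return the items that one array task should process."""
    # Strided split: task k handles items k, k + num_tasks, k + 2 * num_tasks, ...
    return items[task_id::num_tasks]


if __name__ == "__main__":
    # Slurm sets these inside array jobs; the defaults allow a plain local run.
    # Assumption: array indices run from 0 to num_tasks - 1.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    inputs = [f"input_{i:03d}.dat" for i in range(10)]  # hypothetical file names
    for name in chunk_for_task(inputs, task_id, num_tasks):
        print(f"task {task_id} would process {name}")
```

Because each task computes its own share from its index, the tasks need no communication with one another, which is what makes this pattern a good fit for high-throughput workloads.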

PRE-REQUISITES

This course assumes familiarity with the Linux command line, such as that provided by the CSCAR/ARC-TS workshop Introduction to the Linux Command Line. In particular, participants should understand how files and folders work, be able to create text files using the nano editor, be able to create and remove files and folders, and understand what input and output redirection are and how to use them.

INSTRUCTORS

Dr. Charles J Antonelli
Research Computing Services
LSA Technology Services

Charles is a member of the LSA Technology Services Research team at the University of Michigan, where he is responsible for high-performance computing support and education, and was an Advocate to the Departments of History and Communications. Prior to this, he built a parallel data ingestion component of a novel earth science data assimilation system and a secure packet vault, and worked on the No. 5 ESS Switch at Bell Labs in the 1980s. He has taught courses in operating systems, distributed file systems, C++ programming, security, and database application design.

John Thiels
Research Computing Services
LSA Technology Services

MATERIALS

COURSE PREPARATION

In order to participate successfully in the workshop exercises, you must have a user login, a Slurm account, and be enrolled in Duo. The user login allows you to log in to the cluster, create, compile, and test applications, and prepare jobs for submission. The Slurm account allows you to submit those jobs, executing the applications in parallel on the cluster and charging their resource use to the account. Duo is required to help authenticate you to the cluster.

USER LOGIN

If you already have a Great Lakes user login, you don’t need to do anything. Otherwise, go to the Great Lakes user login application page at http://arc-ts.umich.edu/login-request/.

Please note that obtaining a user account requires human processing, so be sure to do this at least two business days before class begins.

SLURM ACCOUNT

We create a Slurm account for the workshop so you can run jobs on the cluster during the workshop and for one day after for those who would like additional practice. The workshop job account is quite limited and is intended only to run examples to help you cement the details of job submission and management. If you already have an existing Slurm account, you can use that, though if there are any issues with that account, we will ask you to use the workshop account.

DUO AUTHENTICATION

Duo two-factor authentication is required to log in to the cluster. When logging in, you will need to type your UMICH (AKA Level 1) password as well as authenticate through Duo in order to access Great Lakes.

If you need to enroll in Duo, follow the instructions at Enroll a Smartphone or Tablet in Duo.

Please enroll in Duo before you come to class.

LAPTOP PREPARATION

You will need VPN software to access the U-M network.  If you do not have VPN software already installed, please download and install the Cisco AnyConnect VPN software following these instructions.  You will need VPN to be able to use the ssh client to connect to Great Lakes. Please use the ‘Campus All traffic’ profile in the Cisco client.

You will need an ssh client to connect to the Great Lakes cluster. Mac OS X and Linux platforms have this built-in. Here are a couple of choices for Windows platforms:

  • Download and install U-M PuTTY/WinSCP from the Compute at the U website. This includes both the PuTTY ssh client and terminal emulator and a graphical file transfer tool in one installer.  This document describes how to download and use this software, except please note you will be connecting to greatlakes.arc-ts.umich.edu instead of the cited host.  You must have administrative authority over your computer to install this software.
  • Download PuTTY directly from the developer. Download the putty.exe application listed under “Alternative binary files,” then execute the application.  You do not need administrative authority over your computer to use this software.

Our Great Lakes User Guide in Section 1.2 describes in more detail how to use PuTTY to connect to Great Lakes.

Please prepare and test your computer’s ability to make remote connections before class; we cannot stop to debug connection issues during the class.

A Zoom link will be provided to the participants the day before the class. Registration is required. Please note this session will be recorded.

 

Please register at https://ttc.iss.lsa.umich.edu/ttc/sessions/advanced-research-computing-on-the-great-lakes-cluster-8/register/