Secure Enclave Service (SES) outage due to maintenance on March 23

By | News, Systems and Services, Uncategorized

Beginning at 6:30 a.m. on Saturday, March 23, and continuing until 5 p.m. that day, the Secure Enclave Services cluster will be shut down for emergency maintenance on the underlying power distribution unit in the data center. To perform the upgrade, the entire cluster, including all virtual machines hosted on it, will be offline on March 23.

Status updates will be posted on the ITS Service Status page.

How can we help you?

For assistance or questions, please contact ARC at arc-support@umich.edu.

For other topics, contact the ITS Service Center.

Using natural language processing to improve everyday life

By | Data, Great Lakes, HPC, News, Research, Uncategorized

Joyce Y. Chai, professor of electrical engineering and computer science, College of Engineering, and her colleagues have been using natural language processing and machine learning to seek answers to complex questions, work that may improve everyday life.

Some of the algorithms they develop are meant for tasks that machines may have little to no prior knowledge of, for example, guiding a human user toward a particular skill (building a specialized apparatus, or even “Tell me how to bake a cake”). Generative AI, or GenAI, generates a set of instructions based on observation of what the user is doing, for example to correct a mistake or provide the next step. The better the data and engineering behind the AI, the more useful the instructions will be.

“To enable machines to quickly learn and adapt to a new task, developers may give a few examples of recipe steps with both language instructions and video demonstrations. Machines can then (hopefully) guide users through the task by recognizing the right steps and generating relevant instructions using GenAI,” said Chai.

What are AI, machine learning, deep learning, and natural language processing?

It might help to take a step back to understand AI, machine learning (ML), and deep learning at a high level. Both ML and deep learning are subsets of AI. Some natural language processing (NLP) tasks fall within the realm of deep learning. These fields work together and build on each other.

Artificial Intelligence, or AI, is a branch of computer science that attempts to simulate human intelligence with computers. It involves creating systems to perform tasks that usually need human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.

“NLP is a sub-area in AI, where DL/ML approaches are predominantly applied,” stated Chai.

Christopher Brown, a research data scientist with ITS Advanced Research Computing (ARC) and a member of the ARC consulting team, explains that ML is a subfield of AI. Within ML, algorithms are used to generalize beyond the situations seen in training data and then complete tasks without further guidance from people. A good example is U-M GPT. The large language models (LLMs) accessible via U-M GPT are trained with millions of diverse examples. “The goal is to train the models to reliably predict, translate, or generate something,” he said.

“Data is any information that can be formatted and fed into an algorithm that can be used for some task, including journal articles, chats, numbers, videos, audio, and texts,” said Brown. Algorithms can be trained to perform tasks using these real-world data.

Natural language processing is a branch of artificial intelligence that helps computers understand and generate human language in a way that is both meaningful and useful to humans. NLP teaches computers to understand language and respond in ways humans can understand, even accounting for rich contextual language.

“NLP is highly interdisciplinary, and involves multiple fields, such as computer science, linguistics, philosophy, cognitive science, statistics, mathematics, etc.,” said Chai.

Examples of NLP are everywhere: when you ask Siri for directions, or when Google efficiently completes your half-typed query, or even when you get suggested replies in your email. 

Ultimately NLP, along with AI, can be used to make interactions between humans and machines as natural and as easy as possible. 

A lot of data is needed to train the models

Dr. Chai and her team use large language models, large amounts of data, and substantial computing resources. These models take longer to train and are harder to interpret than smaller ones. Brown says, “The state of the art, groundbreaking work tends to be in this area.”

Dr. Chai uses deep learning algorithms that make predictions about what the next part of the task or conversation is. “For example, they use deep learning and the transformer architecture to enable embodied agents to learn how new words are connected to the physical environment, to follow human language instructions, and to collaborate with humans to come up with a shared plan,” Brown explains.

The technology that supports this work

To accomplish her work, Dr. Chai uses the Great Lakes High-Performance Computing Cluster and Turbo Research Storage, both of which are managed by ITS Advanced Research Computing (ARC). She has 16 GPUs on Great Lakes at the ready, with the option to use more at any given time.

A GPU, or Graphics Processing Unit, is a piece of computer equipment that is good at displaying pictures, animations, and videos on your screen. The GPU is especially adept at quickly creating and manipulating images. Traditionally, GPUs were used for video games and professional design software where detailed graphics were necessary. But more recently, researchers including Dr. Chai discovered that GPUs are also good at handling many simple tasks at the same time. This includes tasks like scientific simulations and AI training where a lot of calculations need to be done in parallel (which is perfect for training large language models).

“GPUs are popular for deep learning, and we will continue to get more and better GPUs in the future. There is a demand, and we will continue supporting this technology so that deep learning can continue to grow,” said Brock Palen, ITS Advanced Research Computing director. 

Chai and her team also leveraged 29 terabytes of the Turbo Research Storage service at ARC. NLP benefits from the high-capacity, reliable, secure, and fast storage solution. Turbo enables investigators across the university to store and access data needed for their research via Great Lakes. 

Great Lakes HPC in the classroom 

ARC offers classroom use of high-performance computing resources on the Great Lakes High-Performance Computing Cluster.

Dr. Chai regularly leverages this resource. “Over 300 students have benefited from this experience. We have homework that requires the use of the Great Lakes, e.g., having students learn how to conduct experiments in a managed job-scheduling system like SLURM. This will benefit them in the future if they engage in any compute-intensive R&D (research and development).

“For my NLP class, I request Great Lakes access for my students so they have the ability to develop some meaningful final projects. We also use the Great Lakes HPC resources to study the reproducibility for NLP beginners,” said Chai. A gallery is available for many of the student projects.
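For students new to Slurm, a classroom job on Great Lakes usually boils down to a short batch script. The sketch below is illustrative only; the course account, partition, module, and script names are placeholder assumptions, and actual values come from the course setup.

#!/bin/bash
# Minimal Great Lakes batch script for a class assignment (illustrative only)
#SBATCH --job-name=nlp-hw1
#SBATCH --account=eecs595_class      # hypothetical course account
#SBATCH --partition=standard         # CPU partition; a GPU partition would be used if the assignment needs one
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g
#SBATCH --time=01:00:00

module load python                   # exact module name depends on the cluster's module list
python run_experiment.py             # hypothetical experiment script

Students would submit the script with sbatch and check its progress with squeue -u $USER, which is exactly the kind of job-scheduling workflow the coursework is designed to teach.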

The UMRCP defrays costs

The U-M Research Computing Package (UMRCP) is a set of cost-sharing allocations offered by ITS ARC, and it is available upon request. Other units offer additional cost-sharing to researchers. Chai said, “We typically use the nodes owned by my group for research projects that require intensive, large-scale GPU model training. We use the UMRCP for less intensive tasks, thereby extending the budgetary impact of the allocations.”

New GPU Offering

By | Uncategorized

We are pleased to introduce NVIDIA’s Multi-Instance GPU (MIG) technology on Great Lakes, offering a total of 16 GPU instances. This development promises to optimize your computing experience.

How MIG Works:

NVIDIA’s Multi-Instance GPU (MIG) technology divides 8 physical GPUs into 16 isolated instances, each behaving as an independent GPU with dedicated compute resources. This partitioning allows for efficient allocation of GPU resources, enhancing your computing experience.

Key Benefits and Limitations:

  • Efficient Resource Allocation: MIG’s partitioning ensures that your tasks receive dedicated GPU resources, avoiding resource contention and enhancing efficiency.
  • Enhanced Scalability: Run multiple GPU workloads concurrently without conflicts, simplifying project scaling.
  • Flexibility: Customize GPU instances to match your application requirements, optimizing performance and cost-effectiveness.
  • MIG is intended only for single-slice jobs; a single process cannot run across multiple devices, and Slurm will only allow jobs that request a single GPU.

Getting Started:

To access the nodes equipped with MIG technology, use the Slurm partition called “gpu_mig40” when submitting your jobs. The partition name reflects the amount of memory available on each GPU instance.

Important Notice: Please be aware that each job can only utilize a single GPU. If your job needs more than one GPU, use the “gpu” or “spgpu” partition, depending on your GPU needs.

Example Slurm Job Submission:

sbatch --partition=gpu_mig40 --gres=gpu:1 your_job_script.sh

If your job could run on either the “gpu_mig40” or “gpu” partition, you can specify both, and the scheduler will place your job on either partition.

sbatch --partition=gpu_mig40,gpu --gres=gpu:1 your_job_script.sh
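The same options can also be placed directly in the job script as #SBATCH directives. Below is a minimal sketch; the account name, resource amounts, module, and workload are placeholders rather than a prescribed configuration.

#!/bin/bash
# Minimal MIG batch script (illustrative only)
#SBATCH --job-name=mig-example
#SBATCH --partition=gpu_mig40,gpu    # either partition, as described above
#SBATCH --gres=gpu:1                 # MIG jobs may request only a single GPU
#SBATCH --cpus-per-task=4
#SBATCH --mem=16g
#SBATCH --time=02:00:00
#SBATCH --account=example_project    # placeholder account

module load cuda                     # exact module name depends on the cluster's module list
python train.py                      # placeholder workload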

For questions or support requests, please contact our team at arc-support@umich.edu.

Technology supports researchers’ quest to understand parental discipline behaviors

By | Feature, HPC, News, Research, Systems and Services, Uncategorized

Image by Rajesh Balouria from Pixabay

How do different types of parental discipline behaviors affect children’s development in low- and middle-income countries (LMICs)? A group of researchers set out to understand that question. They used a large data set from UNICEF of several hundred thousand families. The data came from the fourth (2009–2013) and fifth (2012–2017) rounds of the UNICEF Multiple Indicator Cluster Surveys. 

“The majority of parenting research is conducted in higher income and Westernized settings. We need more research that shows what types of parenting behaviors are most effective at promoting children’s development in lower resourced settings outside of the United States. I wanted to conduct an analysis that provided helpful direction for families and policymakers in LMICs regarding what parents can do to raise healthy, happy children,” said Kaitlin Paxton Ward, People Analytics Researcher at Google and Research Affiliate at the University of Michigan.

Dr. Paxton Ward is the lead author of the recently released paper, “Associations between 11 parental discipline behaviors and child outcomes across 60 countries.” The other authors are Andrew Grogan-Kaylor, Julie Ma, Garrett T. Pace, and Shawna Lee.

Together, they tested associations between 11 parental discipline behaviors and outcomes (aggression, distraction, and prosocial peer relations) of children under five years in 60 LMICs:

  • Verbal reasoning (i.e., explaining why the misbehavior was wrong)
  • Shouting
  • Name calling
  • Shaking
  • Spanking
  • Hitting/slapping the body
  • Hitting with an object 
  • Beating as hard as one could
  • Removing privileges 
  • Explaining
  • Giving the child something else to do

Results

Verbal reasoning and shouting were the most common parental discipline behaviors towards young children. Psychological and physical aggression were associated with higher child aggression and distraction. Verbal reasoning was associated with lower odds of aggression, and higher odds of prosocial peer relations. Taking away privileges was associated with higher odds of distraction, and lower odds of prosocial peer relations. Giving the child something else to do was associated with higher odds of distraction. The results indicated that there was some country-level variation in the associations between parenting behaviors and child socioemotional outcomes, but also that no form of psychological or physical aggression benefitted children in any country.

Conclusion 

Parental use of psychological and physical aggression was disadvantageous for children’s socioemotional development across countries. Only verbal reasoning was associated with positive child socioemotional development. The authors suggest that greater emphasis should be dedicated to reducing parental use of psychological and physical aggression across cultural contexts, and to increasing parental use of verbal reasoning.

The technology used to analyze the data

The researchers relied on a complex Bayesian multilevel model. This type of analysis incorporated knowledge from previous studies to inform the current analysis, and also provided a way for the researchers to look in more detail at variation across countries. To accomplish this task, the team turned to ITS Advanced Research Computing (ARC) and the Great Lakes High-Performance Computing Cluster. Great Lakes is the largest and fastest HPC service on U-M’s campus.

“I know for me as a parent of young children, you want the best outcome. I have known people to grow up with different forms of discipline and what the negative or positive influence of those are,” said Brock Palen, ARC director. 

The researchers also created a visual interpretation of their paper for public outreach using a web app called ArcGIS StoryMaps. This software helps researchers tell the story of their work. With no coding required, StoryMaps combine images, text, audio, video, and interactive maps in a captivating web experience. StoryMaps can be shared with groups of users, with an organization, or with the world. 

All students, faculty, and staff have access to ArcGIS StoryMaps. Since 2014, U-M folks have authored over 7,500 StoryMaps, and the number produced annually continues to increase year-over-year. Explore examples of how people around the world are using this technology in the StoryMaps Gallery.

“This intuitive software empowers the U-M community to author engaging, multimedia, place-based narratives, without involving IT staff,” said Peter Knoop, research consultant with LSA Technology Services. 

Correspondence to Dr. Kaitlin Paxton Ward, kpward@umich.edu.


Secure Enclave Service rate approved, shortcode needed by July 25 

By | Data, News, Systems and Services, Uncategorized

The Yottabyte Research Cloud (YBRC) migrated to the Secure Enclave Services (SES) in 2022. The new system provides improved performance for researcher workloads. Due to this transition, on July 1, 2023, ARC began billing researchers who consume more than 16 gigabytes (GB) of RAM (memory) per month.

The first 16 GB of RAM (memory) is covered by the U-M Research Computing Package (UMRCP). If you have not already requested or been granted the UMRCP, learn more and request it on the UMRCP service page on the ARC website.

Approved rate 

The approved rate for Secure Enclave Services is $7.00 per GB of RAM (memory) per machine, per month. Visit the Rates page on the ARC website for information about billing for all ARC services.

Action requested: Submit a shortcode 

A shortcode is needed to accommodate billing for any resources consumed that are not covered by the UMRCP. Please submit a shortcode no later than July 25, 2023. Access to your machine will be removed or reduced if a shortcode is not on file by July 25. Contact us at arc-support@umich.edu to submit your shortcode, or make any changes to the configuration or use of your machines. 

Some schools and colleges (including the U-M Medical School) are subsidizing the use of Secure Enclave Services beyond the 16 GB of RAM (memory). Talk to your unit’s IT staff or email ARC to learn more. 

Contact ARC (arc-support@umich.edu) if you would like to meet with the ARC storage manager to ask questions or get clarification.

U-M Research Computing Package automatic renewal begins July 1

By | News, Research, Uncategorized


The no-cost bundle of supercomputing resources known as the U-M Research Computing Package (UMRCP) automatically renews for most researchers on July 1.

Offered by Information and Technology Services, the UMRCP provides qualified researchers on all campuses (Ann Arbor, Dearborn, Flint, and Michigan Medicine) with allocations of high-performance computing, secure enclave, and research storage services. (Many units, including Michigan Medicine, provide additional resources to researchers. Be sure to check with your school or college.)

If a faculty researcher has left the university (or is about to), and their research remains at the university, an alternative administrator must be assigned via the ARC Resource Management Portal (RMP) so that the allocations can continue uninterrupted. ARC is available to help researchers make this transition.

Don’t have the UMRCP? Here’s how to request resources 

Faculty, as well as staff and PhD students with their own funded research on all campuses (Ann Arbor, Dearborn, Flint, and Michigan Medicine), are welcome to request allocations. Full details are available on the Advanced Research Computing website.

PhD researchers who do not have their own funded research can work with their advisor to be added to their advisor’s allocations via the ARC Resource Management Portal (RMP).

“The UMRCP was launched in 2021 to meet the needs of a diversity of disciplines and to provide options for long-term data management, sharing, and protecting sensitive data,” said Brock Palen, director, ITS Advanced Research Computing. “The UMRCP alleviates a lot of the pressure that researchers feel in terms of managing the technology they need to achieve breakthroughs.”


Globus can now be used with Armis2 

By | Armis2, HPC, News, Uncategorized

Researchers who have an Armis2 High-Performance Computing account can now move data to and from other Protected Health Information (PHI)-approved systems using Globus File Transfer. (The endpoint is umich#armis2.) 

To learn more about your responsibilities and approved services, visit the Sensitive Data Guide and the Protected Health Information (PHI) webpage on the Safe Computing website. Send an email to ARC at arc-support@umich.edu to get started using Globus with PHI on your own system (this is not needed for researchers using ARC services, including Armis2, or Data Den and Turbo with sensitive data).

“With the addition of Globus on Armis2, researchers using ITS ARC services can use the same Globus tools and processes to securely and reliably move their data on all ARC systems and across the university and beyond,” said Matt Britt, ARC HPC systems manager.

Globus allows the transfer and collaborative access of data between different storage systems, lab computers, and personal desktops and laptops. Globus enables researchers to use a web browser to submit transfer and data synchronization requests between destinations. 

As a robust, cloud-based, file transfer service, Globus is designed to securely move your data, ranging from megabytes to petabytes. ARC is a Globus Subscription Provider for the U-M community, which allows U-M resources to serve as endpoints or collections for file transfers.
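For scripted or repeated transfers, the same service can be driven from the Globus command-line interface. The commands below are a rough sketch; the collection UUIDs and paths are placeholders that you would look up in the Globus web app (for example, by searching for the umich#armis2 collection).

pip install --user globus-cli        # one way to install the Globus CLI
globus login                         # authenticate in a browser with U-M credentials

# Placeholder collection UUIDs; substitute the real source and destination collections
SRC_EP=aaaaaaaa-1111-2222-3333-444444444444
DST_EP=bbbbbbbb-5555-6666-7777-888888888888

# Recursively transfer a results directory and label the transfer for tracking
globus transfer --recursive --label "armis2 results sync" \
  "$SRC_EP:/home/uniqname/results/" \
  "$DST_EP:/project/results/"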

“There are many interesting research collaborations happening at U-M, as well as nationally and internationally. Globus can facilitate all of those interactions securely,” said Brock Palen, ARC director. “Globus is the go-to tool we recommend for data transfer.”


How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

Data Den now supports sensitive data

By | News, Uncategorized

Data Den Research Archive is a service for preserving electronic data generated from research activities. It is a low-cost, highly durable storage system and is the largest storage system operated by ARC. Storage of sensitive data (including HIPAA, PII, and FERPA data) is now supported; visit the Sensitive Data Guide for full details. This service is part of the U-M Research Computing Package (UMRCP), which provides storage allocations to researchers. Most researchers will not have to pay for Data Den.

A disk-caching, tape-backed archive, this storage service is best for data that researchers do not need regularly, but still need to keep because of grant requirements. 

“Data Den is a good place to keep research data past the life of the grant,” said Jeremy Hallum, ARC research computing manager. “ARC can store data that researchers need to keep for five to ten years.” 

Hallum goes on to say that Data Den is only available in a replicated format. “Volumes of data are duplicated between servers or clusters for disaster recovery so research data is very safe.”

Data Den can be part of a well-organized data management plan providing international data sharing, encryption, and data durability. Jerome Kinlaw, ARC research storage lead, said that the Globus File Transfer service works well for data management. “Globus is easy to use for moving data in and out of Data Den.”
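As a rough sketch of that workflow (the collection UUIDs and paths below are placeholders, not actual Data Den locations), a lab might bundle a finished project into a single archive, since tape-backed systems generally handle a few large files better than many small ones, and then move it with the Globus CLI.

# Placeholder collection UUIDs; find the real ones in the Globus web app
LAB_EP=aaaaaaaa-1111-2222-3333-444444444444
DATADEN_EP=bbbbbbbb-5555-6666-7777-888888888888

# Bundle the project directory into one archive before sending it to tape-backed storage
tar czf project_2023.tar.gz project_2023/

# Transfer the archive into the Data Den collection and label it for tracking
globus transfer --label "archive to Data Den" \
  "$LAB_EP:/scratch/project_2023.tar.gz" \
  "$DATADEN_EP:/archive/project_2023.tar.gz"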

The ITS U-M Research Computing Package (UMRCP) provides 100 terabytes (TB) of Data Den storage to qualified researchers. This 100 TB can be divided between restricted and non-restricted variants of Data Den for use as needed. (The ITS Data Storage Finder can help researchers find the right storage solutions to meet their needs.)

“I’m pleased that Data Den now offers options for sensitive data, and that researchers can take advantage of the UMRCP allocations,” said Brock Palen, ARC director. “We want to lighten the load so that researchers can do what they do best, and our services are now more cost effective than ever.”

Globus maintenance happening at 9 a.m. on March 11

By | Armis2, Data, General Interest, Great Lakes, HPC, News, Research, Uncategorized

Due to planned maintenance by the vendor, Globus services will be unavailable for up to two hours beginning at 9 a.m. U.S. Eastern Time (8 a.m. Central Time) on Saturday, March 11, 2023.

Customers will not be able to authenticate or initiate any transfers during that time. Transfers that started before the outage will stall and will resume once maintenance is complete.

More details are available on the Globus blog.

For assistance or questions, please contact ARC at arc-support@umich.edu.

Protein structure prediction team achieved top rankings

By | Great Lakes, News, Uncategorized

CASP15 is a biennial competition that assesses methods of protein structure modeling. Independent assessors compared the submitted models with experimental results, and the results and their implications were discussed at the CASP15 Conference, held in December 2022 in Turkey.

A joint team with members from the labs of Dr. Peter Freddolino and Dr. Yang Zhang took first place in the Multimer and Interdomain Prediction categories, and was again the top-ranked server in the Regular (domains) category according to the CASP assessors’ criteria.

These wins are well-earned. Freddolino noted, “This is a highly competitive event, against some of the very best minds and powerful companies in the world.”

The Zhang/Freddolino team competed against nearly 100 other groups, including other academic institutions as well as major cloud and commercial companies. Groups from around the world submitted more than 53,000 models on 127 modeling targets in 5 prediction categories.

“Wei’s predictions did amazingly well in CASP15!” said Freddolino. Wei Zheng, Ph.D., is a lab member and a research fellow with the Department of Computational Medicine and Bioinformatics (DCMB).

Zheng said that the team participated in the regular protein structure prediction and protein complex structure prediction categories. “The results are assessed as regular protein domain modeling, regular protein inter-domain modeling, and protein complex modeling. In all categories, our models performed very well!”

The technology that supported this impressive work 

The resources to achieve these results were grant-funded, which allowed the team to leverage a number of university resources, including:  

  • The Lighthouse High-Performance Computing Cluster (HPC) service. Lighthouse is managed by the Advanced Research Computing (ARC) team, and ARC is a division of Information and Technology Services (ITS). 
  • The algorithms were GPU-intensive and were run on the Great Lakes HPC Cluster, which provided additional capacity for compute cycles. Graphics processing units (GPUs) are specialized processors designed to accelerate graphics rendering. Kenneth Weiss, IT project manager senior with DCMB and HITS, said that many of the algorithms used by Zheng benefited from the increased performance of being able to compute the data on a GPU.
  • Multiple storage systems, including Turbo Research Storage. High-speed storage was crucial for storing AI-trained models and sequence libraries used by the methods developed by Zhang, Freddolino, and Zheng called D-I-TASSER/DMFold-Multimer. 
  • Given the scale of the CASP targets, the grant-funded compute capacity was augmented with the Great Lakes cluster. Freddolino and his team took advantage of the allocations provided by the ITS U-M Research Computing Package (UMRCP) and the HITS Michigan Medicine Research Computing Investment (MMRCI) programs, which substantially defrayed the cost of computing.
  • The collaboration tool Slack was used to keep Freddolino and Zheng in close contact with ARC and the DCMB teams. This provided the ability to deal with issues promptly, avoiding delays that would have had a detrimental impact on meeting CASP targets.

Technology staff from ARC, DCMB, and Health Information and Technology Services (HITS) provided assistance to the research team. All of the teams helped mitigate bottlenecks affecting the speed and throughput that Zheng needed for results. Staff also located and helped leverage resources, including available partitions and queues on the Great Lakes cluster.

“Having the flexibility and capacity provided by Great Lakes was instrumental in meeting competition deadlines,” said Weiss.

DCMB staff and the HITS HPC team took the lead on triaging software problems, giving Freddolino’s group high priority.

ARC Director Brock Palen provided monitoring and guidance on real-time impact and utilization of resources. “It was an honor to support this effort. It has always been ARC’s goal to take care of the technology so researchers can do what they do best. In this case, Freddolino and Zheng knocked it out of the park.”

Jonathan Poisson, technical support manager with DCMB, was instrumental in helping to select and configure the equipment purchased by the grant. “This assistance was crucial in meeting the tight CASP15 targets, as each target is accompanied by a deadline for results.” 

Read more on the Computational Medicine and Bioinformatics website and the Department of Biological Chemistry website.

Related presentation: D-I-TASSER: Integrating Deep Learning with Multi-MSAs and Threading Alignments for Protein Structure Prediction

The resources to achieve these results were provided by an NIH-funded grant (“High-Performance Computing Cluster for Biomedical Research,” SIG: S10OD026825).