Category

Data

Using GenAI to design floor plans and buildings

By | Data, Data sets, Educational, Feature, General Interest, HPC, News, Research, Systems and Services

There is a lot to consider when designing places where humans live and work. How will the space be used? Who will use it? What is the budget? Developing all of those details into something usable is painstaking and time-consuming.

What if Generative AI (GenAI) could help? We already know that it can be used to create text, music, and images. Did you know that it can also create building designs and floor plans? 

Dr. Matias del Campo, associate professor of architecture in the Taubman College of Architecture and Urban Planning, has been working to make architectural generative models more robust. He aims to expand on the patterns, structures, and features in the available input data to create architectural works. Himself a registered architect, designer, and educator, del Campo conducts research on advanced design methods in architecture using artificial intelligence techniques.

He leverages something called neural networks for two projects: 

  • Common House: A project that focuses on floor plan analysis and generation.
  • Model Mine: A large-scale, 3D model housing database for architecture design using Graph Convolutional Neural Networks and 3D Generative Adversarial Networks.

This is an example of the annotated data created for the Common House research project. The main obstacle to generating more realistic plans is the lack of databases tailored to these architectural applications. The Common House project aims to create a large-scale dataset of plans with semantic information. Specifically, the data creation pipeline annotates the different components of a floor plan, e.g., dining room, kitchen, bedroom.
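
To make the idea of semantic annotation concrete, here is a minimal sketch of how a labeled floor plan might be represented: each room as a named polygon. The schema, labels, and coordinates are hypothetical illustrations, not the actual Common House data format.

```python
# Hypothetical sketch of a semantically annotated floor plan: each room is
# stored as a labeled polygon (coordinates in meters). This is illustrative
# only, not the actual Common House schema.
floor_plan = {
    "plan_id": "example-001",
    "rooms": [
        {"label": "Kitchen",     "polygon": [(0, 0), (4, 0), (4, 3), (0, 3)]},
        {"label": "Dining Room", "polygon": [(4, 0), (9, 0), (9, 3), (4, 3)]},
        {"label": "Bedroom",     "polygon": [(0, 3), (5, 3), (5, 7), (0, 7)]},
    ],
}

def room_area(polygon):
    """Area of a simple polygon via the shoelace formula."""
    n = len(polygon)
    s = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# With labels attached, simple semantic queries become possible, e.g.,
# the area devoted to each room type.
areas = {r["label"]: room_area(r["polygon"]) for r in floor_plan["rooms"]}
```

Annotations like these are what let a generative model learn not just wall geometry but the function of each space.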

 

Four quadrants showing 9 models each of chairs, laptops, benches, and airplanes

A large scale 3D model housing database for Architecture design using Graph Convolutional Neural Networks and 3D Generative Adversarial Networks.

What exactly are neural networks? The name itself takes inspiration from the human brain and the way that biological neurons signal to one another. In the GenAI world, neural networks are a subset of machine learning and are at the heart of deep learning algorithms. This image of AI hierarchy may be helpful in understanding how they are connected.
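
A tiny sketch may help make the analogy concrete. An artificial neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinear "activation," the software analogue of a biological neuron firing; networks stack many such neurons in layers. The weights below are arbitrary illustrations, not trained values, and this is nothing like the scale of the models used in del Campo's research.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum + bias, then sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def tiny_network(inputs):
    """A two-layer toy network: two hidden neurons feed one output neuron."""
    h1 = neuron(inputs, [0.5, -0.4], 0.1)
    h2 = neuron(inputs, [-0.3, 0.8], 0.0)
    return neuron([h1, h2], [1.2, -0.7], 0.2)

score = tiny_network([1.0, 2.0])  # a value between 0 and 1
```

Training consists of nudging the weights so the network's outputs move closer to the desired ones; deep learning simply stacks many more layers of these units.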

Dr. del Campo’s research uses GenAI at every step of the design process, including 2D models for elements like floor plans and exteriors, and 3D models for the shapes and volumes of rooms and buildings. The analysis informs design decisions.

DEI considerations

del Campo notes that there are some DEI implications for the tools he’s developing. “One of the observations that brought us to develop the ‘Common House’ (Plangenerator) project is that the existing apartment and house plan datasets are heavily biased towards European and U.S. housing. They do not contain plans from other regions of the world; thus, most cultures are underrepresented.” 

To counterbalance that, del Campo and his team made a global data collection effort, collecting plans and having them labeled by local architects and architecture students. “This not only ensured a more diverse dataset but also increased the quality of the semantic information in the dataset.”

How technology supports del Campo’s work

A number of services from Information Technology & Services (ITS) are used in these projects, including Google at U-M collaboration tools, GenAI, Amazon Web Services at U-M (AWS), and GitHub at U-M.

Also from ITS, the Advanced Research Computing (ARC) team provides support to del Campo’s work. 

“We requested allocations from the U-M Research Computing Package for high-performance computing (HPC) services in order to train two models. One focuses on the ‘Common House’ plan generator, and the other focuses on the ‘Model Mine’ dataset to create 3D models,” said del Campo. 

Additionally, they used HPC allocations from the UMRCP to create a large-scale artwork called MOSAIK, which consists of over 20,000 AI-generated images organized in a color gradient. 


“We used HPC to run the algorithm that organized the images. Due to the necessary high resolution of the image, this was only possible using HPC.”

“Dr. del Campo’s work is really novel, and it is different from the type of research that is usually processed on Great Lakes. I am impressed by the creative ways Dr. del Campo is applying ITS resources in a way that we did not think was possible,” said Brock Palen, director of ITS Advanced Research Computing. 

Related: Learn about The Architecture + Artificial Intelligence Laboratory (AR2IL)

Using natural language processing to improve everyday life

By | Data, Great Lakes, HPC, News, Research, Uncategorized

Joyce Y. Chai, professor of electrical engineering and computer science in the College of Engineering, and her colleagues have been using natural language processing and machine learning to seek answers to complex questions that may improve everyday life.

Some of the algorithms they develop are meant for tasks that machines may have little to no prior knowledge of, for example, guiding human users as they acquire a particular skill (e.g., building a special apparatus, or even “Tell me how to bake a cake”). Generative AI, or GenAI, generates a set of instructions based on observation of what the user is doing, e.g., to correct mistakes or provide the next step. The better the data and engineering behind the AI, the more useful the instructions will be. 

“To enable machines to quickly learn and adapt to a new task, developers may give a few examples of recipe steps with both language instructions and video demonstrations. Machines can then (hopefully) guide users through the task by recognizing the right steps and generating relevant instructions using GenAI,” said Chai.

What are AI, machine learning, deep learning, and natural language processing?

It might help to take a step back to understand AI, machine learning (ML), and deep learning at a high level. ML is a subset of AI, and deep learning is in turn a subset of ML, as seen in the figure. Some natural language processing (NLP) tasks fall within the realm of deep learning. They all work together and build off of each other.

Artificial Intelligence, or AI, is a branch of computer science that attempts to simulate human intelligence with computers. It involves creating systems to perform tasks that usually need human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.

“NLP is a sub-area in AI, where DL/ML approaches are predominantly applied,” stated Chai.

Christopher Brown, a research data scientist with ITS Advanced Research Computing (ARC) and a member of the ARC consulting team, explains that ML is a subfield of AI. Within ML, algorithms are used to generalize to situations beyond those seen in training data and then complete tasks without further guidance from people. A good example is U-M GPT. The large language models (LLMs) accessible via U-M GPT are trained on millions of diverse examples. “The goal is to train the models to reliably predict, translate, or generate something.” 
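
The idea of training a model to predict can be sketched at toy scale. The bigram model below simply counts which word follows which in some example text, then predicts the most frequent follower. Real LLMs learn billions of parameters over vastly more data; this only illustrates the core predict-the-next-word objective.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """'Train' by counting which word follows which in the corpus."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            followers[a][b] += 1
    return followers

def predict_next(model, word):
    """Predict the most frequently observed follower of a word."""
    counts = model.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

model = train_bigram([
    "the model learns from data",
    "the model predicts the next word",
])
```

For instance, `predict_next(model, "the")` returns `"model"`, because "model" followed "the" more often than any other word in the training examples.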

“Data is any information that can be formatted and fed into an algorithm that can be used for some task, including journal articles, chats, numbers, videos, audio, and texts,” said Brown. Algorithms can be trained to perform tasks using these real-world data. 

Natural Language Processing is a branch of artificial intelligence that helps computers understand and generate human language in a way that is both meaningful and useful to humans. NLP teaches computers to understand language and then respond in ways humans can understand, even when richly contextual language is used. 

“NLP is highly interdisciplinary, and involves multiple fields, such as computer science, linguistics, philosophy, cognitive science, statistics, mathematics, etc.,” said Chai.

Examples of NLP are everywhere: when you ask Siri for directions, or when Google efficiently completes your half-typed query, or even when you get suggested replies in your email. 
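
The query-completion example above can be sketched in a few lines: rank previously seen queries by frequency and suggest the ones that start with what the user has typed so far. Production systems use far richer language models; this shows only the basic idea, and the sample query log is invented.

```python
def complete(prefix, query_log):
    """Suggest completions for a typed prefix, most frequent queries first."""
    prefix = prefix.lower()
    matches = [q for q in query_log if q.startswith(prefix)]
    return sorted(set(matches), key=lambda q: (-query_log.count(q), q))

log = [
    "weather ann arbor",
    "weather ann arbor",
    "weather tomorrow",
    "great lakes hpc",
]
suggestions = complete("wea", log)  # most popular "wea..." queries first
```

Typing "wea" against this log yields "weather ann arbor" before "weather tomorrow", since the former was searched more often.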

Ultimately, NLP, along with AI, can be used to make interactions between humans and machines as natural and as easy as possible. 

A lot of data is needed to train the models

Dr. Chai and her team use large language models, a lot of data, and computing resources. These models take longer to train and are harder to interpret. Brown says, “The state-of-the-art, groundbreaking work tends to be in this area.” 

Dr. Chai uses deep learning algorithms that make predictions about what the next part of the task or conversation is. “For example, they use deep learning and the transformer architecture to enable embodied agents to learn how new words are connected to the physical environment, to follow human language instructions, and to collaborate with humans to come up with a shared plan,” Brown explains.

The technology that supports this work

To accomplish her work, Dr. Chai uses the Great Lakes High-Performance Computing Cluster and Turbo Research Storage, both of which are managed by U-M’s Advanced Research Computing Group (ARC) in Information and Technology Services. She has 16 GPUs on Great Lakes at the ready, with the option to use more at any given time. 

A GPU, or Graphics Processing Unit, is a piece of computer equipment that is good at displaying pictures, animations, and videos on your screen. The GPU is especially adept at quickly creating and manipulating images. Traditionally, GPUs were used for video games and professional design software where detailed graphics were necessary. But more recently, researchers including Dr. Chai discovered that GPUs are also good at handling many simple tasks at the same time. This includes tasks like scientific simulations and AI training where a lot of calculations need to be done in parallel (which is perfect for training large language models).
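
The "many simple tasks at the same time" point can be seen in a matrix multiply, the workhorse of AI training: every output cell is an independent dot product of one row and one column, so thousands of cells can be computed simultaneously. The pure-Python sketch below computes them one at a time, but note that no cell depends on any other; that independence is exactly what a GPU exploits.

```python
def dot(row, col):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(row, col))

def matmul(A, B):
    """Matrix multiply. Each output cell depends only on one row of A and
    one column of B, so all cells could be computed in parallel."""
    cols = list(zip(*B))  # columns of B
    return [[dot(row, col) for col in cols] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)  # [[19, 22], [43, 50]]
```

Training a large language model repeats operations like this billions of times over much larger matrices, which is why GPU parallelism matters so much.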

“GPUs are popular for deep learning, and we will continue to get more and better GPUs in the future. There is a demand, and we will continue supporting this technology so that deep learning can continue to grow,” said Brock Palen, ITS Advanced Research Computing director. 

Chai and her team also leveraged 29 terabytes of the Turbo Research Storage service at ARC. NLP benefits from the high-capacity, reliable, secure, and fast storage solution. Turbo enables investigators across the university to store and access data needed for their research via Great Lakes. 

Great Lakes HPC in the classroom 

ARC offers classroom use of high-performance computing resources on the Great Lakes High-Performance Computing Cluster.

Dr. Chai regularly leverages this resource. “Over 300 students have benefited from this experience. We have homework that requires the use of Great Lakes, e.g., having students learn how to conduct experiments in a managed job-scheduling system like Slurm. This will benefit them in the future if they engage in any compute-intensive R&D (research and development).

“For my NLP class, I request Great Lakes access for my students so they have the ability to develop some meaningful final projects. We also use the Great Lakes HPC resources to study the reproducibility for NLP beginners,” said Chai. A gallery is available for many of the student projects.
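
For readers unfamiliar with Slurm, work on a cluster like Great Lakes is described in a batch script of `#SBATCH` directives that the scheduler reads before running your commands. The sketch below is illustrative only: the account, partition, and script names are placeholders, not actual course or Great Lakes values.

```shell
#!/bin/bash
# Illustrative Slurm batch script; account/partition/script names are
# placeholders, not real Great Lakes values.
#SBATCH --job-name=nlp-homework
#SBATCH --account=example_class_account
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16g
#SBATCH --time=01:00:00

# Load the software environment, then run the experiment.
module load python
python train_model.py
```

A student would submit this with `sbatch script.sh` and check progress with `squeue`, the kind of workflow Chai describes her students practicing.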

The UMRCP defrays costs

The U-M Research Computing Package is a set of cost-sharing allocations offered by ITS ARC, and it is available upon request. Other units offer additional cost-sharing to researchers. Chai said, “We typically use the nodes owned by my group for research projects that require intensive, large-scale GPU model training. We use the UMRCP for less intensive tasks, thereby extending the budgetary impact of the allocations.” 

HPC Emergency 2023 Maintenance: September 15

By | Data, HPC, News, Research, Systems and Services

Due to a critical issue that requires an immediate update, we will be updating Slurm and the underlying libraries that allow parallel jobs to communicate. We will be updating the login nodes and the rest of the cluster on the fly, and you should experience only minimal impact when interacting with the clusters. 

  • Jobs that are currently running will be allowed to finish. 
  • All new jobs will only be allowed to run on nodes which have been updated. 
  • The login and Open OnDemand nodes will also be updated, which will require a brief interruption in service.

Queued jobs and maintenance reminders

Jobs will remain queued, and will automatically begin after the maintenance is completed. Any parallel jobs using MPI will fail; those jobs may need to be recompiled, as described below. Jobs not using MPI will not be affected by this update.

Jobs will be initially slow to start, as compute nodes are drained of running jobs so they can be updated. We apologize for this inconvenience, and want to assure you that we would not be performing this maintenance during a semester unless it was absolutely necessary.

Software updates

Only one version of OpenMPI (version 4.1.6) will be available; all other versions will be removed. Modules for the removed versions of OpenMPI will warn you that the version is no longer available and prompt you to load openmpi/4.1.6. 

When you use the following command, it will default to openmpi/4.1.6:
module load openmpi 

Any software packages you use (provided by ARC/LSA/COE/UMMS or yourself) will need to be updated to use openmpi/4.1.6. The software package updates will be completed by ARC. The code you compile yourself will need to be updated by you.
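
Rebuilding your own code against the new OpenMPI typically looks something like the following sketch. The source file and compiler flags are placeholders; substitute your own files or your usual make/cmake build.

```shell
# Illustrative rebuild of self-compiled MPI code against openmpi/4.1.6.
# "my_app.c" is a placeholder for your own source.
module purge
module load gcc openmpi/4.1.6   # pick up the new MPI toolchain
mpicc -O2 -o my_app my_app.c    # C code; use mpif90 for Fortran
```
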

Note that openmpi/3.1.6 is being discontinued; its module will warn you to switch to openmpi/4.1.6.

Status updates

 

System software changes

Great Lakes, Armis2 and Lighthouse

New versions:
  • Slurm 23.02.5, compiled with:
    • PMIx /opt/pmix/3.2.5 and /opt/pmix/4.2.6
    • hwloc 2.2.0-3 (OS provided)
    • ucx-1.15.0-1.59056 (OFED provided)
    • slurm-libpmi
    • slurm-contribs
  • PMIx LD config: /opt/pmix/3.2.5/lib
  • PMIx versions available in /opt: 3.2.5 and 4.2.6
  • OpenMPI: 4.1.6

Old versions (removed):
  • Slurm 23.02.3, compiled with:
    • PMIx /opt/pmix/2.2.5, /opt/pmix/3.2.3, and /opt/pmix/4.2.3
    • hwloc 2.2.0-3 (OS provided)
    • ucx-1.15.0-1.59056 (OFED provided)
    • slurm-libpmi
    • slurm-contribs
  • PMIx LD config: /opt/pmix/2.2.5/lib
  • PMIx versions available in /opt: 2.2.5, 3.2.3, and 4.1.2
  • OpenMPI: 3.1.6 and others

 

How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

Summer 2023 Network Maintenance: HPC and storage unavailable August 21-22 

By | Data, HPC, News, Research, Systems and Services

During the 2023 summer maintenance, a significant networking software bug was discovered and ARC was unable to complete the ARC HPC and Storage network updates at the MACC Data Center.

ITS has been working with the vendor on a remediation, and it will be implemented on August 21-22.  This will require scheduled maintenance for the HPC clusters Great Lakes, Armis2, and Lighthouse, as well as the ARC storage systems Turbo, Locker, and Data Den. The date was selected to minimize any impact during the fall semester. 

Maintenance dates:

The HPC clusters and their storage systems (/home and /scratch), as well as the ARC storage systems (Turbo, Locker, and Data Den), will be unavailable starting at 7 a.m. on August 21. Expected completion is August 22.

Queued jobs and maintenance reminders

Jobs will remain queued, and will automatically begin after the maintenance is completed. The command “maxwalltime” will show the amount of time remaining until maintenance begins for each cluster, so you can size your jobs appropriately. The countdown to maintenance will also appear on the ARC homepage.

Status updates

How can we help you?

For assistance or questions, contact ARC at arc-support@umich.edu.

Secure Enclave Service rate approved, shortcode needed by July 25 

By | Data, News, Systems and Services, Uncategorized

The Yottabyte Research Cloud (YBRC) migrated to the Secure Enclave Services (SES) in 2022. The new system provides improved performance for researcher workloads. With this transition, ARC began billing researchers on July 1, 2023, for consumption of more than 16 gigabytes (GB) of RAM (memory) per month. 

The first 16 GB of RAM (memory) is covered by the U-M Research Computing Package (UMRCP). If you have not already requested or been granted the UMRCP, learn more and request it on the UMRCP service page on the ARC website.

Approved rate 

The approved rate for a Secure Enclave Services machine is $7.00 per GB of RAM (memory) per machine, per month. Visit the Rates page on the ARC website for information about billing for all ARC services. 

Action requested: Submit a shortcode 

A shortcode is needed to accommodate billing for any resources consumed that are not covered by the UMRCP. Please submit a shortcode no later than July 25, 2023. Access to your machine will be removed or reduced if a shortcode is not on file by July 25. Contact us at arc-support@umich.edu to submit your shortcode, or make any changes to the configuration or use of your machines. 

Some schools and colleges (including the U-M Medical School) are subsidizing the use of Secure Enclave Services beyond the 16 GB of RAM (memory). Talk to your unit’s IT staff or email ARC to learn more. 

Contact ARC (arc-support@umich.edu) if you would like to meet with the ARC storage manager to ask questions or get clarification.

Globus maintenance happening at 9 a.m. on March 11

By | Armis2, Data, General Interest, Great Lakes, HPC, News, Research, Uncategorized

Due to planned maintenance by the vendor, Globus services will be unavailable for up to two hours beginning at 9 a.m. U.S. Eastern Time (10 a.m. Central Time) on Saturday, March 11, 2023.

Customers will not be able to authenticate or initiate any transfers during that time. Any transfers that have started before the outage will be stalled until the outage is over. Transfers will resume once maintenance is complete.

More details are available on the Globus blog.

For assistance or questions, please contact ARC at arc-support@umich.edu.

Understanding the strongest electromagnetic fields in the universe

By | Data, Great Lakes, HPC, Research, Uncategorized

Alec Thomas is part of the team from the U-M College of Engineering Gérard Mourou Center for Ultrafast Optical Science that is building the most powerful laser in the U.S.

Dubbed “ZEUS,” the laser will deliver 3 petawatts of power, a ‘3’ followed by 15 zeros in watts. All the power generated in the entire world is about 10 terawatts, 300 times less than the ZEUS laser. 

The team’s goal is to use the laser to explore how matter behaves in the most extreme electric and magnetic fields in the universe, and also to generate new sources of radiation beams, which may lead to developments in medicine, materials science, and national security. 

A simulation of a plasma wake.

This simulation shows a plasma wake behind a laser pulse. The plasma behaves like water waves generated behind a boat. In this image, the “waves” are extremely hot plasma matter, and the “boat” is a short burst of powerful laser light. (Image courtesy of Daniel Seipt.)

“In the strong electric fields of a petawatt laser, matter becomes ripped apart into a ‘plasma,’ which is what the sun is made of. This work involves very complex and nonlinear physical interactions between matter particles and light. We create six-dimensional models of particles to simulate how they might behave in a plasma in the presence of these laser fields to learn how to harness it for new technologies. This requires a lot of compute power,” Thomas said. 

That compute power comes from the Great Lakes HPC cluster, the university’s fastest high-performance computing cluster. The team developed equations of motion for each six-dimensional particle. The equations run on Great Lakes and help Thomas and his team learn how particles might behave within a cell. Once the motion is understood, solutions can be developed. 
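
A drastically simplified sketch shows what "six-dimensional" means here: each particle carries three position and three velocity components, advanced in time by the Lorentz force F = q(E + v × B). This is a toy explicit-Euler integrator, not the team's plasma code; real simulations use relativistic, structure-preserving methods over billions of particles.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def step(pos, vel, q_over_m, E, B, dt):
    """Advance one particle by dt under the Lorentz force q(E + v x B)."""
    vxB = cross(vel, B)
    acc = tuple(q_over_m * (E[i] + vxB[i]) for i in range(3))
    new_vel = tuple(vel[i] + acc[i] * dt for i in range(3))
    new_pos = tuple(pos[i] + new_vel[i] * dt for i in range(3))
    return new_pos, new_vel

# In a pure magnetic field along z, a charged particle should gyrate,
# keeping (approximately, for this crude integrator) a constant speed.
pos, vel = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
for _ in range(1000):
    pos, vel = step(pos, vel, q_over_m=1.0, E=(0, 0, 0), B=(0, 0, 1.0), dt=0.001)
```

Multiply this by billions of particles, add the fields the particles themselves generate, and the need for a cluster like Great Lakes becomes clear.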

“On the computing side, this is a very complex physical interaction. Great Lakes is designed to handle this type of work,” said Brock Palen, director of Advanced Research Computing, a division of Information and Technology Services. 

Thomas has signed up for allocations on the Great Lakes HPC cluster and Data Den storage. “I just signed up for the no-cost allocations offered by the U-M Research Computing Package. I am planning to use those allocations to explore ideas and concepts in preparation for submitting grant proposals.”

Learn more and sign up for the no-cost U-M Research Computing Package (UMRCP).

Prof. Thomas’ work is funded by a grant from the National Science Foundation.

Yottabyte Research Cloud certified for CUI data

By | Data, General Interest, Happenings, News

Advanced Research Computing – Technology Services (ARC-TS) is pleased to announce that the Yottabyte Research Cloud (YBRC) computing platform is now certified to accept data designated as Controlled Unclassified Information (CUI). This includes certification for YBRC and its associated services, enabling secure data analysis on Windows and Linux virtual desktops as well as secure hosting of databases and data ingestion.

For more information on CUI, see the U-M Research Ethics and Compliance CUI webpage and Sensitive Data Guide: Controlled Unclassified Information (CUI). CUI regulations apply to federal non-classified information requiring security controls; an example of CUI data often used in research is data from the Centers for Medicare and Medicaid Services.

The new capability ensures the security of CUI data through the creation of firewalled network enclaves, allowing CUI data to be analyzed safely and securely in YBRC’s flexible, robust and scalable environment.  Within each network enclave, researchers have access to Windows and Linux virtual desktops that can contain any software required for their analysis pipeline.

This capability also extends to our database and ingestion services:

  • Structured databases: MySQL/MariaDB and PostgreSQL.
  • Unstructured databases: Cassandra, InfluxDB, Grafana, and ElasticSearch.
  • Data ingestion: Redis, Kafka, RabbitMQ.
  • Data processing: Apache Flink, Apache Storm, Node.js and Apache NiFi.
  • Other data services are available upon request.

The CUI certification extends YBRC’s existing capabilities for handling sensitive data; the service can also accept HIPAA data, Export Controlled Research (ITAR, EAR), Personally Identifiable Information, and more. Please see Sensitive Data Guide: Yottabyte Research Cloud for more information.

YBRC is supported by U-M’s Data Science Initiative launched in 2015 and was created through a partnership between Yottabyte and ARC-TS. These tools are offered to all researchers at the University of Michigan free of charge, provided that certain usage limits are not exceeded. Large-scale users who outgrow the no-cost allotment may purchase additional YBRC resources. All interested parties should contact arcts-support@umich.edu.

MIDAS Data Science for Music Challenge Initiative announces funded projects

By | Data, General Interest, Happenings, News, Research

From digital analysis of Bach sonatas to mining data from crowdsourced compositions, researchers at the University of Michigan are using modern big data techniques to transform how we understand, create and interact with music.

Four U-M research teams will receive support for projects that apply data science tools like machine learning and data mining to the study of music theory, performance, social media-based music making, and the connection between words and music. The funding is provided under the Data Science for Music Challenge Initiative through the Michigan Institute for Data Science (MIDAS).

“MIDAS is excited to catalyze innovative, interdisciplinary research at the intersection of data science and music,” said Alfred Hero, co-director of MIDAS and the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science. “The four proposals selected will apply and demonstrate some of the most powerful state-of-the-art machine learning and data mining methods to empirical music theory, automated musical accompaniment of text and data-driven analysis of music performance.”

Jason Corey, associate dean for graduate studies and research at the School of Music, Theatre & Dance, added: “These new collaborations between our music faculty and engineers, mathematicians and computer scientists will help broaden and deepen our understanding of the complexities of music composition and performance.”

The four projects represent the beginning of MIDAS’ support for the emerging Data Science for Music research. The long-term goal is to build a critical mass of interdisciplinary researchers for sustained development of this research area, which demonstrates the power of data science to transform traditional research disciplines.

Each project will receive $75,000 over a year. The projects are:

Understanding and Mining Patterns of Audience Engagement and Creative Collaboration in Large-Scale Crowdsourced Music Performances

Investigators: Danai Koutra and Walter Lasecki, both assistant professors of computer science and engineering

Summary: The project will develop a platform for crowdsourced music making and performance, and use data mining techniques to discover patterns in audience engagement and participation. The results can be applied to other interactive settings as well, including developing new educational tools.

Understanding How the Brain Processes Music Through the Bach Trio Sonatas
Investigators: Daniel Forger, professor of mathematics and computational medicine and bioinformatics; James Kibbie, professor and chair of organ and university organist

Summary: The project will develop and analyze a library of digitized performances of Bach’s Trio Sonatas, applying novel algorithms to study the music structure from a data science perspective. The team’s analysis will compare different performances to determine features that make performances artistic, as well as the common mistakes performers make. Findings will be integrated into courses both on organ performance and on data science.

The Sound of Text
Investigators: Rada Mihalcea, professor of electrical engineering and computer science; Anıl Çamcı, assistant professor of performing arts technology

Summary: The project will develop a data science framework that will connect language and music, developing tools that can produce musical interpretations of texts based on content and emotion. The resulting tool will be able to translate any text—poetry, prose, or even research papers—into music.

A Computational Study of Patterned Melodic Structures Across Musical Cultures
Investigators: Somangshu Mukherji, assistant professor of music theory; Xuanlong Nguyen, associate professor of statistics

Summary: This project will combine music theory and computational analysis to compare the melodies of music across six cultures—including Indian and Irish songs, as well as Bach and Mozart—to identify commonalities in how music is structured cross-culturally.

The Data Science for Music program is the fifth challenge initiative funded by MIDAS to promote innovation in data science and cross-disciplinary collaboration, while building on existing expertise of U-M researchers. The other four are focused on transportation, health sciences, social sciences and learning analytics.

Hero said the confluence of music and data science was a natural extension.

“The University of Michigan’s combined strengths in data science methodology and music makes us an ideal crucible for discovery and innovation at this intersection,” he said.

Contact: Dan Meisler, Communications Manager, Advanced Research Computing
734-764-7414, dmeisler@umich.edu

Interdisciplinary Committee on Organizational Studies (ICOS) Big Data Summer Camp, May 14-18

By | Data, Educational, General Interest, Happenings, News
Social and organizational life is increasingly conducted online through electronic media, from email to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” Within a few years, a standard part of graduate training in the social sciences will include a hefty dose of working with big data, and we will all be using terms like API and Python.

This year ICOS, MIDAS, and ARC are again offering a one-week “big data summer camp” for doctoral students interested in organizational research, with a combination of detailed examples from researchers; hands-on instruction in Python, SQL, and APIs; and group work to apply these ideas to organizational questions. Enrollment is free, but students must commit to attending all day for each day of camp and be willing to work in interdisciplinary groups.

The camp runs all day, May 14-18.

https://ttc.iss.lsa.umich.edu/ttc/sessions/interdisciplinary-committee-on-organizational-studies-icos-big-data-summer-camp-3/