MICDE to provide data analysis and dissemination support for $18 million tobacco research center

By | General Interest, Happenings, News, Research

The University of Michigan School of Public Health will house a new, multi-institutional center focusing on modeling and predicting the impact of tobacco regulation, funded with an $18 million federal grant from the National Institutes of Health and the Food and Drug Administration.

The Center for the Assessment of the Public Health Impact of Tobacco Regulations will be part of the NIH and FDA’s Tobacco Centers of Regulatory Science, the centerpiece of an ongoing partnership formed in 2013 to generate critical research that informs the regulation of tobacco products.

The Michigan Institute for Computational Discovery and Engineering (MICDE) will support the center’s Data Analysis and Dissemination core by collecting national and regional survey data, analyzing the use of tobacco products including vaping and e-cigarettes, and disseminating the resulting tobacco modeling parameters to other research centers and the Food and Drug Administration.

The center is led by MICDE-affiliated faculty member Rafael Meza, associate professor of Epidemiology, and David Levy, professor of Oncology at Georgetown University.

For more on the center, see the press release from the U-M School of Public Health: https://sph.umich.edu/news/2018posts/tcors-091718.html

Real estate dataset available to researchers

By | Data, Data sets, Educational, General Interest, Happenings, News

The University of Michigan Library system and the Data Acquisition for Data Sciences program (DADS) of the U-M Data Science Initiative (DSI) have recently joined forces to license a major data resource capturing parcel-level information about the property market in the United States.  

The data were licensed from the Corelogic corporation, which has compiled deed, tax, and foreclosure information on nearly all properties in the US. Coverage dates vary by county; some county records go back fifty years. Coverage is more comprehensive from the 1990s to the present.

These data will support a variety of research efforts into regional economies, economic disparities, trends in land-use, housing market dynamics, and urban ecology, among many other areas.

The data are available on the Turbo Research Storage system for users of the U-M High Performance Computing infrastructure, and via the University of Michigan Library.

To access the data, researchers must first sign an MOU; contact Senior Associate Librarian Catherine Morse (cmorse@umich.edu) for more information, or visit https://www.lib.umich.edu/database/corelogic-parcel-level-real-estate-data.

Flux HPC Blog: Querying data with SparkSQL

By | Data, General Interest, HPC, News

SparkSQL lets people query their data in a SQL-like language while taking advantage of the speed of Spark, a fast, general engine for data processing that runs over Hadoop. I wanted to test this out on a dataset I found from Walmart with their stores’ weekly sales numbers. I put the CSV into our cluster’s HDFS (in /var/walmart), making it accessible to all Flux Hadoop users.
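As a rough illustration of what such a query can look like, here is a minimal PySpark sketch. It is not the code from the original blog post. The column names `Store` and `Weekly_Sales`, the view name `sales`, and the helper functions `build_sales_query` and `top_stores` are assumptions made for this example, and the CSV is assumed to have a header row; only the HDFS location /var/walmart comes from the text above.

```python
def build_sales_query(table: str = "sales", top_n: int = 10) -> str:
    """Build a Spark SQL query that totals weekly sales per store.

    Column names (Store, Weekly_Sales) are assumed, not taken from the post.
    """
    return (
        f"SELECT Store, SUM(Weekly_Sales) AS total_sales "
        f"FROM {table} "
        f"GROUP BY Store "
        f"ORDER BY total_sales DESC "
        f"LIMIT {int(top_n)}"
    )


def top_stores(csv_path: str = "hdfs:///var/walmart", top_n: int = 10):
    """Load the CSV from HDFS and return the top stores as a DataFrame."""
    # Imported here so the query builder above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("walmart-sales").getOrCreate()

    # Assumes a header row; inferSchema lets Spark guess numeric column types.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Register the DataFrame under a name that SQL statements can refer to.
    df.createOrReplaceTempView("sales")

    return spark.sql(build_sales_query("sales", top_n))
```

On a machine with Spark configured, calling `top_stores().show()` would print the result table. Registering the DataFrame as a temporary view is what makes it queryable by name from plain SQL.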

New private insurance claims dataset and analytic support now available to health care researchers

By | General Interest, Happenings, HPC, News

The Institute for Healthcare Policy and Innovation (IHPI) is partnering with Advanced Research Computing (ARC) to bring two commercial claims datasets to campus researchers.

The OptumInsight and Truven Marketscan datasets contain nearly complete insurance claims and other health data on tens of millions of people representing the US private insurance population. Within each dataset, records can be linked longitudinally for over 5 years.  

To begin working with the data, researchers should submit a brief analysis plan for review by IHPI staff, who will create extracts or grant access to primary data as appropriate.

CSCAR consultants are available to provide guidance on computational and analytic methods for a variety of research aims, including use of Flux and other U-M computing infrastructure for working with these large and complex repositories.

Contact Patrick Brady (pgbrady@umich.edu) at IHPI or James Henderson (jbhender@umich.edu) at CSCAR for more information.

The acquisition and availability of the data were funded by IHPI and the U-M Data Science Initiative.