Data Acquisition for Data Science (DADS) supports acquisition, preparation, management, and maintenance of specialized research data sets used in current and future data science-enabled research projects across U-M, with special focus on the four challenge initiative areas pursued by the Michigan Institute for Data Science (MIDAS): transportation science, health science, social science, and learning analytics.
DADS is funded through the Data Science Initiative (DSI); total funding is capped at $200,000 per year for 5 years.
DADS will be managed jointly by the Library and Advanced Research Computing (ARC), with support from ARC’s Consulting for Statistics, Computing, and Analytics Research (CSCAR), MIDAS, and ARC-Technology Services (ARC-TS) units.
Requests for DADS funding will be submitted through a web form available on the Library, MIDAS, and CSCAR websites, and accepted on a rolling basis. Selection criteria and processes are detailed below.
DADS Selection Criteria
- Relevance/importance (merit, extent of user community, etc.)
  - DADS requests will be reviewed on their scientific merit and potential for impacting data science-driven research across the U-M Ann Arbor campus. Priority will be given to requests that closely align with the four MIDAS challenge initiative areas listed above.
  - Data sets acquired through DADS should have the potential to serve a wide segment of the U-M community. Highly specialized procurement requests that serve only individual researchers are discouraged.
- Costs (product/license and ingest/processing)
  - DADS funds can be used to pay licensing and acquisition fees to publishers and commercial data providers, including both one-time and subscription-based costs.
  - Data acquired through DADS can be processed into analyzable form by CSCAR or other U-M personnel; DADS funds can cover the costs of this data processing.
  - Requests can be made for DADS funds to cover transfer, processing, and storage costs for data that are otherwise free to obtain (e.g., to mirror open data repositories or to aggregate data obtained through an open API).
- Usability (ease of use, analytical tools, documentation, etc.)
  - Data sets acquired through DADS should be made available to the U-M community through Turbo Research Storage or other campus storage options. Costs for use of these services can be covered through DADS.
  - Priority will be given to data made available with appropriate documentation and metadata. [Note: If the raw data are subject to processing, the raw data will be retained and all scripts needed to generate the processed data will be made available along with the data. Metadata pertaining to the raw data and documentation describing any data processing that was performed will be preserved and made available along with the processed and raw data.]
  - Since the data are intended to be used by multiple researchers, there is a strong preference for open and well-documented data format standards. If the data are provided in a proprietary or unusual format (e.g., SAS or MS-Access data files), CSCAR can be contracted to convert the data to an open format.
- Restrictions (embargo, number of users, exclusive use by single requestor)
  - Priority will be given to data made openly available to U-M researchers, subject to the constraints of any applicable dataset license or data use agreement.
  - DADS funds should not be used for data management pertaining to new data produced at U-M, unless the data are judged to have great potential for wide use in a MIDAS challenge area.
  - Restricted data (e.g., data for which each user needs individual permission from the data provider) are eligible for this program, provided that there is a clear process for additional users to obtain access.
To request funding
To request funding from DADS, fill out this form. Requesters will be asked to provide:
- Description of data set: domain, size, format, metadata and documentation, licensing and usage restrictions, raw or processed, required analysis tools, etc.;
- Data source: vendor, publisher, foundation, government, web, research, etc.;
- Intended use and community: requests must indicate the community of users that can be supported by the data resources while maintaining licensing, security, and other data restrictions as outlined in any applicable license or data use agreements;
- (if applicable) Data processing requirements;
- (if applicable) Hosting preference (Turbo, etc.);
- Estimated cost for acquisition and, where applicable, for the data processing and hosting described in the two preceding items.
Requests will be accepted on a rolling basis. Questions can be directed to firstname.lastname@example.org. Unit-specific questions can also be sent to:
Library (Jen Green, email@example.com)
ARC-TS (Brock Palen, firstname.lastname@example.org)
Requests will be reviewed by a DADS committee composed of Library and ARC personnel.