Open Data Transition Report: An Action Plan for the Next Administration

Goal III: Share scientific research data to spur innovation and scientific discovery
Recommendation 18: Develop an Annual Research Data Census to increase awareness of federally funded research and improve access to data
First 100 Days
National Science Foundation
Action Plan:
  • Within the administration’s first 100 days, the National Science Foundation (NSF) should develop and implement an Annual Research Data Census (key information about datasets, their characteristics, and stewardship environments developed by its grantees) in connection with NSF’s annual reporting requirements.
  • The NSF should develop a data survey through funded research and pilot it with a sample of grantees. After the pilot, the NSF should require the survey as part of grantees’ annual report on research.gov.
  • The NSF should publish the results of the survey as an Annual Research Data Census available as open data in machine-readable form.
  • If the NSF Annual Research Data Census is demonstrated to be effective, the White House Office of Science and Technology Policy (OSTP) should mandate developing similar, interoperable data collection across all other science agencies.

In February 2013, the White House Office of Science and Technology Policy (OSTP) issued guidelines on increasing access to the data and results of federally funded research. Assistant to the President for Science and Technology and Director of OSTP John Holdren instructed federal agencies to ensure “public access” to federally funded research outputs by requiring federal grantees to make their research papers available for free within 12 months of initial publication. Additionally, to advance the broader aim of “open science,” OSTP directed agencies to require their grantees to develop plans for sharing their research data in a timely manner and in a reusable format.

As a major step in implementing this guidance, the National Science Foundation (NSF) should develop an Annual Research Data Census to provide public, searchable information on grantees’ data, its characteristics, and its stewardship environments. This could be accomplished by adding a small number of survey questions to grantees’ annual reporting requirements. Since the NSF already requires its grantees to report annually on their work through research.gov, adding these survey questions will not substantially increase the reporting burden.

The survey can cover the amount and type of data generated; associated publications; basic metadata; whether the data is open, or will be made open; where the data is hosted and who can access it; and other factors. The Census would hold researchers accountable for sharing their data as much as possible while protecting privacy.
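To illustrate what a machine-readable Census entry might look like, the sketch below models one survey response as a record and serializes a small set of responses to JSON. It is only an illustration under stated assumptions: the field names, the `CensusRecord` class, the helper functions, and the sample values are all hypothetical and do not reflect any actual NSF schema or reporting system.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class CensusRecord:
    """One hypothetical survey response; every field name is an assumption."""
    award_id: str                 # placeholder identifier for the grant
    dataset_title: str
    data_type: str                # e.g. "tabular", "imagery", "sequence"
    size_gb: float                # approximate amount of data generated
    publications: List[str] = field(default_factory=list)
    is_open: bool = False         # is the data open today?
    will_be_open: bool = False    # is there a plan to open it later?
    host: str = ""                # where the data is hosted
    access: str = "restricted"    # who can access it


def to_json(records: List[CensusRecord]) -> str:
    """Serialize records as machine-readable JSON."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


def open_share(records: List[CensusRecord]) -> float:
    """Fraction of reported datasets that are currently open."""
    if not records:
        return 0.0
    return sum(r.is_open for r in records) / len(records)


# Made-up sample responses, for illustration only.
sample = [
    CensusRecord("AWARD-0001", "Stream temperature readings", "tabular", 1.5,
                 publications=["Example paper A"], is_open=True,
                 host="institutional repository", access="public"),
    CensusRecord("AWARD-0002", "Interview transcripts", "text", 0.2,
                 will_be_open=True, host="lab server"),
]

print(to_json(sample))
print(open_share(sample))
```

A flat record of this kind keeps the reporting burden small for grantees while still supporting simple aggregate questions, such as what share of reported datasets is already open.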

The Annual Research Data Census would allow the tracking of data sharing requirements and serve as a valuable resource to the scientific community. The Census would enable researchers and interested members of the public to search for data collections that may be difficult to find. Scientists would use the Census to identify others who have been collecting data through similar studies, to access the data if it is available, and to contact the investigator to request access to the data if necessary.

The Data Census would also inform the strategic development of data infrastructure needed to ensure that data is discoverable, accessible, usable, and sustainable for current and future innovation. Just as the Population Census informs the development of roads, bridges, hospitals and other elements of “societal infrastructure” that benefit the public, a Data Census will help identify the infrastructure needed to ensure that data can be fully used for future innovation and to advance work by the research community.

Finally, the Annual Research Data Census would make it possible to follow research trends over time with more insight and information. This information would be especially useful to NSF and other grant makers in helping them prioritize funding areas and look for possible gaps or synergies between different investigators and their lines of work. Census analyses could help show what public repositories and standards researchers use to house and manage their data, and areas that need improvement to make sharing of research data easier.

If the NSF Annual Research Data Census is demonstrated to be effective in enabling secondary research or other follow-on research using existing research data, OSTP should mandate that all other science agencies incorporate a similar census into their reviews of grantees. The government should maintain all data census information in a centralized location.

Additional Reading: