Open Data Transition Report: An Action Plan for the Next Administration

Goal III: Share scientific research data to spur innovation and scientific discovery
Recommendation 19: Coordinate agency efforts to transition to cloud storage and analytics, enabling more research with government data
First 100 Days
General Services Administration
Action Plan:
  • Within the first 100 days of the new administration, the General Services Administration (GSA) and 18F should coordinate across agencies to identify and prioritize open government research datasets that would benefit the research community if made more readily available through cloud storage.
  • GSA should use bulk purchasing power to obtain cloud storage at discount prices and have 18F support other agencies to open up research data on the cloud through
  • For large datasets, the government should pursue individual contracts for cloud storage and prioritize high-value research datasets.

The federal government has been slowly shifting data storage from agency-owned data centers to cloud-based services since 2009, but many high-value government research datasets are not yet available on the cloud. Putting data on the cloud makes it more easily accessible to researchers and facilitates a variety of analyses. This is particularly useful for large datasets, which previously required researchers to buy storage, download the dataset or request it on CDs, and then obtain enough processing power to analyze it, a process that could take weeks or months. With cloud storage, researchers can analyze data and develop new products without downloading and storing the data locally.

Although both the General Services Administration (GSA) and 18F provide services to support agencies moving data to the cloud, each agency determines its own path toward cloud-based services while complying with the requirements of the Federal Risk and Authorization Management Program (FedRAMP) and National Institute of Standards and Technology guidance. This agency-by-agency process has led to fragmented demand for cloud resources, duplicative systems across and within agencies, and management challenges. Agencies reported that they planned to spend $2 billion on unique cloud computing systems in fiscal year 2016.

By working across the government, GSA and 18F can be more strategic about obtaining cloud storage solutions at discount prices and prioritizing datasets for the cloud. GSA should leverage its purchasing power to buy cloud storage in bulk, and 18F should use the platform to help distribute it across the government.

GSA and 18F should also prioritize efforts to open large datasets to support research. Examples of existing datasets that would directly support research initiatives include NEXRAD data, which contains weather radar data from the National Weather Service; NAIP imagery, which is orthophotography that depicts agricultural growth patterns around the country; and LIDAR data, which is collected by a surveying method that uses laser light to measure distance to a target, as well as many others. Making this data more easily accessible through the cloud eliminates the lengthy logistical hurdles that researchers currently face in using these datasets.

By collectively assessing needs across the federal government and making bulk purchases for cloud storage and processing power, GSA can hasten efforts to move key datasets to the cloud while reducing costs. Cloud storage has already proved extremely valuable for research efforts involving key datasets. For example, the U.S. Geological Survey and NASA partnered with Amazon Web Services (AWS) to host Landsat data, spatial imagery and information on the Earth’s composition, in the cloud. The data is available to anyone for free with daily updates, often within hours of production. The National Oceanic and Atmospheric Administration (NOAA) partnered with AWS, Google Cloud Platform, IBM, Microsoft, and the Open Cloud Consortium to put vast amounts of data in the cloud as well. Similarly, the Broad Institute of MIT and Google collaborated to host Broad’s massive genomic dataset (Broad DNA sequencers produce more than 20 terabytes of data each day) on the cloud. After implementing optimizations, the Broad-Google collaboration reduced costs while improving processing time eight-fold. These initiatives have demonstrated that the public-private partnership model is effective for cloud storage of large, high-value datasets.

Additional Reading: