The purging of federal databases has inspired many efforts to save key data sets. Here is a catalog of those projects. If you identify any others, please message me or post links in the activity feed.
Here is a letter from the CDC Advisory Committee, written as the purging began.


A crowd-sourced repository for valuable government data

In recent months the Harvard Law School Library Innovation Lab has created a data vault to download, sign as authentic, and make available copies of public government data that is most valuable to researchers, scholars, civil society and the public at large across every field. To begin, we have collected major portions of the datasets tracked by data.gov, federal GitHub repositories, and PubMed.

This is the public repository for the End of Term Web Archive project. The End of Term Web Archive is a collaborative initiative that collects, preserves, and makes accessible United States Government websites at the end of presidential administrations. Beginning in 2008, the End of Term Web Archive has thus far preserved websites from administration changes in 2008, 2012, 2016, and 2020, and is currently working to archive content from the 2024 electoral season.


ICPSR (Inter-university Consortium for Political and Social Research) is an international consortium of more than 810 academic institutions and research organizations. It provides leadership and training in data access, curation, and methods of analysis for the social science research community.
ICPSR maintains a data archive of more than 350,000 files of research in the social and behavioral sciences. It hosts 23 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.

Public Environmental Data Project

The Public Environmental Data Project is committed to preserving and providing public access to federal environmental data. We are a volunteer coalition of several environmental, justice, and policy organizations, researchers across several universities, archivists, and students who rely on federal datasets and tools to support critical research, advocacy, policy, and litigation work. To gather insights on what data to preserve, we reached out to our networks, which consist largely of environmental justice groups and networks, state and local government climate offices, and academic researchers. We compiled a large list of federal databases and tools, and prioritized them based on their relative impact, our confidence that we could archive them, and the relative effort it would take to obtain and archive them.
Advice on preserving websites, from Naseem Miller:
- To find the missing websites, go to the Wayback Machine and type the website’s URL into the search bar.
- Save the websites to the Wayback Machine. The easiest way to do this is by installing the Wayback Machine extension for your browser. The add-ons and extensions are listed on the left-hand panel of the website’s homepage.
- If you’re concerned that certain websites or web pages may be removed, you can suggest federal websites and content whose addresses end in .gov, .mil and .com to the End of Term Web Archive.
- You can suggest federal climate and environmental databases to the Environmental Data and Governance Initiative.
- You can suggest databases to The Data Liberation Project, which is run by MuckRock and Big Local News.
- Tell science journalist Maggie Koerth what CDC data you’ve downloaded and whether you’ve made them publicly available.
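
For readers who would rather script the first two steps above than use the browser, here is a minimal sketch in Python. It assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`) and the Save Page Now address pattern (`https://web.archive.org/save/` followed by the page URL) behave as they are commonly documented; neither is described in Miller's advice, and the function names and the example page are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints (not taken from the advice above).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_url(page_url, timestamp=None):
    """Build the availability query for a page.

    timestamp is optional, in YYYYMMDD form, and asks for the capture
    closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived copy's URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_url(page_url):
    """Build the Save Page Now address that requests a fresh capture."""
    return SAVE_ENDPOINT + page_url


def lookup(page_url, timestamp=None):
    """Query the Wayback Machine over the network for the closest capture."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    page = "https://www.cdc.gov/"  # hypothetical example page
    print("Closest archived copy:", lookup(page))
    print("To request a new capture, open:", save_url(page))
```

`lookup` returns `None` when no capture is found, which is a reasonable cue to open the address from `save_url` in a browser (or use the extension) so the page gets archived.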

About the Internet Archive

The Internet Archive is the go-to site for finding deleted content.
The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, people with print disabilities, and the general public. Our mission is to provide Universal Access to All Knowledge.
We began in 1996 by archiving the Internet itself, a medium that was just beginning to grow in use. Like newspapers, the content published on the web was ephemeral – but unlike newspapers, no one was saving it. Today we have 28+ years of web history accessible through the Wayback Machine and we work with 1,200+ library and other partners through our Archive-It program to identify important web pages.
A page devoted to CDC data sets can be found here.
Don’t forget that Canada still has a fully functioning democracy without ideological censorship.
