The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest.
The Data Liberation Project liberates data for you. The best way you can support us is to make use of that data. You can also see our pending records requests and subscribe to the MuckRock newsletter to be notified about newly released datasets and scheduled trainings.
The Data Liberation Project has many volunteering opportunities, suitable for all sorts of people, regardless of technical expertise or years of experience. Join our Slack channel to get started.
The MuckRock newsletter is the best way to stay informed about new records requests, newly liberated datasets, and new opportunities for collaboration.
If you’re a First Amendment–savvy lawyer who’d like to provide pro bono assistance with FOIA requests, appeals, and/or litigation, absolutely get in touch.
The Data Liberation Project welcomes the broader community of journalists, government watchdogs, and engaged citizens to help us identify databases of major interest. If you have a suggestion, please tell us on Slack.
The Data Liberation Project is seeking medium- and long-term philanthropic funding. This would allow us to devote more time, staff, and resources to liberating data for the public good. If you are a potential funder or represent one, it’d be great to hear from you.
Vast troves of government data are inaccessible to the people and communities who need them most. These datasets are inaccessible because they’ve never been made public, because they’re published in obscure formats, or because they’re published without the documentation necessary to properly interpret them.
The Data Liberation Project launched in September 2022.
The Data Liberation Project operates online and focuses on the United States. If you’d like to bring the project’s model to other countries or to a specific US state, get in touch.
Identify: Through its own research, as well as through consultations with journalists, community groups, scholars, government-data experts, and others, the Data Liberation Project aims to identify a large number of datasets worth pursuing.
Obtain: The Data Liberation Project plans to use a wide range of methods to obtain the datasets, including Freedom of Information Act (FOIA) requests, interventions in lawsuits, web scraping, and advanced document parsing. To improve public knowledge about government data systems, the Data Liberation Project also files FOIA requests for essential metadata, such as database schemas, record layouts, data dictionaries, user guides, and glossaries.
Reformat: Many datasets are delivered to journalists and the public in difficult-to-use formats. Some may follow arcane conventions or require proprietary software to access, for instance. The Data Liberation Project will convert these datasets into open formats, and restructure them so that they can be more easily examined.
Clean: The Data Liberation Project will not alter the raw records it receives. But when the messiness of datasets inhibits their usefulness, the project will create secondary, “clean” versions of datasets that fix these problems.
Document: Datasets are meaningless without context, and practically useless without documentation. The Data Liberation Project will gather official documentation for each dataset into a central location. It will also fill observed gaps in the documentation through its own research, interviews, and analysis.
Disseminate: The Data Liberation Project will not expect reporters and other members of the public simply to stumble upon these datasets. Instead, it will reach out to the newsrooms and communities that stand to benefit most from the data. The project will host hands-on workshops, webinars, and other events to help others understand and use the data.
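To make the Reformat and Clean steps above more concrete, here is a minimal Python sketch of what such a conversion might look like. It is an illustration only, not the project's actual tooling: the record layout, field names, and sample lines are all hypothetical. It slices fixed-width records according to a layout, writes them out as CSV, and builds a separate cleaned copy while leaving the raw values untouched.

```python
import csv
import io

# Hypothetical record layout (field name, start offset, end offset).
# In practice this would come from an agency's record-layout document.
LAYOUT = [("facility_id", 0, 6), ("state", 6, 8), ("inspected", 8, 16)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict, leaving values untouched."""
    return {name: line[start:end] for name, start, end in layout}


def clean_record(record):
    """Return a cleaned copy of a record; the raw record is not modified."""
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["state"] = cleaned["state"].upper()
    # Convert YYYYMMDD to ISO 8601 only when the value is eight digits.
    date = cleaned["inspected"]
    if len(date) == 8 and date.isdigit():
        cleaned["inspected"] = f"{date[:4]}-{date[4:6]}-{date[6:]}"
    return cleaned


def to_csv(records, layout=LAYOUT):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    fieldnames = [name for name, _, _ in layout]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample lines standing in for a raw fixed-width file.
RAW_LINES = ["  A12 ny20230105", "B00034CA20221130"]

raw_records = [parse_fixed_width(line) for line in RAW_LINES]
clean_records = [clean_record(record) for record in raw_records]

raw_csv = to_csv(raw_records)      # open format, values exactly as received
clean_csv = to_csv(clean_records)  # secondary "clean" version
```

Keeping the raw and clean outputs as separate files means anyone can compare the two and see exactly what the cleaning changed.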