About this Project

This project aims to adapt the model of open-source software (OSS) distributions to address the technical limitations of today's data sharing, and to develop all components of a "data distribution". The key concepts are:

1) Leverage, but do not replace, independent existing and future data hosting solutions to form a federated platform for data sharing.
2) Employ software for data tracking and deployment logistics that is specialized for large data (git-annex) and built atop Git, the most capable distributed version control system (dVCS) available today, to enable efficient data access at any level of granularity (from single files to entire collections of datasets).

DataLad will provide access to data available from various sources (e.g. lab or consortium websites such as http://humanconnectome.org; data sharing portals such as http://openfmri.org and http://crcns.org) through a single interface. It will enable students and scientists to operate on data using familiar concepts, such as files and directories, while transparently managing data access and authorization with the underlying hosting providers.

For questions and inquiries, leave a comment or email info@datalad.org.

Historical note

The project was originally named DataGit. It was later renamed to the more distinctive DataLad to avoid infringing the Git trademark.


US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).


Hosted at

Dartmouth College, Hanover, NH, USA.


Otto-von-Guericke University, Magdeburg, Germany.
