Discover the essential factors to consider when selecting a data repository for long-term storage. This building block covers data archiving, licensing, and publishing, and points to popular repositories that help ensure long-term access to and preservation of your research data.
Part of responsible research is making sure your research data are properly managed, both during your research and after it is finished (and hopefully published). For replication purposes, the data need to remain available for a minimum of 10 years. But how do you decide which data to keep, how to document them, and where to archive them?
Research data archiving is about storing and preserving research data for the long term. When you archive your data, you make sure you can still read and access them later on. Archiving your data does not necessarily mean they are available to others: you can limit access and license your data if you want. You should store your data safely, in a suitable file format, and with adequate documentation.
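One practical way to check later that archived files are still intact is to store a checksum for every file next to the data. The snippet below is a minimal sketch of that idea, not a required step of any particular repository. The function names and the manifest file name `MANIFEST.sha256` are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(archive_dir: Path, manifest_name: str = "MANIFEST.sha256") -> Path:
    """Write one '<checksum>  <relative path>' line per file in archive_dir.

    The manifest name is a hypothetical convention chosen for this sketch.
    """
    manifest = archive_dir / manifest_name
    lines = []
    for path in sorted(archive_dir.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path != manifest:
            relative = path.relative_to(archive_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Calling `write_manifest(Path("my_dataset"))` on a folder that holds your data files and documentation produces a text file you can archive together with the data. Recomputing the checksums later and comparing them with the manifest shows whether any file has changed.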
Licensing Your Data
Attaching a usage license to your data tells others what they can and cannot do with the data. Creative Commons licenses are widely used. For example, a CC-BY license lets others distribute, re-use, remix, and build upon your work as long as they give you credit. Adding NC to the license restricts this to non-commercial use. Another resource for selecting a license is choosealicense.com.
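A license is easier for others (and for software) to honor when it travels with the data in machine-readable form. The following is a minimal sketch of a small metadata record that names the license. The field names and the function `license_record` are illustrative assumptions and do not follow any specific repository's schema. The license values are the SPDX identifiers for the two Creative Commons licenses mentioned above.

```python
import json

# SPDX identifiers mapped to the official Creative Commons license pages.
KNOWN_LICENSES = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC-BY-NC-4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
}


def license_record(title: str, creator: str, license_id: str) -> str:
    """Return a JSON string describing a dataset and its usage license.

    The record layout is hypothetical; adapt it to your repository's schema.
    """
    if license_id not in KNOWN_LICENSES:
        raise ValueError(f"Unknown license identifier: {license_id}")
    record = {
        "title": title,
        "creator": creator,
        "license": license_id,
        "license_url": KNOWN_LICENSES[license_id],
    }
    return json.dumps(record, indent=2)
```

For instance, `license_record("Survey wave 1", "Jane Doe", "CC-BY-NC-4.0")` returns a JSON document that could be saved alongside the data files. The title and creator here are made-up placeholders.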
Choosing a Data Repository
A data repository is a digital archive that collects, preserves, and displays datasets, related documentation, and metadata.
There are more than 1500 data repositories available for archiving your research data, so how do you know which one to choose? Journals or funders sometimes recommend a repository, a discipline-specific repository might be commonly used in your field, or your university might recommend one. For example, Tilburg University recommends Dataverse, and its Research Data Office provides support with depositing your data if desired.
Some well-known repositories, besides Dataverse, are:
Trustworthy repositories should meet the following minimum criteria [1]:
- Provision of Persistent and Unique Identifiers (PIDs)
  - Allow data discovery and identification
  - Enable searching, citing, and retrieval of data
  - Provide support for data versioning
- Metadata
  - Enable finding of data
  - Enable referencing to related relevant information, such as other data and publications
  - Provide information that is publicly available and maintained, even for non-published, protected, retracted, or deleted data
  - Use metadata standards that are broadly accepted (by the scientific community)
  - Ensure that metadata are machine-retrievable
- Data access and usage licences
  - Enable access to data under well-specified conditions
  - Ensure data authenticity and integrity
  - Enable retrieval of data
  - Provide information about licensing and permissions (ideally in machine-readable form)
  - Ensure confidentiality and respect rights of data subjects and creators
- Preservation
  - Ensure persistence of metadata and data
  - Be transparent about mission, scope, preservation policies, and plans (including governance, financial sustainability, retention period, and continuity plan)
A well-known certification for trusted repositories is, for example, CoreTrustSeal.
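When comparing several candidate repositories, it can help to turn the criteria above into a simple checklist. The sketch below shows one possible way to do that. The shortened criterion labels, the function name `unmet_criteria`, and the example repository profile are all hypothetical, and the profile would have to be filled in by hand from a repository's own documentation.

```python
# Shortened labels for some of the criteria listed above (wording is ours).
CRITERIA = (
    "persistent identifiers",
    "data versioning",
    "accepted metadata standards",
    "machine-retrievable metadata",
    "licensing information",
    "authenticity and integrity checks",
    "transparent preservation policy",
)


def unmet_criteria(features: set[str]) -> list[str]:
    """Return the criteria, in list order, that a repository does not cover."""
    return [criterion for criterion in CRITERIA if criterion not in features]


# Hypothetical repository profile, not a description of any real service.
example_repository = {
    "persistent identifiers",
    "data versioning",
    "licensing information",
}

print(unmet_criteria(example_repository))
```

An empty result means the repository covers every criterion on this shortened list. It is not a substitute for a formal certification.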
Publishing Your Research Data
If you want to make your data reusable for purposes beyond the one for which you collected them, you should publish your data.
Publishing your data is the act of publicly disclosing the research data you have collected, making them findable, accessible, interoperable and reusable (FAIR data).
Want to make your data FAIR? Check out this page to learn how to do so.
There are multiple reasons to publish your data:
- Data publication may lead to increased visibility, reuse, and citation, and therefore to recognition of scholarly work.
- Data archiving and publication have direct benefits for the research itself (making it more robust), for the discipline, and for science in general, by enabling new collaborations and new uses of the data and by establishing links to the next generation of researchers.
- The openness of research data is at the heart of scientific ethics.
- External drivers, such as research data management policies from research funders and publishers, might require data archiving and publication. For instance:
Be sure to archive or publish only data you are allowed to archive or publish. Archives often do not accept sensitive or non-anonymized data. If you want to share your data, make sure that, where relevant, the participants consented to this when you collected the data.
- Set up a Data Management Plan to make your work efficient, and create more value for your data, yourself and others, during and after your research.
- OpenAIRE Guide on how to find a trustworthy repository for your data
- Promote your research data
- Publish Open Access
- re3data.org searchable database of data repositories
This document was created with information provided by the Consortium of European Social Science Data Archives (CESSDA) and Kars Wijnhoven, Research Data Office at Tilburg University.
CESSDA Training Team (2017 - 2020). CESSDA Data Management Expert Guide. Bergen, Norway: CESSDA ERIC. Retrieved from cessda.eu.