[archiving, data archiving, dataverse, zenodo, figshare, osf, dans]


Data Archiving

Part of responsible research is making sure your research data are properly managed, both during your research and after it is finished (and hopefully published). For replication purposes, the data need to remain available for at least 10 years. But how do you decide which data to keep and document, and where to archive them?

This tutorial on data management and this article on documenting your data provide some insights.

Research data archiving is about storing and preserving research data for the long term. When you archive your data, you make sure you can still read and access the data later on. Archiving your data does not necessarily make it available to others: you can restrict access and license your data if you want. You should store your data safely, in a suitable file format, with adequate documentation.

Licensing Your Data

Attaching a usage license to your data tells others what they can and cannot do with it. Creative Commons licenses are widely used; for example, a CC-BY license lets others distribute, reuse, remix, and build upon your work as long as they give you credit. Adding NC to the license (CC-BY-NC) restricts use to non-commercial purposes only. Another resource for selecting a license is choosealicense.com.

Licensing your data.

Choosing a Data Repository

A data repository is a digital archive collecting, preserving and displaying datasets, related documentation, and metadata.

More than 1,500 data repositories are available for archiving research data, so how do you know which one to choose? Sometimes journals or funders recommend a repository, a discipline-specific repository might be commonly used, or your university may recommend one. For example, Tilburg University recommends Dataverse, and its Research Data Office provides support with depositing your data if desired.

Some well-known repositories, besides Dataverse, are:

  • Zenodo
  • Figshare
  • OSF (Open Science Framework)
  • DANS (Data Archiving and Networked Services)

Trustworthy repositories should meet the following minimum criteria¹:

  1. Provision of Persistent and Unique Identifiers (PIDs)
    • Allow data discovery and identification
    • Enable searching, citing, and retrieval of data
    • Provide support for data versioning
  2. Metadata
    • Enable finding of data
    • Enable references to related information, such as other data and publications
    • Provide information that is publicly available and maintained, even for non-published, protected, retracted, or deleted data
    • Use metadata standards that are broadly accepted (by the scientific community)
    • Ensure that metadata are machine-retrievable
  3. Data access and usage licences
    • Enable access to data under well-specified conditions
    • Ensure data authenticity and integrity
    • Enable retrieval of data
    • Provide information about licensing and permissions (ideally in machine-readable form)
    • Ensure confidentiality and respect rights of data subjects and creators
  4. Preservation
    • Ensure persistence of metadata and data
    • Be transparent about mission, scope, preservation policies, and plans (including governance, financial sustainability, retention period, and continuity plan)
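To make the "data authenticity and integrity" and "machine-readable licensing" criteria a bit more concrete, here is a minimal Python sketch, not tied to any particular repository, that records a SHA-256 checksum for each data file together with an SPDX license identifier, and later re-checks the files against that record. The record's field names (`title`, `creators`, `license`, `files`) and the helper functions `sha256_of`, `build_record`, and `verify` are hypothetical; real repositories such as Dataverse or Zenodo use their own metadata schemas and deposit procedures.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(data_dir: Path, title: str, creators: list[str], license_id: str) -> dict:
    """Build a hypothetical metadata record listing every file under data_dir with its checksum.

    The field names are illustrative only and do not follow any specific
    repository's metadata schema.
    """
    files = [
        {"path": p.relative_to(data_dir).as_posix(), "sha256": sha256_of(p)}
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    return {
        "title": title,
        "creators": creators,
        "license": license_id,  # an SPDX identifier such as "CC-BY-4.0"
        "files": files,
    }


def verify(data_dir: Path, record: dict) -> list[str]:
    """Return the paths of files that are missing or whose checksum no longer matches the record."""
    problems = []
    for entry in record["files"]:
        path = data_dir / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            problems.append(entry["path"])
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "survey.csv").write_text("id,score\n1,4\n2,5\n")
        record = build_record(data_dir, "Example survey", ["Doe, J."], "CC-BY-4.0")
        # Print the record as JSON; in practice you would store it with the archived data.
        print(json.dumps(record, indent=2))
        print("changed files:", verify(data_dir, record))  # → changed files: []
```

Keeping such a record alongside the archived files lets you, or anyone reusing the data, detect later whether a file has been altered, corrupted, or lost. SHA-256 is used here simply because it is available in Python's standard library; the same idea works with any well-established checksum algorithm.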

CoreTrustSeal is a well-known certification for trustworthy repositories.

Publishing Your Research Data

If you want to make your data reusable for purposes beyond the one for which you collected them, you should publish your data.

Publishing your data is the act of publicly disclosing the research data you have collected, making them findable, accessible, interoperable and reusable (FAIR data).

There are multiple reasons to publish your data:

  • Data publication may lead to increased visibility, reuse and citation and therefore recognition of scholarly work.
  • Data archiving and publication have direct benefits for the research itself (making it more robust), for the discipline, and for science in general, by enabling new collaborations and new data uses and by establishing links to the next generation of researchers.
  • The openness of research data is at the heart of scientific ethics.
  • External drivers like research data management policies from research funders and publishers might require data archiving and publication.
Tip

Be sure to archive or publish only data you are allowed to archive or publish. Archives often do not accept sensitive or non-anonymized data. And if you want to share your data, make sure, where relevant, that participants consented to this when you collected the data.



This document was created with information provided by the Consortium of European Social Science Data Archives (CESSDA) and Kars Wijnhoven, Research Data Office at Tilburg University.

CESSDA Training Team (2017 - 2020). CESSDA Data Management Expert Guide. Bergen, Norway: CESSDA ERIC. Retrieved from cessda.eu.


  1. From Science Europe.

Contributed by Pam Dupont