Ideally, your data description should follow the principles outlined in “Datasheets for Datasets” by Gebru et al. (2018) [1]. We recommend reading the original paper for a detailed discussion of the seven components of thorough dataset documentation.
The following section reproduces these questions. We advise including them in a readme.txt file alongside your datasets. For derived data, it may suffice to reference the relevant source code file and provide a complete list of variables with their operational definitions.
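For derived data, the variable list can be kept next to the code that produces it. The snippet below is a minimal sketch of that idea, not a prescribed format: the function name `describe_derived_data`, the file path, and the two example variables are all hypothetical and only illustrate how a source file reference and a list of operational definitions could be rendered as plain text for a readme.txt.

```python
def describe_derived_data(source_file, variables):
    """Render a plain-text variable list for a derived dataset.

    source_file: path of the script that builds the dataset (hypothetical here).
    variables:   mapping of variable name -> operational definition.
    """
    # Pad names to a common width so the definitions line up in a text editor.
    width = max((len(name) for name in variables), default=0)
    lines = [f"Derived from: {source_file}", "", "Variables:"]
    for name, definition in variables.items():
        lines.append(f"  {name.ljust(width)}  {definition}")
    return "\n".join(lines)


# Hypothetical example values, for illustration only.
text = describe_derived_data(
    "src/build_panel.py",
    {
        "user_id": "Anonymized identifier, one per account",
        "n_orders": "Number of completed orders per user per month",
    },
)
print(text)
```

The resulting text can be pasted into the readme.txt as it is, or appended to a fuller description.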
You can download a formatted version (.docx) of this template using the button below. Alternatively, a plain-text version for copy and paste follows.
A Shorter Version
That’s a lot of documentation. If you are short on time, focus on the bigger picture and answer only the main questions.
==========================================================
D A T A S E T   D E S C R I P T I O N
==========================================================

Name of the dataset:

----------------------------------------------------------

1. Motivation of data collection (why was the data collected?)
[...]

2. Composition of dataset (what's in the data?)
[...]

3. Collection process (how was the data collected?)
[...]

4. Preprocessing/cleaning/labeling (how was the data cleaned, if at all?)
[...]

5. Uses (how is the dataset intended to be used?)
[...]

6. Distribution (how will the dataset be made available to others?)
[...]

7. Maintenance (will the dataset be maintained? How? by whom?)
[...]
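If you document several datasets, it may be convenient to generate this skeleton rather than copy it by hand. The following is a minimal sketch under stated assumptions: the names `SECTIONS`, `render_datasheet`, and `write_datasheet` are hypothetical, and the layout simply mirrors the short template above. It renders the seven questions with `[...]` placeholders, fills in any answers you pass, and can write the result to a readme.txt in the dataset folder.

```python
from pathlib import Path

# The seven main questions of the short template (title, guiding question).
SECTIONS = [
    ("Motivation of data collection", "why was the data collected?"),
    ("Composition of dataset", "what's in the data?"),
    ("Collection process", "how was the data collected?"),
    ("Preprocessing/cleaning/labeling", "how was the data cleaned, if at all?"),
    ("Uses", "how is the dataset intended to be used?"),
    ("Distribution", "how will the dataset be made available to others?"),
    ("Maintenance", "will the dataset be maintained? How? by whom?"),
]

RULE_HEAVY = "=" * 58
RULE_LIGHT = "-" * 58


def render_datasheet(name, answers=None):
    """Return the short dataset description as plain text.

    answers: optional mapping of section number (1-7) -> answer text.
             Sections without an answer keep the "[...]" placeholder.
    """
    answers = answers or {}
    lines = [
        RULE_HEAVY,
        "D A T A S E T   D E S C R I P T I O N",
        RULE_HEAVY,
        "",
        f"Name of the dataset: {name}",
        "",
        RULE_LIGHT,
        "",
    ]
    for number, (title, question) in enumerate(SECTIONS, start=1):
        lines.append(f"{number}. {title} ({question})")
        lines.append(answers.get(number, "[...]"))
        lines.append("")
    return "\n".join(lines)


def write_datasheet(folder, name, answers=None):
    """Write the rendered description to <folder>/readme.txt and return the path."""
    path = Path(folder) / "readme.txt"
    path.write_text(render_datasheet(name, answers), encoding="utf-8")
    return path
```

For example, `write_datasheet("data/survey_2023", "Survey 2023")` would create an empty skeleton in a hypothetical `data/survey_2023` folder, which must already exist; the answers can then be filled in with any text editor.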