Principles of Project Setup and Workflow Management

50 mins

Making your workflows efficient, reproducible, and well-structured involves quite a bit of material.

Here’s a checklist you can use to audit your progress.

	data-preparation	analysis	paper	…
At the project level
Implement a consistent directory structure: data/src/gen
Include a readme with a project description and technical instructions on how to run/build the project
Store any authentication credentials outside of the repository (e.g., in a JSON file), not as clear text in source code
Mirror your `/data` folder to a secure backup location; alternatively, store all raw data on a secure server and download relevant files to `/data`
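
As an illustration of keeping credentials out of source code, here is a minimal Python sketch that reads them from a JSON file stored outside the repository. The function name `load_credentials` and the key names `username` and `password` are assumptions made for this example, not part of the checklist.

```python
import json
from pathlib import Path


def load_credentials(path):
    """Read credentials from a JSON file kept outside the repository."""
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    with path.open(encoding="utf-8") as handle:
        credentials = json.load(handle)
    # Fail early if an expected key is missing (key names are hypothetical).
    for key in ("username", "password"):
        if key not in credentials:
            raise KeyError(f"Missing '{key}' in {path}")
    return credentials
```

A script would then call something like `load_credentials("~/.secrets/my_project.json")` (a hypothetical location), so the secret itself never appears in a versioned file.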

At the level of each stage of your pipeline
File/directory structure
Create subdirectory for source code: `/src/[pipeline-stage-name]/`	☐	☐	☐	☐
Create subdirectories for generated files in `/gen/[pipeline-stage-name]/`: `temp`, `output`, and `audit`.	☐	☐	☐	☐
Make all file names relative, and not absolute (i.e., never refer to C:/mydata/myproject, but only use relative paths, e.g., ../output)	☐	☐	☐	☐
Create the directory structure from within your source code, or use `.gitkeep` files	☐	☐	☐	☐
Automation and Documentation
Have a `makefile`	☐	☐	☐	☐
Alternatively, include a readme with running instructions	☐	☐
Make dependencies between source code and files-to-be-built explicit, so that `make` automatically recognizes when a rule does not need to be run (properly define targets and source files)	☐	☐	☐	☐
Include a rule in the makefile that deletes temp, output, and audit files	☐	☐	☐	☐
Versioning
Version all source code stored in `/src` (i.e., add to Git/GitHub)	☐	☐	☐	☐
Do not version any files in `/data` and `/gen` (i.e., do NOT add them to Git/GitHub)	☐	☐	☐	☐
Want to exclude additional files (e.g., files that unintentionally get written to `/src`)? Use `.gitignore` for files/directories that need not be versioned	☐	☐	☐	☐
Housekeeping
Have short and accessible variable names	☐	☐	☐	☐
Loop what can be looped	☐	☐	☐	☐
Break down “long” source code into subprograms/functions, or split a script into multiple smaller scripts	☐	☐	☐	☐
Delete what can be deleted (including unnecessary comments, legacy calls to packages/libraries, variables)	☐	☐	☐	☐
Use asserts (i.e., make your program crash when it hits a condition that is wrong but would not otherwise raise an error)	☐	☐	☐	☐
Testing for portability
Tested on own computer (entirely wipe `/gen`, re-build the entire project using `make`)	☐	☐	☐	☐
Tested on own computer (first clone to new directory, then re-build the entire project using `make`)	☐	☐	☐	☐
Tested on different computer (Windows)	☐	☐	☐	☐
Tested on different computer (Mac)	☐	☐	☐	☐
Tested on different computer (Linux)	☐	☐	☐	☐
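
Several items in the checklist above (creating `/gen/[pipeline-stage-name]/` subdirectories from within source code, using only relative paths, deleting generated files, and using asserts) can be sketched together in Python. This is one possible arrangement under stated assumptions: the function names `make_stage_dirs`, `clean_stage`, and `write_audit`, the `root` parameter, and the `row_count.txt` file name are all hypothetical.

```python
import shutil
from pathlib import Path

# Subdirectories the checklist asks for under gen/<stage>/.
SUBDIRS = ("temp", "output", "audit")


def make_stage_dirs(stage, root="."):
    """Create gen/<stage>/{temp,output,audit}; safe to call repeatedly."""
    created = [Path(root) / "gen" / stage / name for name in SUBDIRS]
    for directory in created:
        directory.mkdir(parents=True, exist_ok=True)
    return created


def clean_stage(stage, root="."):
    """Delete all generated files for a stage, then recreate empty folders."""
    shutil.rmtree(Path(root) / "gen" / stage, ignore_errors=True)
    return make_stage_dirs(stage, root)


def write_audit(rows, stage, root="."):
    """Crash on silently wrong input, then log a row count to the audit folder."""
    assert len(rows) > 0, "expected at least one row"
    assert len(set(rows)) == len(rows), "duplicate rows found"
    audit_file = Path(root) / "gen" / stage / "audit" / "row_count.txt"
    audit_file.parent.mkdir(parents=True, exist_ok=True)
    audit_file.write_text(f"{len(rows)}\n", encoding="utf-8")
    return audit_file
```

Because every path is built relative to `root` (the current directory by default), the same script runs unchanged after cloning the project to a new directory or a different computer. A makefile rule for cleaning could simply call `clean_stage` for each stage.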

Warning

Versioned any sensitive data?

Before making a GitHub repository public, check that you have not stored any sensitive information in it (such as passwords). This tool has worked well for us: GitHub credentials scanner.
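
For a quick first pass before running a dedicated scanner, a rough check along these lines can flag lines that look like hard-coded secrets. This is only a sketch, not the tool mentioned above: the function name `find_suspect_lines` and the regular expression are assumptions, and the check looks only at files in the working directory, not at earlier commits in the Git history.

```python
import re
from pathlib import Path

# Hypothetical pattern: a secret-like name assigned a quoted literal.
SUSPECT = re.compile(
    r"(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*['\"][^'\"]+['\"]",
    re.IGNORECASE,
)


def find_suspect_lines(root="."):
    """Return (file, line number) pairs for lines that look like secrets."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and Git's own internal files.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if SUSPECT.search(line):
                hits.append((str(path), number))
    return hits
```

An empty result does not prove a repository is clean; it only means this simple pattern found nothing in the current files.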
