Learning Goals
- Understand the essential guidelines to structure and maintain a well-organized project.
- Learn best practices for efficient and reproducible project workflows.
Overview
As projects progress, they can become disorganized and difficult to navigate. A structured approach not only facilitates collaboration and understanding but also ensures that the project remains efficient and reproducible.
This building block offers a comprehensive checklist to guide you towards achieving this goal.
Checklist
Project level
Foundational guidelines that are essential for setting up any project, ensuring clarity and effective organization from the outset:
- Implement a consistent directory structure: data/src/gen.
- Include a readme with a project description and technical instructions on how to run/build the project.
- Store any authentication credentials outside of the repository (e.g., in a JSON file), and not in clear-text within the source code.
- Mirror your /data folder to a secure backup location. Alternatively, store all raw data on a secure server and download the relevant files to /data.
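As a minimal sketch of the credentials guideline above, the snippet below reads secrets from a JSON file kept outside the repository. The environment variable `MYPROJECT_CREDENTIALS`, the default location `~/.myproject/credentials.json`, and the function name are hypothetical and only serve as illustration; adapt them to your own setup.

```python
import json
import os
from pathlib import Path

# Hypothetical names: adjust the environment variable and default path to your project.
CREDENTIALS_ENV_VAR = "MYPROJECT_CREDENTIALS"
DEFAULT_CREDENTIALS_PATH = Path.home() / ".myproject" / "credentials.json"


def load_credentials(path=None):
    """Read credentials from a JSON file stored outside the repository."""
    if path is None:
        # Fall back to the environment variable, then to the default location.
        path = os.environ.get(CREDENTIALS_ENV_VAR, DEFAULT_CREDENTIALS_PATH)
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file found at {path}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A script would then call, for example, `load_credentials()["api_key"]` (the key name is an assumption) instead of hard-coding the value in the source code.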
Throughout the Pipeline
File/directory structure
Arranging your data, code, and results systematically makes it easier to track changes and debug issues.
- Create a subdirectory for source code: /src/[pipeline-stage-name]/.
- Establish subdirectories within /gen/[pipeline-stage-name]/ for generated files: temp, output, and audit.
- Ensure file paths are relative and not absolute. For instance, avoid references like C:/mydata/myproject, and opt for relative paths such as ../output.
- Create the directory structure from within your source code, or use .gitkeep files to keep otherwise empty directories in the repository.
Automation & documentation
Ensuring smooth automation alongside clear documentation streamlines project workflows and aids clarity.
- Make sure to have a makefile to allow for automation.
- Alternatively, include a readme with running instructions.
- Delineate dependencies between the source code and files-to-be-built explicitly. When targets and source files are defined properly, make can automatically recognize when a rule does not need to be re-run.
- Include a rule in the makefile to delete temp, output, and audit files when necessary.
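To illustrate why explicit dependencies matter, here is a simplified Python sketch of the kind of check make performs: a target is rebuilt only if it is missing or older than one of its source files. The function name `needs_rebuild` is hypothetical, and this is an illustration of the idea rather than a reimplementation of make.

```python
from pathlib import Path


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    # A source modified after the target was built makes the target stale.
    return any(Path(src).stat().st_mtime > target_mtime for src in sources)
```

If a source file is left out of the dependency list, a change to it goes unnoticed and the outdated target is silently kept, which is why each rule should list all files it reads.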
Versioning
Versioning guarantees that changes in your project are trackable, providing a foundation for collaboration and recovery of previous work states.
- Track and version all source code stored in /src (e.g., add to Git/GitHub).
- Do not version any files in /data and /gen. They should not be added to Git/GitHub.
- If there are specific files or directories you wish to exclude, especially those unintentionally written to /src, use .gitignore to keep them unversioned.
Warning: Do not version sensitive data
Before making a GitHub repository public, we recommend you check that you have not stored any sensitive information in it, such as passwords. You can use a GitHub credentials scanner if you want to make sure.
Housekeeping
A tidy codebase is instrumental for collaborations and future adjustments. Proper housekeeping practices ensure code readability, maintainability, and efficient debugging.
- Opt for concise and descriptive variable names.
- Wherever possible, employ loops to reduce redundancy.
- Break down extensive source code into subprograms, functions, or divide them into smaller focused scripts.
- Prune unnecessary components such as redundant comments, outdated library calls, and unused variables.
- Implement asserts to stop program execution when encountering unhandled errors, ensuring robustness.
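As a small illustration of the last point, the sketch below uses asserts to halt a script as soon as its input violates an assumption. The function names and the price data are hypothetical.

```python
def check_prices(prices):
    """Stop execution early if the input violates our assumptions."""
    assert len(prices) > 0, "price list is empty"
    assert all(p >= 0 for p in prices), "negative price found"
    return prices


def average_price(prices):
    """Compute the mean only after the input has passed the checks."""
    check_prices(prices)
    return sum(prices) / len(prices)
```

Note that Python skips `assert` statements when run with the `-O` flag, so checks that must always hold in production are better raised as explicit exceptions.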
Testing for portability
Ensuring your project works across different environments and systems is crucial for consistent results and wider usability.
- On your computer:
  - Rebuild test: Clear /gen and rebuild using make.
  - Clone & build: Clone to a new directory, then rebuild using make.
- Different systems:
  - Confirm functionality on Windows, macOS, and Linux.
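The rebuild test could be scripted along these lines. This is a sketch assuming that a makefile sits in the project root, that `make` is available on the system path, and that generated files live in a folder named `gen`; the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def clear_gen(gen_dir="gen"):
    """Delete everything inside gen_dir but keep the directory itself."""
    gen_dir = Path(gen_dir)
    for entry in gen_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            # Regular files and symlinks are removed individually.
            entry.unlink()


def rebuild_test(project_dir=".", gen_dir="gen"):
    """Clear generated files, then rebuild with make; raises if make fails."""
    project_dir = Path(project_dir)
    clear_gen(project_dir / gen_dir)
    # check=True raises CalledProcessError when make exits with an error.
    subprocess.run(["make"], cwd=project_dir, check=True)
```

Running `rebuild_test()` from the project root confirms that every generated file can be reproduced from the raw data and source code alone.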
Example of a well-organized project
This tutorial covers the fundamental principles of project setup and workflows underlying this checklist. Under the Summary section, you will find a visual example of a well-structured project.
To quickly visualize the structure of your project directories in a tree-like format, you can use the tree command in your terminal or command prompt.
Additional Resources
- Tutorial about Pipeline Automation using Make.
- Free open-source Master-level course on Data Preparation and Workflow Management.
- Reading about an example of a Digital Project Folder Structure.