- Understand the essential guidelines to structure and maintain a well-organized project.
- Learn best practices for efficient and reproducible project workflows.
As projects progress, they can become disorganized and difficult to navigate. A structured approach not only facilitates collaboration and understanding but also ensures that the project remains efficient and reproducible.
This building block offers a comprehensive checklist to guide you towards achieving this goal.
The following foundational guidelines are essential for setting up any project, and they ensure clarity and effective organization from the outset:
- Implement a consistent directory structure:
- Include a readme with a project description and technical instructions on how to run and build the project.
- Store any authentication credentials outside of the repository (e.g., in a JSON file), and not in clear-text within the source code.
- Mirror your /data folder to a secure backup location. Alternatively, store all raw data on a secure server and download the relevant files to
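
The credentials guideline above can be sketched in Python as follows. This is a minimal sketch, not a prescribed setup: the environment variable `MYPROJECT_CREDENTIALS`, the default file name, and the function `load_credentials` are all hypothetical names chosen for illustration.

```python
import json
import os
from pathlib import Path

# Hypothetical default location, deliberately outside the repository.
DEFAULT_PATH = Path.home() / ".myproject_credentials.json"


def load_credentials(path=None):
    """Read credentials from a JSON file kept outside the repository.

    Lookup order: explicit argument, then the (assumed) environment
    variable MYPROJECT_CREDENTIALS, then the default path above.
    """
    path = Path(path or os.environ.get("MYPROJECT_CREDENTIALS", DEFAULT_PATH))
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

The source code then only refers to keys, for example `load_credentials()["api_key"]`, so no secret ever appears in a tracked file.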
Throughout the Pipeline
Arranging your data, code, and results systematically makes it easier to track changes and debug issues.
- Create a subdirectory for source code:
- Establish subdirectories within /gen/[pipeline-stage-name]/ for generated files:
- Ensure file paths are relative, not absolute. For instance, avoid references like C:/mydata/myproject, and opt for relative paths such as
- Create the directory structure from within your source code, or use .gitkeep files so that otherwise empty directories are kept under version control.
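
As an illustration of building relative paths and creating directories from source code, here is a minimal Python sketch. The function name `stage_dirs` and the subdirectory names `temp`, `output`, and `audit` are assumptions made for this example; adapt them to your own layout.

```python
from pathlib import Path


def stage_dirs(project_root, stage):
    """Create and return the generated-file directories for one pipeline stage.

    All paths are built relative to project_root, so the code keeps working
    when the project is moved or cloned to another machine.
    """
    gen = Path(project_root) / "gen" / stage
    # Subdirectory names are illustrative assumptions.
    dirs = {name: gen / name for name in ("temp", "output", "audit")}
    for directory in dirs.values():
        # exist_ok=True makes the call safe to repeat on every run.
        directory.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling `stage_dirs(".", "analysis")` from the project root would create `gen/analysis/temp`, `gen/analysis/output`, and `gen/analysis/audit` if they do not exist yet.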
Automation & documentation
Ensuring smooth automation alongside clear documentation streamlines project workflows and aids clarity.
- Make sure to have a makefile to allow for automation.
- Alternatively, include a readme with running instructions.
- Make the dependencies between the source code and the files to be built explicit. When targets and source files are defined properly, make can automatically recognize that a target is already up to date and skip its rule.
- Include a rule in the makefile to delete audit files when necessary.
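
To see why explicit dependencies matter, it helps to know that make compares file modification times: a target is rebuilt when it is missing or older than any file it depends on. The Python sketch below mimics that check for illustration only. It is not make's implementation, and `needs_rebuild` is a hypothetical helper name.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # A source modified after the target was built makes the target stale.
    return any(os.path.getmtime(source) > target_time for source in sources)
```

If a source file is left out of the dependency list, changes to it never trigger a rebuild, which is how stale results end up in a project.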
Versioning guarantees that changes in your project are trackable, providing a foundation for collaboration and recovery of previous work states.
- Track and version all source code stored in /src (e.g., add to Git/GitHub).
- Do not version any files in /gen. They should not be added to Git/GitHub.
- If there are specific files or directories you wish to exclude, especially those unintentionally written to /src, use .gitignore to keep them unversioned.
Do not version sensitive data
Before making a GitHub repository public, we recommend you check that you have not stored any sensitive information in it, such as passwords. You can use the GitHub credentials scanner to double-check.
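
For a quick first pass before turning to a dedicated scanner, a rough check can be sketched in Python. This is an illustrative assumption, not how any GitHub tool works: the regular expression only catches quoted literals assigned to names containing "password", "secret", "token", or "api_key", and it will miss many other kinds of secrets.

```python
import re
from pathlib import Path

# Illustrative pattern only: e.g. db_password = "hunter2"
PATTERN = re.compile(
    r"""(password|secret|token|api_key)\w*\s*[:=]\s*["'][^"']+["']""",
    re.IGNORECASE,
)


def find_suspect_lines(root):
    """Return (file path, line number) pairs that look like hard-coded secrets."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and Git's internal files.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), number))
    return hits
```

An empty result does not prove the repository is clean. Note also that a secret removed in a later commit still lives in the Git history.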
A tidy codebase is instrumental for collaborations and future adjustments. Proper housekeeping practices ensure code readability, maintainability, and efficient debugging.
- Opt for concise and descriptive variable names.
- Wherever possible, employ loops to reduce redundancy.
- Break down extensive source code into subprograms, functions, or divide them into smaller focused scripts.
- Prune unnecessary components such as redundant comments, outdated library calls, and unused variables.
- Implement asserts to stop program execution when encountering unhandled errors, ensuring robustness.
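
Several of these housekeeping points can be combined in one small function. The Python sketch below is a made-up example (the function `mean_by_group` and its column names are hypothetical): it uses descriptive names, a single loop in place of one copied block per group, and asserts that stop execution as soon as the input is not what the code expects.

```python
def mean_by_group(rows, group_key, value_key):
    """Average value_key within each group_key across a list of dict rows."""
    assert rows, "expected at least one row"
    totals, counts = {}, {}
    for row in rows:
        # Fail early and loudly instead of producing a misleading result.
        assert value_key in row, f"missing column {value_key!r}"
        group = row[group_key]
        totals[group] = totals.get(group, 0.0) + row[value_key]
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}
```

Keep in mind that Python skips assert statements when run with the `-O` flag, so checks that must always run are better written as explicit `raise` statements.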
Testing for portability
Ensuring your project works across different environments and systems is crucial for consistent results and wider usability.
On Your Computer:
- Rebuild Test: Clear /gen and rebuild using
- Clone & Build: Clone to a new directory, then rebuild using
- Rebuild Test: Clear
- Confirm that the project works on Windows, macOS, and Linux.
Example of a well-organized project
This tutorial covers the fundamental principles of project setup and workflows underlying this checklist. Under the Summary section, you will find a visual example of a well-structured project.
To quickly visualize the structure of your project directories in a tree-like format, you can use the tree command in your terminal or command prompt.
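
If the tree command is not available on your system, a similar listing can be produced with a few lines of Python. The sketch below is an approximation written for illustration. The function name `tree_lines` is hypothetical, and the ordering (directories first, then files, each alphabetically) is a choice made here rather than a copy of the tree command's behavior.

```python
from pathlib import Path


def tree_lines(root, prefix=""):
    """Return the lines of a tree-style listing of root's contents."""
    # Directories first, then files, each group sorted by name.
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_file(), p.name))
    lines = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(prefix + ("└── " if last else "├── ") + entry.name)
        if entry.is_dir():
            # Indent children, continuing the vertical bar unless this was the last entry.
            lines.extend(tree_lines(entry, prefix + ("    " if last else "│   ")))
    return lines
```

Running `print("\n".join(tree_lines(".")))` from the project root prints the listing.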