There is a lot of material to cover before your workflows become efficient, reproducible, and well-structured.
Here's a checklist you can use to audit your progress.
| ------------------------------------------------------------------------ | :--------------: | :-----------: | :-----------: | :-------: | |
| Create subdirectory for source code: /src/[pipeline-stage-name]/ | ☐ | ☐ | ☐ | ☐ | |
| Create subdirectories for generated files in /gen/[pipeline-stage-name]/: temp, output, and audit. | ☐ | ☐ | ☐ | ☐ | |
| Make all file paths relative, not absolute (i.e., never refer to C:/mydata/myproject; only use relative paths, e.g., ../output) | ☐ | ☐ | ☐ | ☐ | |
| Create directory structure from within your source code, or use .gitkeep | ☐ | ☐ | ☐ | ☐ | |
| Have a makefile | ☐ | ☐ | ☐ | ☐ | |
| Alternatively, include a readme with running instructions | ☐ | ☐ | ☐ | ☐ | |
| Make dependencies between source code and files-to-be-built explicit, so that make automatically recognizes when a rule does not need to be run (properly define targets and source files) | ☐ | ☐ | ☐ | ☐ | |
| Include a rule in the makefile to delete temp, output, and audit files | ☐ | ☐ | ☐ | ☐ | |
| Version all source code stored in /src (i.e., add to Git/GitHub) | ☐ | ☐ | ☐ | ☐ | |
| Do not version any files in /data and /gen (i.e., do NOT add them to Git/GitHub) | ☐ | ☐ | ☐ | ☐ | |
| Want to exclude additional files (e.g., files that unintentionally get written to /src)? Use .gitignore for files/directories that should not be versioned | ☐ | ☐ | ☐ | ☐ | |
| Have short and accessible variable names | ☐ | ☐ | ☐ | ☐ | |
| Loop what can be looped | ☐ | ☐ | ☐ | ☐ | |
| Break down "long" source code into subprograms/functions, or split the script into multiple smaller scripts | ☐ | ☐ | ☐ | ☐ | |
| Delete what can be deleted (including unnecessary comments, legacy calls to packages/libraries, variables) | ☐ | ☐ | ☐ | ☐ | |
| Use asserts (i.e., make your program crash when it encounters a problem that would otherwise go unnoticed as an error) | ☐ | ☐ | ☐ | ☐ | |
| Tested on own computer (entirely wipe /gen, re-build the entire project using make) | ☐ | ☐ | ☐ | ☐ | |
| Tested on own computer (first clone to new directory, then re-build the entire project using make) | ☐ | ☐ | ☐ | ☐ | |
| Tested on different computer (Windows) | ☐ | ☐ | ☐ | ☐ | |
| Tested on different computer (Mac) | ☐ | ☐ | ☐ | ☐ | |
| Tested on different computer (Linux) | ☐ | ☐ | ☐ | ☐ | |
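
Several items in the checklist (creating the temp, output, and audit directories from within source code, using relative paths, looping, and using asserts) can be illustrated together. The following is a minimal Python sketch, not a prescribed implementation: the function names `make_gen_dirs` and `check_relative` and the stage name `analysis` are hypothetical and chosen only for illustration.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath
import tempfile


def make_gen_dirs(project_root, stage):
    """Create gen/<stage>/{temp,output,audit} under project_root and return the paths."""
    gen_dir = Path(project_root) / "gen" / stage
    created = []
    # Loop over the three subdirectories instead of repeating the same call three times.
    for name in ("temp", "output", "audit"):
        path = gen_dir / name
        path.mkdir(parents=True, exist_ok=True)  # no error if the directory already exists
        created.append(path)
    return created


def check_relative(path_str):
    """Crash early if a path is absolute in either Windows or POSIX notation."""
    is_absolute = (
        PureWindowsPath(path_str).is_absolute()  # e.g., C:/mydata/myproject
        or PurePosixPath(path_str).is_absolute()  # e.g., /home/me/myproject
    )
    assert not is_absolute, f"absolute path not allowed: {path_str}"
    return path_str


if __name__ == "__main__":
    # A temporary directory stands in for the project root in this demo.
    with tempfile.TemporaryDirectory() as root:
        for directory in make_gen_dirs(root, "analysis"):
            print(directory.relative_to(root))
    check_relative("../output")
```

Checking both Windows and POSIX notation means the assert catches a path such as C:/mydata/myproject even when the script runs on Mac or Linux, which matters if the project is tested on several operating systems.
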
Versioned any sensitive data?
Before making a GitHub repository public, we recommend you check that you have not stored any sensitive information in it (such as passwords). This tool has worked well for us: GitHub credentials scanner.
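
To give a rough idea of what such a check involves, here is a minimal Python sketch that searches the files in a directory for lines that look like hard-coded credentials. It is not the tool mentioned above and does not replace it: the pattern list and the function name `find_suspicious_lines` are assumptions made for illustration, and the sketch only inspects the current working tree, not the Git history.

```python
import re
from pathlib import Path

# Hypothetical, deliberately simple pattern; dedicated scanners use far larger rule sets.
SUSPICIOUS = re.compile(
    r"(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*\S+",
    re.IGNORECASE,
)


def find_suspicious_lines(root):
    """Return (file, line number, line) tuples for lines that resemble stored credentials."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and Git's internal files.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if SUSPICIOUS.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for file_name, number, line in find_suspicious_lines("."):
        print(f"{file_name}:{number}: {line}")
```

A hit is only a prompt to look at the line yourself. Simple patterns like these produce false positives and will miss credentials that do not follow a `name = value` form.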