Overview
Using publicly available data from AirBnB (available via Kaggle.com), we illustrate what a reproducible workflow may look like in practice.
Check out our GitHub repository for all the details on how to clone the project and run it. Alternatively, continue reading below.
We’ve crafted this project to run:
- on any platform (Mac, Linux, Windows),
- across a diverse set of software programs (Stata, Python, R),
- producing an entire (mock) paper, including modules that
  - download data from Kaggle,
  - prepare data for analysis,
  - run a simple analysis,
  - produce a paper with output tables and figures.
How to run it
Dependencies
- Install Python.
  - Anaconda is recommended. Download Anaconda.
  - Check availability: type anaconda --version in the command line.
- Install the Kaggle package.
  - See the Kaggle API instructions for installation and setup.
- Install automation tools.
  - GNU Make: already installed on Mac and Linux. On Windows, download Make for Windows and install it.
  - Windows users only: make Make available via the command line.
    - Right-click on Computer.
    - Go to Properties, and click Advanced System Settings.
    - Choose Environment Variables, select Path under the system variables, and click Edit.
    - Add the bin directory of Make.
  - Check availability: type make --version in the command line.
- Install Stata.
  - Make Stata available via the command line. See the instructions for adding Stata to the path.
  - Check availability: type $STATA_BIN --version in the command line.
- Install Perl.
  - Perl is already installed on Mac and Linux. On Windows, download Perl for Windows.
  - Make sure Perl is available via the command line.
  - Check availability: type perl -v in the command line.
- Install LyX.
  - LyX is an open source document processor based on LaTeX. Download LyX.
  - Make sure LyX is available via the command line.
  - Check availability: type $LYX_BIN in the command line.
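The availability checks above can also be run in one pass. The following is a minimal sketch, not part of the project itself: the function name find_tools is hypothetical, the command names are taken from the checks listed above, and it assumes STATA_BIN and LYX_BIN are environment variables holding the path or name of the respective executable.

```python
import os
import shutil


def find_tools(commands, env_vars):
    """Return a dict mapping each tool label to its resolved path, or None.

    commands: executable names looked up directly on the PATH.
    env_vars: names of environment variables whose value is an executable
              (assumption: STATA_BIN and LYX_BIN are set up this way).
    """
    found = {}
    for cmd in commands:
        found[cmd] = shutil.which(cmd)
    for var in env_vars:
        value = os.environ.get(var)
        # An unset or empty variable counts as "not available".
        found[var] = shutil.which(value) if value else None
    return found


if __name__ == "__main__":
    status = find_tools(
        ["anaconda", "kaggle", "make", "perl"],
        ["STATA_BIN", "LYX_BIN"],
    )
    for name, path in status.items():
        print(f"{name:10s} {'found at ' + path if path else 'NOT FOUND'}")
```

This only confirms that an executable can be located; it does not replace the individual version checks, which also show that each tool starts correctly.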
Run it
Open your command line tool:
- Check whether your present working directory is tisem-airbnb by typing pwd in the terminal.
  - If not, type cd yourpath/tisem-airbnb to change your directory to tisem-airbnb.
- Type make in the command line.
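For readers who prefer to script these two steps, here is a hedged sketch that checks the working directory before calling make. The helper name ready_to_build is hypothetical, and the accepted makefile spellings are an assumption based on the file names GNU Make looks for by default.

```python
import subprocess
from pathlib import Path


def ready_to_build(project_dir):
    """True if project_dir is named tisem-airbnb and contains a makefile."""
    project_dir = Path(project_dir).resolve()
    # Assumption: any of GNU Make's default file names is acceptable.
    has_makefile = any(
        (project_dir / name).is_file()
        for name in ("makefile", "Makefile", "GNUmakefile")
    )
    return project_dir.name == "tisem-airbnb" and has_makefile


if __name__ == "__main__":
    here = Path.cwd()
    if ready_to_build(here):
        # Equivalent to typing "make" in the command line.
        subprocess.run(["make"], cwd=here, check=True)
    else:
        print(f"{here} does not look like the tisem-airbnb project root.")
```

Typing make directly, as described above, remains the simplest route; the script merely guards against running it from the wrong folder.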
Directory structure
Make sure the makefile is in the present working directory. The directory structure for the Airbnb project is shown below.
├── data
├── gen
│  ├── analysis
│  │  ├── input
│  │  ├── output
│  │  │  ├── figure
│  │  │  ├── log
│  │  │  └── table
│  │  └── temp
│  ├── data_preparation
│  │  ├── audit
│  │  │  ├── figure
│  │  │  ├── log
│  │  │  └── table
│  │  ├── input
│  │  ├── output
│  │  │  ├── figure
│  │  │  ├── log
│  │  │  └── table
│  │  └── temp
│  └── paper
│     ├── input
│     ├── output
│     └── temp
└── src
   ├── analysis
   ├── data_preparation
   └── paper
- gen: all generated files, such as tables, figures, and logs.
  - Three parts: data_preparation, analysis, and paper.
  - audit: holds the logs, tables, and figures produced by the audit program. It has three sub-folders: figure, log, and table.
  - temp: holds temporary files, such as intermediate datasets. These files may be deleted at the end.
  - output: holds results, with generated figures in the sub-folder figure, log files in the sub-folder log, and tables in the sub-folder table.
  - input: holds all temporary input files.
- data: all raw data.
- src: all source code.
  - Three parts: data_preparation, analysis, and paper (including TeX files).
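To reuse this layout in a new project, the folder skeleton can be generated rather than created by hand. The sketch below mirrors the tree shown above; the names STAGE_DIRS and create_skeleton are hypothetical, and the project's own makefile may already create some of these folders.

```python
from pathlib import Path

# Sub-folders of gen/ for each stage, copied from the directory tree above.
STAGE_DIRS = {
    "analysis": [
        "input", "output/figure", "output/log", "output/table", "temp",
    ],
    "data_preparation": [
        "audit/figure", "audit/log", "audit/table",
        "input", "output/figure", "output/log", "output/table", "temp",
    ],
    "paper": ["input", "output", "temp"],
}


def create_skeleton(root):
    """Create data/, gen/ and src/ under root; return the leaf folders made."""
    root = Path(root)
    leaves = [root / "data"]
    for stage, subdirs in STAGE_DIRS.items():
        leaves.append(root / "src" / stage)
        leaves.extend(root / "gen" / stage / sub for sub in subdirs)
    for folder in leaves:
        # exist_ok makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
    return leaves
```

Calling create_skeleton(".") from an empty project folder would produce the structure above; existing folders are left untouched.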