A Reproducible Research Workflow with AirBnB Data


Overview

Using publicly available data from AirBnB (available via Kaggle.com), we illustrate what a reproducible workflow may look like in practice.


We've designed this project to:

  • run on any platform (Mac, Linux, Windows),
  • work across a diverse set of software programs (Stata, Python, R),
  • produce an entire (mock) paper, including modules that
    • download data from Kaggle,
    • prepare data for analysis,
    • run a simple analysis,
    • produce a paper with output tables and figures.

How to run it

Dependencies

  • Install Python.
    • Anaconda is recommended. Download Anaconda.
    • Check availability: type anaconda --version in the command line.
  • Install the Kaggle package.
    • See the Kaggle API instructions for installation and setup.
  • Install automation tools.
    • GNU Make: already installed on Mac and Linux. On Windows, download Make for Windows and install it.
    • Windows users only: make Make available via the command line.
      • Right-click on Computer.
      • Go to Properties, and click Advanced System Settings.
      • Choose Environment Variables, select Path under the system variables, and click Edit.
      • Add the path to the bin folder of Make.
    • Check availability: type make --version in the command line.
  • Install Stata.
    • Make Stata available via the command line. See the instructions for adding Stata to the path.
    • Check availability: type $STATA_BIN --version in the command line.
  • Install Perl.
    • Perl is already installed on Mac and Linux. On Windows, download Perl for Windows.
    • Make sure Perl is available via the command line.
    • Check availability: type perl -v in the command line.
  • Install LyX.
    • LyX is an open-source document processor based on LaTeX. Download LyX.
    • Make sure LyX is available via the command line.
    • Check availability: type $LYX_BIN in the command line.
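
The availability checks above can also be run in one go. The following is a minimal sketch, not part of the project itself: the function name check_dependencies is hypothetical, the executable names make, perl, and kaggle are the usual ones but may differ on your system, and it assumes that STATA_BIN and LYX_BIN are environment variables holding the paths to the Stata and LyX executables.

```python
import os
import shutil

# Usual executable names for the command-line tools listed above (assumption).
REQUIRED_TOOLS = ["make", "perl", "kaggle"]
# Environment variables assumed to hold the paths to Stata and LyX.
REQUIRED_ENV_VARS = ["STATA_BIN", "LYX_BIN"]


def check_dependencies(tools=REQUIRED_TOOLS, env_vars=REQUIRED_ENV_VARS):
    """Return a dict mapping each dependency to True if it was found."""
    status = {}
    for tool in tools:
        # shutil.which searches the PATH, like typing the name in a terminal.
        status[tool] = shutil.which(tool) is not None
    for var in env_vars:
        path = os.environ.get(var, "")
        # The variable must be set and must point to an executable.
        status[var] = bool(path) and shutil.which(path) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any entry reported as MISSING points to a tool that still needs to be installed or added to the path before running the workflow.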

Run it

Open your command line tool:

  • Check whether your present working directory is airbnb-workflow by typing pwd in the terminal.
    • If not, type cd yourpath/airbnb-workflow to change your directory to airbnb-workflow.
  • Type make in the command line.
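
The same two steps can be scripted, which may help when the workflow is launched from another program. This is a sketch under assumptions: run_make is a hypothetical helper, not part of the project, and it only checks that a makefile exists before calling make in the given directory.

```python
import subprocess
from pathlib import Path


def run_make(project_dir, make_cmd="make"):
    """Run make in project_dir after checking that a makefile exists there."""
    project_dir = Path(project_dir)
    # File names that GNU Make looks for by default.
    candidates = ("makefile", "Makefile", "GNUmakefile")
    if not any((project_dir / name).is_file() for name in candidates):
        raise FileNotFoundError(f"no makefile found in {project_dir}")
    # cwd= plays the role of `cd yourpath/airbnb-workflow`;
    # check=True raises CalledProcessError if make fails.
    return subprocess.run([make_cmd], cwd=project_dir, check=True)
```

For example, run_make("yourpath/airbnb-workflow") would be the scripted equivalent of changing into the project directory and typing make.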

Directory structure

Make sure the makefile is in the present working directory. The directory structure for the Airbnb project is shown below.

├── data
├── gen
│   ├── analysis
│   │   ├── input
│   │   ├── output
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   └── temp
│   ├── data_preparation
│   │   ├── audit
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   ├── input
│   │   ├── output
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   └── temp
│   └── paper
│       ├── input
│       ├── output
│       └── temp
└── src
    ├── analysis
    ├── data_preparation
    └── paper
  • gen: all generated files, such as tables, figures, and logs.
    • Three parts: data_preparation, analysis, and paper.
    • audit: holds the logs, tables, and figures produced by the audit program, in three sub-folders: figure, log, and table.
    • temp: holds temporary files, such as intermediate datasets. These files may be deleted at the end.
    • output: holds results, with generated figures in sub-folder figure, log files in sub-folder log, and tables in sub-folder table.
    • input: holds all temporary input files.
  • data: all raw data.
  • src: all source code.
    • Three parts: data_preparation, analysis, and paper (including TeX files).
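
For a new project that follows the same layout, the empty folders can be created with a short script. The sketch below is an illustration only: build_skeleton is a hypothetical helper, and the folder names are copied from the tree above.

```python
from pathlib import Path

# Sub-folders that hold results inside `audit` and `output`.
RESULT_SUBDIRS = ["figure", "log", "table"]


def build_skeleton(root):
    """Create the data, gen, and src folders shown in the tree under root."""
    root = Path(root)
    dirs = [root / "data"]
    for stage in ["data_preparation", "analysis", "paper"]:
        dirs.append(root / "src" / stage)
        for folder in ["input", "output", "temp"]:
            dirs.append(root / "gen" / stage / folder)
    # data_preparation and analysis split their output into figure/log/table.
    for stage in ["data_preparation", "analysis"]:
        for sub in RESULT_SUBDIRS:
            dirs.append(root / "gen" / stage / "output" / sub)
    # Only data_preparation has an audit folder.
    for sub in RESULT_SUBDIRS:
        dirs.append(root / "gen" / "data_preparation" / "audit" / sub)
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    # Return the created folders as sorted, root-relative POSIX paths.
    return sorted(d.relative_to(root).as_posix() for d in dirs)
```

Because mkdir is called with exist_ok=True, running the script again on an existing project leaves the current folders and their contents untouched.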