Tutorial to Implement an Efficient and Reproducible Workflow
Longing to put your knowledge from our workflow guide into practice? Then follow this tutorial to implement a fully automated workflow to conduct sentiment analysis on tweets, using our GitHub workflow template.
Objectives of this tutorial
- Familiarize yourself with a robust directory structure for data-intensive projects
- Experience the benefits of automating workflows with makefiles/GNU make
- Learn to use Git templates for your own research projects
- Adjust the workflow template to
- ...download different datasets from the web
- ...unzip data automatically
- ...parse JSON objects and select relevant attributes
- ...add new text mining metrics to the final data set using Python's TextBlob
- ...modify the analysis in an RMarkdown/html document
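To give a feel for the unzip and parse steps listed above, here is a minimal sketch using only Python's standard library. It is not the code from the workflow template: the file name `tweets.json`, the attribute names in `KEEP`, and the one-JSON-object-per-line layout are all assumptions made for illustration, and the archive is built in memory so the example runs without a download.

```python
import io
import json
import zipfile

# Hypothetical attribute names; the real dataset may use different keys.
KEEP = ("id", "created_at", "text")


def select_attributes(record, keep=KEEP):
    """Return a copy of `record` restricted to the keys in `keep`."""
    # Missing keys are filled with None instead of raising a KeyError.
    return {key: record.get(key) for key in keep}


def parse_zipped_json_lines(zip_bytes, member):
    """Unzip `member` from an in-memory archive and parse one JSON object per line."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as handle:
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if line:  # skip blank lines
                    yield select_attributes(json.loads(line))


# Build a tiny archive in memory (assumed layout: one JSON object per line).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "tweets.json",
        '{"id": 1, "created_at": "2020-01-01", "text": "Great day!", "lang": "en"}\n'
        '{"id": 2, "text": "Not so great."}\n',
    )

rows = list(parse_zipped_json_lines(buffer.getvalue(), "tweets.json"))
print(rows)
```

Keeping the attribute selection in its own small function makes it easy to change which fields end up in the final data set without touching the unzip logic.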
Computer setup following our setup instructions
Python and the TextBlob package:
pip install -U textblob
Then, open Python (by typing python in the terminal) and type
import nltk
nltk.download('punkt')
If you receive an error message, verify that you are typing this command in Python (opened from the terminal by typing python), and not directly in the terminal/Anaconda prompt.
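If you are unsure whether the Python packages were installed correctly, a quick check like the following can help. This is a minimal sketch and not part of the workflow template; the helper name `missing_packages` is hypothetical, and it only tests whether a package can be found for import, not whether the `punkt` data was downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the Python steps of this tutorial rely on.
required = ["textblob", "nltk"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required Python packages were found.")
```

Save this to a file and run it with python from the terminal, or paste it into an open Python session.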
R, RStudio and the following packages:
install.packages(c("data.table", "knitr", "Rcpp", "ggplot2", "rmarkdown"))
When installing the packages, R may ask you to select a "CRAN-Mirror". This is the location of the package repository from which R downloads the packages. Either pick 0-Cloud, or manually choose the location nearest to you.
Newer versions of R (R >= 4.0) may require you to download additional packages.
If you are asked whether to build these packages from source [options: yes/no], select NO.
If you are asked to install RTools, please follow these installation instructions.
- Basic experience with Python and R
- Familiarity with common data operations using
- Familiarity with text mining using Python and TextBlob
- If you want to learn Git on the way...
- Have Git installed on your computer (see here)
- Have GitHub login credentials
To keep this tutorial as accessible as possible, it mentions Git/GitHub a few times, but assumes you will acquire these skills elsewhere. In other words, versioning and contributing to Git repositories are not part of this tutorial.