Longing to put your knowledge from our workflow guide into practice? Then follow this tutorial to implement a fully automated workflow to conduct sentiment analysis on tweets, using our GitHub workflow template.
Objectives of this tutorial
- Familiarize yourself with a robust directory structure for data-intensive projects
- Experience the benefits of automating workflows with makefiles/GNU make
- Learn to use Git templates for your own research projects
- Adjust the workflow template to
- ...download different datasets from the web
- ...unzip data automatically
- ...parse JSON objects and select relevant attributes
- ...add new text mining metrics to the final data set using Python's
textblob- ...modify the analysis in an RMarkdown/html document
Prerequisites
-
Computer setup following our setup instructions. - Python and the
textblobpackage<div class="highlight"><pre class="chroma"><code class="language-fallback" data-lang="fallback">pip install -U textblob</code></pre></div> Then, open Python in the terminal by typing `python`, and type <div class="highlight"><pre class="chroma"><code class="language-fallback" data-lang="fallback">import nltk nltk.download('punkt')</code></pre></div> If you receive an error message, please verify you are typing this command in python, and not *directly* in the terminal/Anaconda prompt. - <a href="/install/r" alt="R, RStudio">R, RStudio</a> and the following packages: <div class="highlight"><pre class="chroma"><code class="language-fallback" data-lang="fallback">install.packages(c("data.table", "knitr", "Rcpp", "ggplot2", "rmarkdown"))</code></pre></div> When installing the packages, R may ask you to select a "CRAN-Mirror". This is the location of the package repository from which R seeks to download the packages. Either pick `0-Cloud`, or manually choose any of the location nearest to your current geographical location.Warning
**R 4.0**. Newer versions of R (>=R 4.0) may require you to download additional packages.- If you're being asked whether to build these packages from source or not [options: yes/no], select NO. - If you're being asked to install RTools, please do follow these installation instructions.install.packages(c("rlang", "pillar")) -
Familiarity with our workflows, in particular on pipelines and project components, directory structure and pipeline automation.
-
Nice-to-haves: - Basic experience with Python and R - Familiarity with common data operations using
data.tablein R - Familiarity with text mining using Python and TextBlob - If you want to learn Git on the way... - Have Git installed on your computer (see here) - Have GitHub login credentials
Disclaimer
To keep this tutorial as accessible as possible, it will mention Git/GitHub a few times, but assume you will acquire details on these skills elsewhere. In other words, versioning and contributing to Git repositories is not part of this tutorial.