Web Scraping Using an Automated Browser
Sometimes when we scrape the web, we need to automate our computer to open a web browser and gather information from each page. This is especially true when the site we want to scrape has content that is loaded dynamically with JavaScript.
We will install one package to help us here: ChromeDriver. Below we show two different ways of installing it.
Install ChromeDriver
In order to install ChromeDriver, make sure you have already installed:
- Selenium: by typing in the command
pip install selenium
Alternatively, open Anaconda Prompt (Windows) or the Terminal (Mac), type the command
conda install selenium
and agree to whatever the package manager wants to install or update (usually by pressing y to confirm your choice).
- Webdriver Manager for Python: by typing in the command
pip install webdriver_manager
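Before moving on, it can help to confirm that both packages are actually importable from the Python environment you plan to use. The following is a minimal sketch, not part of the original setup: the helper name `missing_packages` is hypothetical, and the import names `selenium` and `webdriver_manager` are assumed to match the pip commands above.

```python
import importlib.util

# Assumption: the import names match the package names installed above.
REQUIRED = ["selenium", "webdriver_manager"]


def missing_packages(names):
    """Return the subset of `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported as missing even though you installed it, you are probably running a different Python environment than the one pip or conda installed into.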
Once you have obtained these packages, you can now install ChromeDriver as follows:
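# Make selenium and chromedriver work for Untappd.com
from selenium import webdriver
from selenium.webdriver.chrome.options import Options  # only needed to customise Chrome
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# If chromedriver is already on your PATH, this alone is enough:
#driver = webdriver.Chrome()
# Otherwise, let webdriver_manager download a driver. Selenium 4 takes the driver path via a Service object;
# on Selenium 3, pass the path directly: webdriver.Chrome(ChromeDriverManager().install())
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = "https://untappd.com/"
driver.get(url)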
Manually Installing ChromeDriver
If the approach above did not work for any reason, or if you simply prefer installing ChromeDriver manually, follow the operating-system-specific steps below.
Windows Users
Watch our YouTube video, in which we walk you through the setup on Windows.
- Install Google Chrome from here.
- Download the Windows version of ChromeDriver from here.
- Extract the contents of the zip file into a new directory, C:\chromedriver. If you do not have admin rights, you can also put the file in another folder, for example C:\Program Files\chromedriver or C:\Users\[your-username]\chromedriver. It does not matter where exactly you put the file, as long as you remember where it is (though it is not a good idea to leave it in your Downloads folder).
- Make sure that the chromedriver.exe file sits directly in the directory you chose, i.e. in C:\chromedriver (or your alternative directory). If your zip unpacker created a new folder with a different name inside that directory, move the .exe file up into the directory itself.
- Add the directory C:\chromedriver (or whichever directory you chose above) to your PATH (for instructions, see below).
- If this went successfully, open a new terminal/command prompt and enter
chromedriver --version
You should get output that looks like ChromeDriver [version number].
Warning
Making chromedriver available via the PATH settings on Windows.
We need to update our PATH settings; these settings are a set of directories that Windows uses to "look up" software to start.
- Open the settings for environment variables
  - Right-click on Computer.
  - Go to "Properties" and select the tab "Advanced System settings".
  - Choose "Environment Variables".
  - Alternatively, type "environment variable" (Dutch: omgevingsvariabelen) in your Windows 10 search menu, and press Enter.
- Select Path from the list of user variables. Choose Edit.
- Windows 7 and 8 machines: If you chose your installation directory to be C:\chromedriver during your installation (i.e., you used the default directory), copy and paste the following string without spaces at the start or end: `;C:\chromedriver`
- Windows 10 machines: Click New and paste the following string: `C:\chromedriver`
- Click on OK as often as needed.
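To check that the PATH change took effect, you can inspect the PATH from Python. This is a rough sketch under a few assumptions: the helper name `dir_on_path` is made up for illustration, and `C:\chromedriver` is the default directory used in the steps above (replace it if you chose another one).

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of the PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalise both sides so trailing slashes (and case, on Windows) do not matter.
    target = os.path.normcase(os.path.normpath(directory))
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normcase(os.path.normpath(entry)) == target for entry in entries)


if __name__ == "__main__":
    chosen_dir = r"C:\chromedriver"  # assumption: the default directory from the steps above
    print("Directory on PATH:", dir_on_path(chosen_dir))
    # shutil.which returns the full path of the executable, or None if it is not found.
    print("Executable found at:", shutil.which("chromedriver"))
```

Run it from a newly opened terminal, because terminals that were already open keep the old PATH.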
Mac Users
Let's install Homebrew first!
Make sure your Homebrew package is up-to-date. To do so, open a terminal and enter
brew update
If that returns an error, Homebrew is not installed.
- To install Homebrew, open a terminal and paste the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
- To verify that Homebrew installed correctly, enter the following into your terminal
brew doctor
...and you should see the following output
Your system is ready to brew
Sometimes, brew doctor returns some warnings. While it's advisable to fix them (eventually), you typically don't have to do so to get started with ChromeDriver, so just try to continue from here.
Let's proceed to installing ChromeDriver
- We assume you have Google Chrome installed. If not, do this first, please.
- Install chromedriver via Homebrew:
brew install chromedriver --cask
- Verify your install by entering the following in your terminal. The expected output is ChromeDriver XX.
chromedriver --version
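The verification step can also be scripted, which is handy if you want a scraper to fail early with a clear message. The sketch below is one possible way to do it and works the same on Windows and Linux. The function names `parse_version` and `installed_version` are hypothetical, and the parsing assumes the output starts with "ChromeDriver" followed by a version number, as described above.

```python
import re
import shutil
import subprocess


def parse_version(output):
    """Extract the version number from text such as 'ChromeDriver XX (...)'."""
    match = re.search(r"ChromeDriver\s+([\d.]+)", output)
    return match.group(1) if match else None


def installed_version(executable="chromedriver"):
    """Return the ChromeDriver version string, or None if it is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Equivalent to typing `chromedriver --version` in a terminal.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)


if __name__ == "__main__":
    version = installed_version()
    print("ChromeDriver version:", version if version else "not found on PATH")
```

A result of "not found on PATH" means the terminal cannot locate chromedriver, so the install or PATH step needs another look.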
Linux Users
- Open a terminal session
- Install Google Chrome for Debian/Ubuntu by pasting the following and then pressing Return:
sudo apt-get install libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome*.deb
sudo apt-get install -f
- Install xvfb so Chrome can run 'headless' by pasting the following and then pressing Return:
sudo apt-get install xvfb
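- Install ChromeDriver by pasting the following and then pressing Return. Note that the URL below pins ChromeDriver 2.41, while ChromeDriver has to match the version of Chrome you installed; if the two differ, download the matching release instead: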
sudo apt-get install unzip
wget -N https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
- Verify your install by entering the following in your terminal. If the installation was successful, the output is ChromeDriver XX.
chromedriver --version
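Once chromedriver --version works on your platform, Selenium can start Chrome without Webdriver Manager. The following is a minimal sketch of that, assuming chromedriver is on your PATH and selenium is installed; the function name `open_headless` is made up for illustration, and `--headless` is a Chrome command-line flag that runs the browser without a visible window.

```python
def open_headless(url):
    """Open `url` in headless Chrome and return the page title."""
    # Imported here so this file can be loaded even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    # Assumption: chromedriver is on the PATH, as set up in the steps above.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling `open_headless("https://untappd.com/")` should return the title of the page. Remove the `--headless` argument if you want to watch the browser while it works.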