Web Scraping Using an Automated Browser

Sometimes when we scrape the web, we need to automate a web browser that opens each page and gathers information from it. This is especially true when the site we want to scrape loads its content dynamically with JavaScript.
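To see why a plain download is not enough for such sites, consider the following minimal sketch. The page source and the element id `prices` are made up for illustration. The parser only sees the HTML as it is delivered, before any JavaScript has run, so the element that a browser would fill in is still empty.

```python
from html.parser import HTMLParser

# Hypothetical page source: the <div> stays empty until a browser runs the script.
RAW_HTML = """
<html><body>
  <div id="prices"></div>
  <script>
    document.getElementById("prices").textContent = "EUR 9.99";
  </script>
</body></html>
"""


class DivTextParser(HTMLParser):
    """Collects the text found inside <div id="prices">."""

    def __init__(self):
        super().__init__()
        self.inside_target = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "prices":
            self.inside_target = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside_target = False

    def handle_data(self, data):
        if self.inside_target:
            self.text += data


parser = DivTextParser()
parser.feed(RAW_HTML)
static_text = parser.text.strip()

# The price only exists inside the script, so the static text is empty.
print(repr(static_text))
```

An automated browser executes the script first, which is why it can read the value that this static parse misses.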

We will install one package to help us here: Chromedriver.

The installation steps differ per operating system, so follow the instructions for your system below.

Windows Users

Watch our YouTube video, in which we walk you through the setup on Windows.

  • Install Google Chrome from here
  • Download the Windows version of Chromedriver from here.
  • Extract the contents of the zip file into a new directory, C:\chromedriver. You can also use another folder, for example C:\Program Files\chromedriver or, if you do not have admin rights, C:\Users\[your-username]\chromedriver. It does not matter where exactly you put the file, as long as you remember where it is (though it is not a good idea to leave it in your downloads folder).
  • Make sure that the chromedriver.exe file sits directly in the directory you chose, i.e. in C:\chromedriver (or your alternative directory). If your zip unpacker created a subfolder inside that directory, move the .exe file out of the subfolder and into the directory itself.
  • Add the directory C:\chromedriver (or whichever directory you chose above) to your PATH (for instructions, see below).
  • If this went successfully, open a terminal/command prompt and enter chromedriver --version. You should get output that looks like ChromeDriver [version number].
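The same check can be scripted once the directory is on your PATH. The following is a minimal sketch, not an official tool: the function names are made up for illustration, and the version pattern assumes the output starts with "ChromeDriver" followed by a dotted number, as described above.

```python
import re
import shutil
import subprocess


def parse_chromedriver_version(output):
    """Return the dotted version number from 'ChromeDriver <version> ...', or None."""
    match = re.search(r"ChromeDriver\s+(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def installed_chromedriver_version():
    """Return the version of the chromedriver found on PATH, or None if there is none."""
    executable = shutil.which("chromedriver")  # searches the PATH directories
    if executable is None:
        return None
    result = subprocess.run([executable, "--version"], capture_output=True, text=True)
    return parse_chromedriver_version(result.stdout)


if __name__ == "__main__":
    version = installed_chromedriver_version()
    if version is None:
        print("chromedriver was not found on your PATH")
    else:
        print("Found ChromeDriver", version)
```

If the script reports that chromedriver was not found, the PATH setting is the first thing to check.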

Making chromedriver available via the PATH settings on Windows

We need to update our PATH setting: a list of directories in which Windows looks for a program when you start it by name.
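The lookup can be pictured with the following simplified sketch. It is an illustration only: the function name is hypothetical, and the real Windows lookup also considers other locations and file extensions. The idea is the same, though: each directory in the list is tried in order, and the first match wins.

```python
import os


def find_on_path(program, path_value, extensions=("", ".exe")):
    """Return the first file named `program` found in the directories of `path_value`."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        for extension in extensions:
            candidate = os.path.join(directory, program + extension)
            if os.path.isfile(candidate):
                return candidate
    return None  # no directory on the list contains the program


if __name__ == "__main__":
    # Prints the full path of chromedriver, or None if it is not on the PATH.
    print(find_on_path("chromedriver", os.environ.get("PATH", "")))
```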

  • Open the settings for environment variables
    • Right-click on Computer.
    • Go to "Properties" and select the tab "Advanced System settings".
    • Choose "Environment Variables"
  • Alternatively, type "environment variable" (Dutch: omgevingsvariabelen) in your Windows 10 search menu, and press Enter.

  • Select Path from the list of user variables. Choose Edit.

    • Windows 7 and 8 machines: If you extracted Chromedriver to C:\chromedriver (i.e., you used the default directory), add the following string to the end of the existing variable value, without spaces at the start or end:

      ;C:\chromedriver

    • Windows 10 machines:

      • Click New and paste the following string:

      C:\chromedriver

      • Click on OK as often as needed.
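The leading semicolon in the Windows 7 and 8 instructions is needed because the Path value is one long string in which directories are separated by semicolons. The sketch below only builds such a string to show the result; it does not change any system setting. The function name and the example value of the existing Path are assumptions for illustration.

```python
def append_to_path(existing_value, new_directory, separator=";"):
    """Return a PATH string with new_directory added once at the end."""
    entries = [entry for entry in existing_value.split(separator) if entry]
    # Windows paths are case-insensitive, so compare in lower case.
    already_listed = any(entry.lower() == new_directory.lower() for entry in entries)
    if not already_listed:
        entries.append(new_directory)
    return separator.join(entries)


# Hypothetical existing value; your own Path will list other directories.
old_value = r"C:\Windows\system32;C:\Windows"
new_value = append_to_path(old_value, r"C:\chromedriver")
print(new_value)
```

On Windows 10 the editor shows each directory on its own row, which is why no semicolon is typed there.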

Mac Users

Let's install Homebrew first!

Make sure your Homebrew installation is up to date. To do so, open a terminal and enter

brew update

If that returns an error, Homebrew is not installed.

  • To install Homebrew, open a terminal and paste the following command:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

  • To verify that Homebrew installed correctly, enter the following into your terminal

    brew doctor

    ...and you should see the following output

    Your system is ready to brew

Let's proceed to installing Chromedriver

  • We assume you have Google Chrome installed. If not, do this first, please.

  • Install chromedriver via Homebrew:

    brew install --cask chromedriver

  • Verify your install by entering the following in your terminal. The expected output is ChromeDriver XX

    chromedriver --version

Linux Users

  • Open a terminal session.
  • Install Google Chrome for Debian/Ubuntu by pasting the following and then pressing Return:

    sudo apt-get install libxss1 libappindicator1 libindicator7
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

    sudo dpkg -i google-chrome*.deb
    sudo apt-get install -f

  • Install xvfb, a virtual display, so Chrome can run 'headless' (without a screen) by pasting the following and then pressing Return:

    sudo apt-get install xvfb
  • Install Chromedriver by pasting the following and then pressing Return. The download below is pinned to version 2.41; Chromedriver only works with the Chrome versions it was built for, so if your Chrome is newer, substitute the matching release.

    sudo apt-get install unzip

    wget -N https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
    unzip chromedriver_linux64.zip
    chmod +x chromedriver

    sudo mv -f chromedriver /usr/local/share/chromedriver
    sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
    sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
  • Verify your install by entering the following in your terminal. The expected output is ChromeDriver XX

    chromedriver --version
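The Linux steps place the actual file in /usr/local/share and make it reachable through symbolic links in /usr/local/bin and /usr/bin. If you want to confirm which file a command really resolves to, a small sketch like the following can help. The function name and the optional `search_path` parameter are assumptions for illustration.

```python
import os
import shutil


def resolve_executable(name, search_path=None):
    """Return (path found on PATH, real file behind any symlink), or None if not found."""
    found = shutil.which(name, path=search_path)  # None means: use the PATH variable
    if found is None:
        return None
    return found, os.path.realpath(found)


if __name__ == "__main__":
    # Prints the link that was found and the file it points to, or None.
    print(resolve_executable("chromedriver"))
```

After the steps above, the second element would be expected to point at /usr/local/share/chromedriver.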