Plotting with Matplotlib and Seaborn in Python

Overview

Python has many libraries for visualizing data, of which matplotlib and seaborn are among the most widely used. In this building block we construct the plots defined in Data Visualization Theory and Best Practices with both matplotlib and seaborn.

Setup

To install matplotlib, follow this guide. Matplotlib is the base library for plotting in Python.

Tip

You can also plot with pandas, whose plotting methods are built on top of matplotlib.
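A minimal sketch of pandas plotting follows. It uses a small made-up DataFrame. DataFrame.plot draws with matplotlib and returns the matplotlib Axes it used, so you can keep customizing the plot with regular matplotlib calls.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up DataFrame, only for illustration
df = pd.DataFrame({"width": [3.0, 3.2, 3.5, 2.9], "length": [5.1, 4.9, 6.3, 5.8]})

# pandas draws the plot with matplotlib and returns the Axes it used
ax = df.plot(kind="scatter", x="width", y="length", title="pandas scatter")

# The returned object is a regular matplotlib Axes
ax.set_xlabel("Width")
```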

To install seaborn, follow this guide. Seaborn is also built on top of matplotlib and is designed for statistical plots.

Let's first import the libraries.

python


import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

We are going to use two datasets: the Iris dataset and the Monthly stocks dataset. The stocks dataset contains closing prices over time for four series (GOOG, MSFT, AMZN and NASDAQ). Let's load the datasets.

python


iris = pd.read_csv('iris.csv')
# parse the first column (Date) as datetimes so that the .dt accessor can be used later
stocks = pd.read_csv('stocks-monthly.csv', parse_dates=[0])
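The bar plots below use the .dt accessor, which only works on datetime columns. That is why parse_dates=[0] is passed. The sketch below replaces stocks-monthly.csv with a made-up three-row CSV and assumes, as the code above does, that the date is the first column.

```python
import io

import pandas as pd

# Hypothetical stand-in for stocks-monthly.csv (values are made up)
csv_text = """Date,GOOG,MSFT
2018-01-01,1.00,1.00
2018-02-01,1.02,0.98
2018-03-01,0.95,1.05
"""

# Without parse_dates the Date column is read as plain text, not as dates
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["Date"].dtype)

# parse_dates=[0] converts the first column to datetimes, enabling .dt
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])
print(parsed["Date"].dtype)
print(parsed["Date"].dt.month.tolist())
```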

Gallery of Plots

1. Scatterplot

Matplotlib

Creating a scatterplot with matplotlib takes only a few lines of code. For this plot type we use the Iris dataset.

python


#create the scatterplot using two quantitative attributes
plt.scatter(iris['sepal width'], iris['sepal length'])

#name the X axis
plt.xlabel('Sepal width')

#name the Y axis
plt.ylabel('Sepal length')

#name the plot
plt.title("Scatterplot")

#add gridlines
plt.grid()

The scatterplot shows the sepal width on the X axis and the sepal length on the Y axis. Most points are concentrated around the center of the plot. This suggests that, regardless of species, most flowers have a medium sepal length and width.

Output:

Tip

We can also change the color of the dots with the c parameter, for example c='red'. To see all named colors supported by matplotlib, run mpl.colors.cnames.

Additionally, we can change the style of the markers (dots) with the marker parameter, for example marker='^'. To see all supported marker styles, run mpl.markers.MarkerStyle.markers.
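For example, the sketch below draws made-up points as red triangles. It also checks that both names appear in the lookup dictionaries mentioned above.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Made-up points, only for illustration
x = [3.0, 3.2, 3.5, 2.9]
y = [5.1, 4.9, 6.3, 5.8]

fig, ax = plt.subplots()
# c sets the marker color, marker sets the marker shape ('^' is a triangle pointing up)
points = ax.scatter(x, y, c="red", marker="^")

# Named colors and marker codes can be looked up in these dictionaries
print("red" in mpl.colors.cnames)
print("^" in mpl.markers.MarkerStyle.markers)
```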

Seaborn

Creating the same scatterplot in seaborn is just as easy. In addition, seaborn can map the categorical species column to the hue parameter. This way, each species gets its own color and is easier to identify.

python


sns.scatterplot(data=iris, x='sepal width', y='sepal length', hue='species').set(title="Scatterplot")

Output:
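Matplotlib has no hue parameter. A roughly equivalent plot makes one scatter call per species and lets legend() collect the labels. The sketch uses a made-up frame with the same column names as the Iris CSV above (assumed).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows using the Iris column names (values are illustrative only)
iris_sample = pd.DataFrame({
    "sepal width": [3.5, 3.0, 2.8, 3.2, 3.3, 2.7],
    "sepal length": [5.1, 4.9, 6.5, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

fig, ax = plt.subplots()
# One scatter call per species; matplotlib cycles through its default colors
for species, group in iris_sample.groupby("species"):
    ax.scatter(group["sepal width"], group["sepal length"], label=species)

ax.set_xlabel("Sepal width")
ax.set_ylabel("Sepal length")
ax.legend(title="species")
```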

2. Bar plot

Matplotlib

For the bar plot we use the Monthly stocks dataset. We show the months on the X axis and the closing prices of one company (Google) on the Y axis.

python


#create the bar plot using the months and closing prices of Google
plt.bar(stocks['Date'].dt.month, stocks['GOOG'])

#add legend
plt.legend(['GOOG'])

#name the X axis
plt.xlabel('Month')

#name the Y axis
plt.ylabel('Closing price')

#name the plot
plt.title("Bar plot")

#add gridlines
plt.grid()

Output:

Seaborn

When plotting with seaborn, each bar shows the mean closing price of all rows that fall in that month. The error bars show the uncertainty of that mean, by default a bootstrapped 95% confidence interval. Depending on the seaborn version, each bar may also get a different color.

python


sns.barplot(x=stocks['Date'].dt.month, y=stocks['GOOG']).set(title="Bar plot")

Output:
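If the dataset covers more than one year, several rows share the same month. In that case plt.bar in the matplotlib version draws those bars on top of each other, while seaborn averages them. The sketch below computes the monthly mean first, so that matplotlib shows the same bar heights seaborn uses by default. It uses made-up prices for two years.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly closing prices over two years (illustration only)
dates = pd.date_range("2018-01-01", periods=24, freq="MS")
prices = pd.DataFrame({"Date": dates, "GOOG": [1.0 + 0.01 * i for i in range(24)]})

# Average the rows that share a calendar month, as seaborn's barplot does by default
monthly_mean = prices.groupby(prices["Date"].dt.month)["GOOG"].mean()

fig, ax = plt.subplots()
bars = ax.bar(monthly_mean.index, monthly_mean.values)
ax.set_xlabel("Month")
ax.set_ylabel("Mean closing price")
```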

3. Stacked bar chart

Matplotlib

python


#add each categorical variable (company) with a different color
plt.bar(stocks['Date'].dt.month, stocks['GOOG'], color='r')
plt.bar(stocks['Date'].dt.month, stocks['MSFT'], bottom=stocks['GOOG'], color='b')
plt.bar(stocks['Date'].dt.month, stocks['AMZN'], bottom=stocks['GOOG']+stocks['MSFT'], color='y')
plt.bar(stocks['Date'].dt.month, stocks['NASDAQ'], bottom=stocks['GOOG']+stocks['MSFT']+stocks['AMZN'], color='g')

#name the X axis
plt.xlabel("Months")

#name the Y axis
plt.ylabel("Closing price")

#add legend
plt.legend(["GOOG", "MSFT", "AMZN", "NASDAQ"])

#add title
plt.title("Closing prices of stocks in each month")

#add limit for Y axis to better visualize all categories
plt.ylim(0,800)

Output:

Seaborn does not have a dedicated function for stacked bar charts. An alternative is to create one with pandas, using this syntax: DataFrameName.plot(kind='bar', stacked=True, color=[.....])
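Here is a minimal sketch of that pandas approach, with made-up prices for three months. With stacked=True, each column is drawn on top of the previous ones, and color takes one color per column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up closing prices, one row per month (illustration only)
prices = pd.DataFrame(
    {"GOOG": [100, 110, 105], "MSFT": [90, 95, 100], "AMZN": [150, 140, 160]},
    index=["Jan", "Feb", "Mar"],
)

# One bar per row (month); each column (company) becomes a colored segment
ax = prices.plot(kind="bar", stacked=True, color=["r", "b", "y"])
ax.set_xlabel("Month")
ax.set_ylabel("Closing price")
ax.set_title("Stacked bar chart with pandas")
```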

4. Line chart

Seaborn

When plotting line charts with seaborn, we pass the DataFrame and specify which column goes on each axis:

python


sns.lineplot(data = stocks, x = 'Date', y = 'NASDAQ').set(title="Line plot")

Output:

Matplotlib & Pandas

A single call plots all four series in the same line plot:

python


#we first set the date column as index
stocks_d = stocks.set_index('Date')

#create line plot with title
stocks_d.plot()
plt.title("Stock prices over time")

Output:

Subplotting

We can also create several subplots in the same figure. For instance, we create one line plot for each company.

python


#create display of figure
fig, ax = plt.subplots(nrows=2, ncols=2, squeeze=False, sharex=True, figsize=(10,10))

#plot each company on a different position in the figure
stocks_d['NASDAQ'].plot(ax=ax[0, 0])
stocks_d['AMZN'].plot(ax=ax[0, 1])
stocks_d['GOOG'].plot(ax=ax[1, 0])
stocks_d['MSFT'].plot(ax=ax[1, 1])

#set titles for each subplot
ax[0, 0].set_title('NASDAQ')
ax[0, 1].set_title('AMZN')
ax[1, 0].set_title('GOOG')
ax[1, 1].set_title('MSFT')
ax[0, 0].set_ylabel('Price (USD)')
ax[1, 0].set_ylabel('Price (USD)')

#set title of whole figure
fig.suptitle("Development of stocks over time", size=18, weight='bold')

Output:
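The four plot and set_title calls follow the same pattern, so they can also be written as a loop over ax.flat. The sketch uses made-up prices; stocks_demo is a hypothetical stand-in for stocks_d.

```python
import matplotlib.pyplot as plt
import pandas as pd

names = ["NASDAQ", "AMZN", "GOOG", "MSFT"]

# Hypothetical stand-in for stocks_d: made-up prices indexed by date
index = pd.date_range("2018-01-01", periods=6, freq="MS")
stocks_demo = pd.DataFrame(
    {name: [1.0, 1.1, 1.05, 1.2, 1.15, 1.3] for name in names},
    index=index,
)

fig, ax = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(10, 10))

# ax.flat walks the 2x2 grid row by row, matching the order used above
for axis, name in zip(ax.flat, names):
    stocks_demo[name].plot(ax=axis)
    axis.set_title(name)

# Label the y axis only on the left-hand column
for axis in ax[:, 0]:
    axis.set_ylabel("Price (USD)")

fig.suptitle("Development of stocks over time", size=18, weight="bold")
```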

5. Heatmap

Before creating the heatmap, we need to rearrange the data into a pivot table. Using the Iris dataset, we first split petal width and petal length into five levels each. We then count the flowers in each combination of levels.

python


levels = ["tiny", "small", "medium", "big", "large"]

# split petal width and length into five equal-width bins labeled with the levels
iris["petal width level"] = pd.cut(iris["petal width"], len(levels), labels=levels)
iris["petal length level"] = pd.cut(iris["petal length"], len(levels), labels=levels)

# count the flowers in each (width level, length level) combination, keeping empty combinations
iris_grouped = iris.groupby(["petal width level", "petal length level"], observed=False).count().reset_index()

# use the count of the species column as the flower count and fill any NaN values with 0
iris_grouped["count"] = iris_grouped["species"].fillna(0)

# pivot the table (pandas >= 2.0 requires keyword arguments here)
iris_matrix = iris_grouped.pivot(index="petal width level", columns="petal length level", values="count")

# make sure rows and columns follow the order in the 'levels' variable
iris_matrix = iris_matrix.reindex(levels, axis=0)
iris_matrix = iris_matrix.reindex(levels, axis=1)

iris_matrix
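A shorter way to build the same kind of count matrix is pd.crosstab. We have not checked it against the output above for every pandas version. Here, reindex with fill_value=0 fixes the level order and fills in any empty combinations. The sketch runs on made-up measurements.

```python
import numpy as np
import pandas as pd

levels = ["tiny", "small", "medium", "big", "large"]

# Made-up petal measurements (illustration only)
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "petal width": rng.uniform(0.1, 2.5, size=150),
    "petal length": rng.uniform(1.0, 6.9, size=150),
})

# Split each measurement into five equal-width bins labeled with the levels
width_level = pd.cut(sample["petal width"], len(levels), labels=levels)
length_level = pd.cut(sample["petal length"], len(levels), labels=levels)

# Count flowers per (width level, length level) pair, then fix the order and fill gaps with 0
matrix = (
    pd.crosstab(width_level, length_level)
    .reindex(index=levels, columns=levels, fill_value=0)
)
```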

We can now create the heatmap from the new matrix.

Matplotlib

python


plt.imshow(iris_matrix)
plt.colorbar()
plt.title("Heatmap with color bar")

Output:
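By default, plt.imshow labels the axes with row and column positions (0 to 4) rather than level names. One way to show the names is to set the ticks explicitly. The sketch uses a made-up 5 x 5 count matrix in place of iris_matrix.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

levels = ["tiny", "small", "medium", "big", "large"]

# Made-up counts standing in for iris_matrix (illustration only)
counts = pd.DataFrame(np.arange(25).reshape(5, 5), index=levels, columns=levels)

fig, ax = plt.subplots()
image = ax.imshow(counts)
fig.colorbar(image, ax=ax)

# Replace the numeric tick positions with the level names
ax.set_xticks(range(len(counts.columns)))
ax.set_xticklabels(list(counts.columns))
ax.set_yticks(range(len(counts.index)))
ax.set_yticklabels(list(counts.index))
ax.set_xlabel("petal length level")
ax.set_ylabel("petal width level")
ax.set_title("Heatmap with color bar")
```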

Seaborn

python


sns.heatmap(iris_matrix, square=True).set(title="Heatmap with color bar")

Output:

6. Histogram

For the histogram we use seaborn, which is designed for statistical plotting.

python


# create a more complex chart that contains the histogram, the density plot and a fitted normal distribution
# note: distplot is deprecated since seaborn 0.11; its replacements (histplot, displot) have no fit parameter
from scipy.stats import norm
sns.distplot(iris['petal length'], fit=norm).set(title="Histogram with normal distribution")

Output:

The blue line is the density plot (a kernel density estimate) and the black line is the fitted normal distribution.
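Because distplot is deprecated, here is a sketch of the same idea with matplotlib and NumPy only. It uses made-up values in place of iris['petal length']. The normal curve uses the sample mean and the standard deviation with ddof=0, which are the maximum-likelihood estimates that scipy.stats.norm.fit returns.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up measurements standing in for iris['petal length'] (illustration only)
rng = np.random.default_rng(42)
values = rng.normal(loc=3.8, scale=1.7, size=150)

# Maximum-likelihood normal fit: sample mean and population standard deviation
mu = values.mean()
sigma = values.std(ddof=0)

fig, ax = plt.subplots()
# density=True scales the bars so their total area is 1, matching the scale of the pdf
ax.hist(values, bins=15, density=True, alpha=0.5, label="histogram")

# Evaluate the fitted normal density on a grid spanning the data
grid = np.linspace(values.min(), values.max(), 200)
pdf = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
ax.plot(grid, pdf, color="black", label=f"normal fit (mu={mu:.2f}, sigma={sigma:.2f})")
ax.set_title("Histogram with normal distribution")
ax.legend()
```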

7. Box plot

We can visualize the distribution of petal length for each iris species with the box plot.

Matplotlib & Pandas

python


iris.boxplot(column = 'petal length', by = 'species', figsize = (5,5))

Output:
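iris.boxplot is the pandas wrapper around matplotlib. With matplotlib alone, you pass one array of values per species to boxplot and then name the boxes. The sketch uses made-up petal lengths with the same column names as the Iris CSV (assumed).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows using the Iris column names (illustration only)
iris_sample = pd.DataFrame({
    "petal length": [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Collect the petal lengths of each species into a separate array
names = []
data = []
for species, values in iris_sample.groupby("species")["petal length"]:
    names.append(species)
    data.append(values.to_numpy())

fig, ax = plt.subplots(figsize=(5, 5))
# One box per array, drawn at x positions 1, 2, 3
result = ax.boxplot(data)
ax.set_xticklabels(names)
ax.set_ylabel("petal length")
```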

Seaborn

python


sns.boxplot(data=iris, x='species', y='petal length').set(title="Box plot of petal length")

Output:

Contributed by Ana Bianca Luca
