# Styling Bar Charts in ggplot2

10 mins

## Overview

This article is designed to guide you in the process of creating `bar charts` using `ggplot2`. Starting with the basics, we’ll guide you through each step to evolve a simple bar chart into one that meets academic standards.

Effective data visualization requires both accuracy and aesthetics, a balance that can be achieved with the use of the R package: `ggplot2`. This package is the best practice and a versatile tool for custom styling of your visualizations. We’ll explore customizing colors, themes, and labels, and provide practical tips for academic presentations. This includes handling standard errors, black-and-white formatting, and effective data grouping. Additionally, we’ll cover saving charts in high-quality PNG or PDF formats, ensuring they’re ready for academic publication.

Tip

Data Visualization Best Practices

Explore, our article on Data Visualization, here we describe the most common chart types and conclude with best practices for plotting. Also look at this article to explore the inner workings of `ggplot2`, called the “Grammar of Graphics”.

## Data Retrieval

In this article we will use the PIAAC dataset to examine the wage premium of obtaining a higher education level. We will illustrate the variance in wage across different education levels and later on between genders in the Netherlands. Let’s first load the required packages and download the data.

## Step-by-Step Guide: Crafting Publishable Bar Charts

### Step 1: Data Manipulation

A bar plot effectively displays the interaction between numerical and categorical variables.
The preparation of our dataset includes several steps:

• Using `mutate()` and `factor()` to order the categorical variable.
• Applying `group_by()` to group the data by the categorical variable, facilitating summary statistics.
• Calculating mean, standard deviation (SD), count (N), and standard error (SE) of the numerical variable for each group.

The code above transforms a specified categorical variable into a categorical data type, orders its levels, and then groups the data by this variable. It calculates summary statistics (mean, standard deviation, count, and standard error) for a numerical variable within each category (level). These steps prepare the dataset for creating a bar plot with error bars, useful for visualizing differences between categories.

Tip

Refactoring Categorical Variables Refactoring categorical variables is important for bar plot visualizations, as it allows you to decide in which order particular labels appear.

• Using the `fct_relevel()` function. This function allows for reordering factor levels using character strings, ensuring the categorical data is displayed in a preferred sequence.
• Utilizing Base R’s `factor()` Function: Our approach involves `factor()`, where the levels argument sets the desired order

### Step 2: Creating your first barchart

With the data in place, let’s visualize it using a bar chart. The code snippet below demonstrates how to create a bar chart with error bars using ggplot2:

Example

Key Points:

• `geom_col()` vs. `geom_bar()`: We use `geom_col()` here, as it’s suitable when bar heights need to directly represent data values. In contrast, `geom_bar()` is used when you want to count cases at each x position, as it employs `stat_count()` by default.
• Adding Error Bars: `geom_errorbar()` is utilized to add error bars to the bar chart. This function takes ymin and ymax aesthetics, calculated here as mean - se and mean + se, respectively. The width parameter controls the width of the error bars.
• Error bars are essential in reporting the “confidence” of particular estimates (e.g., whether they are closer or further away from zero).
• Aesthetics: The fill aesthetic within `aes()` is set to Categorical Variable to color the bars based on the categorical groups.

### Step 3: Enhancing Aesthetics

The initial visualization is a solid starting point, but to meet publication standards, we need to refine it further. The primary issues are:

• Redundant x-axis text: The legend duplicates information already conveyed by the x-axis labels.
• Non-Descriptive Axis Titles: Axis titles need to be more informative for clarity.
• Lack of Contextual Information: The plot lacks a title.
• Color Scheme: The default color palette does not meet the academic rigor.

Let’s address these issues with the following enhanced visualization:

Let’s go over these changes.

• `scale_fill_manual`: This function customizes bar colors to specified ones.
• `theme_minimal()`: It removes the grey background panel, creating a cleaner look.
• `scale_y_continuous`: Adjusts the y-axis scale.
• limits sets the y-axis limit. Since we still have to add the p-values, we need more space above the highest bar.
• expand sets the axis to start precisely at 0.
• `labs()` Function: Adds informative titles for axes, and the plot itself.

### Step 4: Enhancing Bar Chart Aesthetics with Theme Customization

After the initial setup, further customization using the `theme()` function in ggplot2 can significantly enhance the visual appeal and clarity of your bar chart. Here’s how you can apply these enhancements:

Now the barchart is closer to the academic standards. In the code the function `theme()` allows to adjust:

• Plot Title: Enhanced with a larger font size, boldface, and added bottom margin for clarity and spacing.
• Plot Margins: Uniform margins added around the plot to create balanced spacing.
• Axis Text and Titles: Changed text size and color for better readability; additional margins for x-axis text and removal of the y-axis title to simplify the plot.
• Legend Customization: Removed the redundant legend title and repositioned the legend to the top for a cleaner layout and easier comparison.

### Step 5: Visualizing Statistical Significance in Bar Charts

Suppose you’ve analyzed how education levels and gender influence mean hourly wages and found significant differences. To effectively communicate these results in your bar chart, it’s essential to visualize the statistical significance.

Tip

Automating statistical tests and visualization with `ggpubr` streamlines the process, making it more efficient, as explained here. However, customizing visualizations directly in `ggplot2` offers greater flexibility and control over the final output. Let’s dive into the latter.

#### Creating Data for Confidence Bounds

First, we need to set up data points to draw lines indicating significance. These lines typically connect the relevant categories with a central peak to denote significance.

This setup creates a line that starts at the center of the ‘Low’ bar at y = 20, peaks at y = 22 (indicating significance), travels across to the ‘Medium’ bar, and descends back to y = 20.

#### Adding Significance Lines and Annotations**

Next, we add these lines and annotations (like asterisks) to highlight the significance levels found in your statistical analysis.

In this code:

• `geom_line()` uses the p_value_one data to draw the line. Setting group = 1 ensures it’s treated as a continuous line.
• `annotate()` adds asterisks (***) at a specified point (here, x = “1.5”, y = 23.5) as a common symbol for statistical significance.
• `geom_text()` adds the mean value of each category in the plot.

Remember, the positions and labels are adjustable based on your specific data and results. Experiment with the x and y values in the annotate function to achieve the best placement in your bar chart. This approach provides a clear, customized way to denote significant findings in your visualization.

Tip

Transferring plots via copy-paste can degrade their quality, but our article on about saving plots in R provides a solution. It discusses how to use use `ggsave()` from the `ggplot2` package to save your visuals in high-quality formats. Discover how to maintain clarity in presentations and publications by customizing plot dimensions and resolutions.

## Advanced Techniques for Multi-Group Bar Charts in `ggplot2`

When working with `ggplot2` to visualize complex datasets with multiple categorical variables, creating faceted plots can pose unique challenges. This section focuses on the effective use of p-value annotations in such scenarios.

Visualizing Wage Differences by Education and Gender

Imagine plotting the mean hourly wage by education level, segmented by gender. Our goal is to highlight significant differences between education levels using p-values.

Step 1: Grouping the Data
Begin by organizing your data using `group_by()` to focus on the interaction between the chosen categories.

Step 2: Faceted Visualization with facet_wrap
Employ `facet_wrap()` to create distinct plots for each gender, allowing for a clear comparison across categories.

Step 3: Preparing P-Value Data for Annotations
To annotate your plot with p-values, prepare a separate dataset containing these values along with the corresponding coordinates and the faceting variable.

Step 4: Adding Annotations in ggplot2
Incorporate these annotations into your ggplot2 chart, ensuring they align accurately with the respective facets.

Summary

This article uses `ggplot2` for effective bar chart styling, crucial in academic data presentation.

• Bar charts in `ggplot2` are ideal for categorical data, showcasing groups and their quantitative measures.
• Key functions covered include `ggplot()` for initial plot creation and `geom_col()` for constructing bar charts.
• Advanced customization is achieved using `geom_errorbar() `for error bars and `scale_fill_manual()` for color themes.
• Showcases how to add p-values inside your `ggplot`.

Interested in the source code used in this analysis? Download it here.