Easy Standard Error Adjustment with `estimatr`

Overview

The estimatr package provides a range of commonly used linear estimators that make it easy to compute heteroscedasticity-robust, cluster-robust, and other design-appropriate standard errors.

In this building block, we walk you through estimating regression coefficients with the two most commonly used functions in estimatr: lm_robust() and iv_robust(). We analyse the wage2 data set, which is provided by the wooldridge package. We are interested in how employees' education level relates to their wages, and to examine this we use the following regression model:

$Wage_{i} = \beta_{0} + \beta_{1} Educ_{i} + \mu$

Variable	Description
`wage`	monthly earnings
`educ`	years of education

Load packages


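# Install the packages once, if not already installed
install.packages("estimatr")
install.packages("wooldridge")

# Load the packages
library(estimatr)
library(wooldridge)

# Load the wage2 data set from the wooldridge package
data("wage2")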

Model Estimation

Before diving into the estimatr package, let's first use the built-in lm() function to estimate our linear model:

$Wage_{i} = \beta_{0} + \beta_{1} Educ_{i} + \mu$


# Fit the model by OLS; summary() reports classical standard errors
reg <- lm(wage ~ educ, data = wage2)
summary(reg)

Estimation results using lm()

The coefficient estimates from lm() are unaffected by heteroscedasticity, but the standard errors it reports are computed under the assumption of homoscedasticity, which may not hold in a given application. When that assumption fails, the reported standard errors, and the tests and confidence intervals built on them, are unreliable. estimatr offers a quick and easy way to adjust them, providing heteroscedasticity-robust and cluster-robust standard errors.
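
To make the homoscedasticity assumption concrete, here is a minimal base R sketch that rebuilds the classical standard errors by hand and checks them against summary(). The data are simulated purely for illustration (the variable names and parameter values are assumptions, not taken from wage2), with an error spread that grows with x.

```r
# Simulated data (hypothetical, for illustration only): error spread grows with x
set.seed(42)
n <- 200
x <- runif(n, 8, 18)
y <- 100 + 60 * x + rnorm(n, sd = 10 * x)

fit <- lm(y ~ x)

# Classical variance estimate: sigma^2 * (X'X)^{-1},
# i.e. one common error variance sigma^2 for every observation
X <- model.matrix(fit)
e <- resid(fit)
k <- ncol(X)
sigma2 <- sum(e^2) / (n - k)
vcov_classical <- sigma2 * solve(crossprod(X))
se_classical <- sqrt(diag(vcov_classical))

# The same numbers appear in the "Std. Error" column of summary(fit)
se_lm <- coef(summary(fit))[, "Std. Error"]
print(cbind(by_hand = se_classical, lm = se_lm))
```

The single sigma2 term is where the assumption enters: if the error variance differs across observations, this formula no longer estimates the sampling variance of the coefficients correctly.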

Heteroscedasticity Robust Estimates with `lm_robust()`

The lm_robust() function fits a linear regression model and reports heteroscedasticity-robust standard errors.

Let's re-estimate the model using robust standard errors.


lm_robust(wage ~ educ, data = wage2, se_type = "HC2")

Estimation results with HC2 standard errors

The se_type argument specifies which standard error estimator to use. If it is not specified (and no clusters are supplied), HC2 is used by default; it includes a small-sample correction based on each observation's leverage.
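
For reference, the HC2 variance estimator can be written in the usual sandwich form. The notation below is introduced here for illustration and does not appear in the estimatr output: $X$ is the regressor matrix (including the constant), $e_{i}$ is the OLS residual of observation $i$, and $h_{ii}$ is its leverage.

$\widehat{V}_{HC2} = (X^{\top}X)^{-1} \left( \sum_{i} \frac{e_{i}^{2}}{1 - h_{ii}} x_{i} x_{i}^{\top} \right) (X^{\top}X)^{-1}$

$h_{ii} = x_{i}^{\top} (X^{\top}X)^{-1} x_{i}$

Dividing each squared residual by $1 - h_{ii}$ is the small-sample correction: OLS residuals of high-leverage observations tend to be too small, and the division scales them back up.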

After re-estimating the model with HC2 standard errors, the coefficient estimates are unchanged, but the standard errors differ from those reported by lm(). The difference arises because HC2 standard errors remain valid when the error variance is not constant. This makes the resulting inference more reliable; it does not make the estimates more efficient.

Several other se_type options are available, suited to different assumptions about the error terms and to different sample sizes:

classical: This option uses the classical OLS variance formula to calculate standard errors. It assumes that the error terms are homoscedastic and uncorrelated with the independent variables.


lm_robust(wage ~ educ, data = wage2, se_type = "classical")

Estimate the model with classical standard errors

We can observe that this option gives us exactly the same result as we obtained with the lm() function.

HC0: This option uses White's heteroscedasticity-consistent estimator to calculate standard errors. It allows for heteroscedasticity in the error terms but applies no small-sample correction.
HC1: This option rescales the HC0 variance estimate by n/(n - k), a degrees-of-freedom correction, where n is the number of observations and k is the number of estimated coefficients. Unlike HC2, the correction does not depend on leverage. In estimatr, se_type = "stata" is equivalent to HC1.
HC3: This option is similar to HC2 but divides each squared residual by (1 - h)^2 rather than (1 - h), where h is the observation's leverage. It yields larger, more conservative standard errors, especially when some observations have high leverage.
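
The four HC variants differ only in how they weight the squared residuals, which the following base R sketch makes explicit. It is an illustration on simulated data (the data-generating values and the helper sandwich_se() are assumptions introduced here), not the code estimatr runs internally.

```r
# Simulated heteroscedastic data (hypothetical, for illustration only)
set.seed(1)
n <- 100
x <- runif(n, 8, 18)
y <- 100 + 60 * x + rnorm(n, sd = 10 * x)

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- resid(fit)
h <- hatvalues(fit)           # leverage h_ii of each observation
k <- ncol(X)
bread <- solve(crossprod(X))  # (X'X)^{-1}

# Hypothetical helper: sandwich standard errors for observation weights w,
# computed as bread %*% (X' diag(w) X) %*% bread
sandwich_se <- function(w) {
  meat <- crossprod(X * sqrt(w))  # equals X' diag(w) X
  sqrt(diag(bread %*% meat %*% bread))
}

se <- rbind(
  HC0 = sandwich_se(e^2),                # no small-sample correction
  HC1 = sandwich_se(e^2 * n / (n - k)),  # degrees-of-freedom correction
  HC2 = sandwich_se(e^2 / (1 - h)),      # leverage correction
  HC3 = sandwich_se(e^2 / (1 - h)^2)     # stronger leverage correction
)
print(se)
```

Because every leverage value lies between 0 and 1, the HC3 standard errors are never smaller than the HC2 ones, which in turn are never smaller than the HC0 ones.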

Cluster-Robust Standard Errors

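Cluster-robust standard errors allow for correlation between observations within the same cluster. For cluster-robust inference, estimatr provides the cluster-robust variance estimators CR0, CR2 (the default), and stata.

For illustrative purposes, let's create an ID variable to use as the cluster variable. Because every row gets a unique ID, each cluster contains a single observation; in a real application, the cluster variable would group related observations (for example, employees of the same firm).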


install.packages("dplyr")  # If not already installed
library(dplyr)

# Creating an ID column using row_number()
wage2 <- wage2 %>%
  mutate(ID = row_number())

Now estimate the model with cluster-robust standard errors:


lm_robust(wage ~ educ, data = wage2, clusters = ID, se_type = "CR2")
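
To see what clustering changes, the base R sketch below computes the simplest cluster-robust estimator, CR0, by hand. The data, the cluster assignments, and the helper cr0_se() are hypothetical and introduced only for illustration. It also shows that when every observation forms its own cluster, as with a row-number ID, CR0 collapses to the HC0 estimator.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(7)
n <- 60
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- resid(fit)
bread <- solve(crossprod(X))  # (X'X)^{-1}

# Hypothetical helper: CR0 standard errors.
# The meat sums (X_g' e_g)(X_g' e_g)' over clusters g.
cr0_se <- function(cluster) {
  scores <- rowsum(X * e, group = cluster)  # one row per cluster: X_g' e_g
  meat <- crossprod(scores)
  sqrt(diag(bread %*% meat %*% bread))
}

se_singleton <- cr0_se(seq_len(n))         # every observation is its own cluster
se_hc0 <- sqrt(diag(bread %*% crossprod(X * e) %*% bread))
se_grouped <- cr0_se(rep(1:12, each = 5))  # 12 hypothetical clusters of 5

print(rbind(singleton = se_singleton, HC0 = se_hc0, grouped = se_grouped))
```

Only the grouped version sums residuals within a cluster before squaring, which is what lets correlated errors inside a cluster affect the standard errors.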

Tip

Check out the mathematical notes for each estimator in the estimatr documentation to see the formulas behind these standard errors and to better understand when each type is appropriate.

Estimate with iv_robust()

The iv_robust() function estimates instrumental variable (IV) regressions by two-stage least squares, with heteroscedasticity-robust or cluster-robust standard errors.

Suppose IQ is a valid instrument for education (i.e. it is correlated with education but not with the error term). We can then obtain heteroscedasticity-robust standard errors as follows:


iv_robust(wage ~ educ | IQ, data = wage2, se_type = "HC2")

Estimate the IV regression with robust standard errors
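
The idea behind the IV point estimates can be sketched in base R with two ordinary regressions. The example uses simulated data in which the instrument is valid by construction; all names and parameter values are assumptions made for illustration and have nothing to do with wage2.

```r
# Simulated data (hypothetical): u is an unobserved confounder that makes x endogenous
set.seed(123)
n <- 500
z <- rnorm(n)                      # instrument: affects x, unrelated to u
u <- rnorm(n)                      # unobserved confounder
x <- 0.8 * z + u + rnorm(n)        # endogenous regressor
y <- 1 + 2 * x + 3 * u + rnorm(n)  # true slope on x is 2

# Stage 1: project the endogenous regressor on the instrument
x_hat <- fitted(lm(x ~ z))
# Stage 2: regress the outcome on the fitted values from stage 1
b_2sls <- coef(lm(y ~ x_hat))

# Closed form for the just-identified case: (Z'X)^{-1} Z'y
X <- cbind(1, x)
Z <- cbind(1, z)
b_iv <- drop(solve(crossprod(Z, X), crossprod(Z, y)))

# OLS for comparison; it is biased here because x is correlated with u
b_ols <- coef(lm(y ~ x))
print(rbind(OLS = b_ols, two_stage = b_2sls, closed_form = b_iv))
```

The two-stage and closed-form coefficients coincide. The standard errors printed by the second-stage lm() are not valid, however, because they ignore that x_hat is itself estimated; that is one reason to use a dedicated function such as iv_robust() rather than running the two stages manually.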

Summary

estimatr is an R package for linear estimators designed for speed and ease-of-use.
Users can easily obtain heteroscedasticity-robust, cluster-robust, and other design-appropriate standard errors.
The package includes, among others, the linear regression estimators lm_robust() and iv_robust().
The standard errors can be adjusted using the se_type and clusters arguments.
