# Run a Regression Analysis

## Overview

In the social sciences, regression analysis is a popular tool to estimate relationships between a dependent variable and one or more independent variables. It is a way to find trends in data, quantify the impact of input variables, and make predictions for unseen data.

In this topic, we illustrate how to estimate a model, identify outliers, plot a trend line, and make predictions.

## Code

### Estimate the Model

Linear regression (`lm`) is suitable for a numeric response variable. For a binary response (e.g., did a customer churn: yes/no), estimate a logistic regression model instead (`glm` with `family = binomial`). A typical workflow estimates a model, checks the model assumptions, and inspects the regression coefficients.

• Model transformations can be incorporated into the formula. For example: `formula = log(y) ~ I(x^2)`.
• The coefficients (`coefficients(mdl)`), predictions for the original data set (`fitted(mdl)`), and residuals (`residuals(mdl)`) can be directly derived from the model object.
• A concrete example of how to evaluate model assumptions (mean of the residuals is 0, residuals are normally distributed, homoscedasticity) can be found in this topic.
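The steps above can be sketched as follows. This is a minimal illustration, not the original code sample: it assumes the built-in `cars` and `mtcars` data sets, and the choice of the Shapiro-Wilk test and a residuals-versus-fitted plot as assumption checks is ours.

```r
# Estimate a simple linear regression on the built-in cars data set
# (assumption: dist is the response, speed the explanatory variable)
mdl <- lm(dist ~ speed, data = cars)

# Coefficient table, standard errors, and R-squared
summary(mdl)

# Quantities derived directly from the model object
coefficients(mdl)
head(fitted(mdl))
head(residuals(mdl))

# Check the model assumptions
mean(residuals(mdl))          # should be numerically close to 0
shapiro.test(residuals(mdl))  # normality of the residuals
plot(mdl, which = 1)          # residuals vs. fitted: look for a constant spread

# For a binary response, a logistic regression has the same structure
# (assumption: mtcars$am, coded 0/1, stands in for a yes/no outcome)
mdl_logit <- glm(am ~ wt, data = mtcars, family = binomial)
coefficients(mdl_logit)
```

A transformed specification such as `lm(log(dist) ~ I(speed^2), data = cars)` is fitted the same way.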

### Identify Outliers

Compute the leverage of each observation and its influence on `mdl` to identify potential outliers.
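One way to do this in base R is shown below. It is a sketch under assumptions: the model is fitted on the built-in `cars` data, influence is measured with Cook's distance, and the cut-offs (`2 * p / n` for leverage, `4 / n` for Cook's distance) are common rules of thumb rather than fixed standards.

```r
# Fit the model whose observations we want to screen
mdl <- lm(dist ~ speed, data = cars)

# Leverage (hat values) and influence (Cook's distance) per observation
diagnostics <- data.frame(
  leverage   = hatvalues(mdl),
  cooks_dist = cooks.distance(mdl)
)

# Rule-of-thumb thresholds (assumptions, adjust to your own data)
n <- nrow(cars)
p <- length(coefficients(mdl))
high_leverage  <- diagnostics$leverage > 2 * p / n
high_influence <- diagnostics$cooks_dist > 4 / n

# Observations flagged as potential outliers
cbind(cars, diagnostics)[high_leverage | high_influence, ]
```

Flagged observations are candidates for a closer look, not automatic removals.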

### Plot a Trend Line

Plot a scatter plot of two numeric variables and add a linear trend line on top of it.
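A minimal sketch with base R graphics, assuming the built-in `cars` data (speed in mph, stopping distance in ft); the axis labels and line styling are illustrative choices:

```r
# Scatter plot of the two numeric variables
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed")

# Fit the linear trend and draw it on top of the points
trend <- lm(dist ~ speed, data = cars)
abline(trend, col = "blue", lwd = 2)
```

If you prefer `ggplot2`, `geom_point()` combined with `geom_smooth(method = "lm")` produces an equivalent plot.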

### Make Predictions

Given a linear regression model (`mdl`), make predictions for unseen input data (`explanatory_data`). For multiple linear regression models, `explanatory_data` needs one column per explanatory variable, named as in the model formula.
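The sketch below illustrates both cases. The data sets (`cars`, `mtcars`), the input values, and the names `prediction_data`, `mdl_multi`, and `explanatory_multi` are assumptions chosen for illustration.

```r
# Simple linear regression: one explanatory variable
mdl <- lm(dist ~ speed, data = cars)

# Unseen input data; the column name must match the formula
explanatory_data <- data.frame(speed = c(10, 15, 20))

# Append the predictions to the input data
prediction_data <- cbind(
  explanatory_data,
  dist = predict(mdl, newdata = explanatory_data)
)
prediction_data

# Multiple linear regression: one column per explanatory variable
mdl_multi <- lm(mpg ~ wt + hp, data = mtcars)
explanatory_multi <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
predict(mdl_multi, newdata = explanatory_multi)
```

Predictions for inputs far outside the range of the original data are extrapolations and should be treated with caution.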

### Export Model Output

You can export your model output using `stargazer`. This package will create a nicely-formatted regression table for you in a variety of formats. You can learn more about it here.

Convert the regression coefficients of `mdl_1` and `mdl_2` into an HTML file that can be copied into a paper.
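A minimal sketch, assuming the `stargazer` package is installed. The two model specifications, the table title, and the output file name `regression_table.html` are hypothetical and only serve as an illustration.

```r
library(stargazer)

# Two example models on the built-in cars data (illustrative specifications)
mdl_1 <- lm(dist ~ speed, data = cars)
mdl_2 <- lm(dist ~ speed + I(speed^2), data = cars)

# Write a side-by-side regression table to an HTML file
stargazer(mdl_1, mdl_2,
          type  = "html",
          title = "Regression results",
          out   = "regression_table.html")
```

Setting `type = "text"` prints a plain-text table to the console instead, which is handy for a quick check before exporting.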

### Export Your Findings in Stata

Alternatively, you can run your regression analysis in Stata. First, clear any data currently held in memory. Then, with the command `sysuse auto`, load an example dataset that ships with Stata.

## Example

This example outlines how to run, evaluate, and export regression model results for the `cars` dataset. In particular, it analyzes the relationship between a car's speed and its stopping distance.