# Run a Regression Analysis

## Overview

In the social sciences, regression analysis is a popular tool for estimating the relationship between a dependent variable and one or more independent variables. It is a way to find trends in data, quantify the impact of input variables, and make predictions for unseen data.

In this building block, we illustrate how to estimate a model, identify outliers, plot a trend line, and make predictions.

## Code

### Estimate Model

Linear regression (`lm`) is suitable for a response variable that is numeric. For logical values (e.g., did a customer churn: yes/no), you need to estimate a logistic regression model (`glm`). The code sample below estimates a model, checks the model assumptions, and shows the regression coefficients.

• Model transformations can be incorporated into the formula, for example: `formula = log(y) ~ I(x^2)`.
• The coefficients (`coefficients(mdl)`), predictions for the original data set (`fitted(mdl)`), and residuals (`residuals(mdl)`) can be directly derived from the model object.
• A concrete example of how to evaluate the model assumptions (the mean of the residuals is 0, the residuals are normally distributed, homoskedasticity) can be found here.
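The steps above can be sketched as follows. This is a minimal illustration, assuming the built-in `cars` data set (stopping distance `dist` explained by `speed`) and, for the logistic case, the 0/1 column `am` in `mtcars`. The Shapiro-Wilk test and the residuals-versus-fitted plot are only one possible way to check the assumptions.

```r
# Fit a simple linear regression (assumed example: dist explained by speed)
mdl <- lm(formula = dist ~ speed, data = cars)

# Regression table: coefficients, standard errors, R-squared
summary(mdl)

# Quantities that can be derived directly from the model object
coefficients(mdl)      # intercept and slope
head(fitted(mdl))      # predictions for the original rows
head(residuals(mdl))   # observed minus fitted values

# Rough checks of the model assumptions
mean(residuals(mdl))           # should be numerically zero
shapiro.test(residuals(mdl))   # normality of the residuals
plot(mdl, which = 1)           # residuals vs. fitted: a funnel shape hints at heteroskedasticity

# Logistic regression for a yes/no response (illustrative: am is coded 0/1)
mdl_logit <- glm(am ~ wt, data = mtcars, family = binomial)
coefficients(mdl_logit)
```

Note that `glm` needs `family = binomial` to estimate a logistic regression; without it, the default Gaussian family gives the same fit as `lm`.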

### Identify Outliers

Compute the leverage of each observation and its influence on `mdl` to identify potential outliers.
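One way to do this in base R is with `hatvalues()` for leverage and `cooks.distance()` for influence. The sketch below assumes `mdl` was fitted on the `cars` data set. The `2 * p / n` cutoff is a common rule of thumb, not a fixed standard.

```r
# Assumed example model
mdl <- lm(dist ~ speed, data = cars)

# Leverage: how unusual each observation's explanatory values are
leverage <- hatvalues(mdl)

# Influence: how much the fit changes when an observation is left out
cooks_dist <- cooks.distance(mdl)

# Combine the diagnostics with the original data
diagnostics <- data.frame(cars, leverage = leverage, cooks_dist = cooks_dist)

# Show the five most influential observations
head(diagnostics[order(-diagnostics$cooks_dist), ], 5)

# Flag high-leverage observations with a rule-of-thumb threshold
p <- length(coefficients(mdl))  # number of estimated coefficients
n <- nrow(cars)                 # number of observations
high_leverage <- diagnostics[diagnostics$leverage > 2 * p / n, ]
high_leverage
```

Observations flagged here are candidates for closer inspection, not automatic removal.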

### Plot Trend Line

Plot a scatter plot of two numeric variables and add a linear trend line on top of it.
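A minimal sketch using base R graphics, assuming the `cars` data set with `speed` on the x-axis and `dist` on the y-axis. Other plotting packages would work just as well.

```r
# Fit the trend line (assumed example: dist explained by speed)
trend <- lm(dist ~ speed, data = cars)

# Scatter plot of the two numeric variables
plot(
  cars$speed, cars$dist,
  xlab = "Speed",
  ylab = "Stopping distance",
  main = "Stopping distance versus speed"
)

# Add the fitted regression line on top of the points
abline(trend, col = "blue", lwd = 2)
```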

### Make Predictions

Given a linear regression model (`mdl`), make predictions for unseen input data (`explanatory_data`). Note that for multiple linear regression models, `explanatory_data` needs one column for each explanatory variable in the model.
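A short sketch with `predict()`, assuming `mdl` was fitted on `cars`. The speed values and the `mtcars` model are made up for illustration. The column names in `explanatory_data` must match the variable names used in the model formula.

```r
# Assumed example model
mdl <- lm(dist ~ speed, data = cars)

# Unseen input data: column name must match the explanatory variable
explanatory_data <- data.frame(speed = c(10, 20, 30))

# Add the predictions as a new column
prediction_data <- explanatory_data
prediction_data$dist <- predict(mdl, newdata = explanatory_data)
prediction_data

# Multiple linear regression: one column per explanatory variable
mdl_multi <- lm(mpg ~ wt + hp, data = mtcars)
explanatory_multi <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
predict(mdl_multi, newdata = explanatory_multi)
```

Keep in mind that a speed of 30 lies outside the range observed in `cars`, so that prediction is an extrapolation.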

### Export Model Output

You can export your model output using `stargazer`. This package will create a nicely-formatted regression table for you in a variety of formats. You can learn more about it here.

Convert the regression coefficients of `mdl_1` and `mdl_2` into an HTML file that can be copied into a paper.
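A possible sketch, assuming the `stargazer` package is installed. The two model specifications, the table title, and the output file name `regression_table.html` are chosen here for illustration only.

```r
# install.packages("stargazer")  # run once if the package is missing
library(stargazer)

# Two assumed example models on the cars data set
mdl_1 <- lm(dist ~ speed, data = cars)
mdl_2 <- lm(dist ~ speed + I(speed^2), data = cars)

# Write both models side by side into one HTML regression table
stargazer(
  mdl_1, mdl_2,
  type = "html",
  title = "Stopping distance and speed",
  out = "regression_table.html"  # hypothetical output path
)
```

Setting `type = "text"` instead prints a plain-text table to the console, which is handy for a quick look before exporting.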

## Example

This tutorial outlines how to run, evaluate, and export regression model results for the `cars` dataset. In particular, it analyzes the relationship between a car’s speed and its stopping distance.