Overview
This building block covers the fixed effects regression assumptions for Ordinary Least Squares (OLS) models. These 4 assumptions should hold in a Fixed Effects regression model to establish the unbiasedness of OLS.
To refresh your understanding of panel data and fixed effects, you can refer to the panel data building block. For a comprehensive explanation of fixed effects regressions in R, check the fixest
building block.
The assumptions are taken from the online course Introduction to Econometrics with R.
In the fixed effects model
we assume the following:
- Zero conditional mean: $E(u_{it}|X_{i1},…,X_{iT}) = 0$.
- Observations across entities are i.i.d. draws from their joint distribution.
- Large outliers are unlikely.
- There is no perfect collinearity.
When there are multiple regressors, $X_{it}$ is replaced by $X_{1,it}, X_{2,it},…, X_{k,it}$.
Now, the assumptions are explained in greater detail.
Examples are given to clarify the assumptions. This is done with the panel data set called Grunfeld
, which is already used in the panel data building block and fixest
building block. Please refer to these building blocks for further context on the Grunfeld
data and the regression model.
Assumption 1: Zero Conditional Mean
This assumption is also called Strict Exogeneity. The error term $u_{it}$ has an expected value of zero given any values of the independent variables. In other words, $E(u_{it}|X_{i1},…,X_{it}) = 0$.
It states that the error term $u_{it}$ is uncorrelated with each explanatory variable for all entities (each i
) across all time periods (each t
).
Violating this assumption introduces an Omitted Variable Bias.
This is the Fixed Effects model with Grunfeld
data in which $\alpha_i$ denotes the firm fixed effects.
For assumption 1 to hold, the error term $\epsilon_{it}$ should be uncorrelated with each observation of the independent variables. This implies that all factors impacting invest
that are not included in the model are uncorrelated with all the observations of value
and capital
over time.
Assumption 2: Observations across entities are i.d.d. draws from their joint distribution
$(X_{i1}, X_{i2},…,X_{it}, u_{i1},…,u_{it})$, $i = 1,…,n$ are independent and identically distributed (i.i.d.) draws from their joint distribution.
This assumption is justified when entities are selected through simple random sampling, where each entity in the population has an equal chance of being included in the sample. This ensures that the observed data is representative of the entire population, and an OLS estimation can make valid inferences about the population based on the sample data.
It is important to note that this assumption is just about independence across entities. It does allow for observations to be correlated within entities, meaning that there can be autocorrelation within entities (firms) over time. This autocorrelation within an entity (over time) is commonly observed in time series and panel data. Similarly, the errors $u_{it}$ are allowed to be correlated over time.
In the context of the Grunfeld
data set, this assumption implies the following:
- In the Fixed Effects model, we assume that the variables
value
andcapital
are independent across different firms. The firmvalue
of U.S. Steel for example is assumed to be independent of the firmvalue
of General Motors. - Within each firm, there can still be autocorrelation over time. For instance, the firm
value
of U.S. Steel in one year may be dependent on itsvalue
in the previous year.
Assumption 3: Large outliers are unlikely
This assumption tells that $(X_{it}, u_{it})$ have nonzero finite fourth moments. In other words, the observations and error term have finite distributions.
Large outliers are extreme observations deviating considerably from the usual range of the data. In OLS, each observation is given equal weighting. Therefore, large outliers receive a relatively heavy weighting in the estimation of unknown regression coefficients. This can distort the estimates and lead to biased results.
By assuming the absence of large outliers, we expect that the observations will generally fall within a reasonable range and not have extreme values that would significantly affect the estimation of the regression coefficients. This assumption helps ensure the OLS estimates are unbiased.
Assumption 4: No perfect collinearity
Assumption 4 holds when there are no exact linear relationships among the independent variables. Perfect collinearity between two or more independent variables means that one variable can be written as a linear combination of the other(s). For example, one variable is a constant multiple of another. In the presence of perfect collinearity, the model cannot be estimated by OLS.
Note that the assumption does allow the independent variables to be correlated. They just cannot be perfectly correlated.
The Fixed Effects (FE) regression model relies on 4 key assumptions, which are discussed in this building block. These assumptions form the foundation for various models that account for different forms of error correlation over time.
Subsequent building blocks will explore these models commonly used to analyze panel data:
- Pooled OLS
- First-difference estimator
- Within estimator (Fixed effects)
- Between estimator
- Random effects