```r
library(wooldridge)   # provides the gpa1 data set

# gpa_model <- lm(colGPA ~ hsGPA, data = gpa1)
# summary(gpa_model)

# Scatter plot of college GPA against high school GPA, with the OLS fit
plot(gpa1$hsGPA, gpa1$colGPA,
     xlab = "High school GPA",
     ylab = "College GPA",
     col  = "red")
abline(lm(colGPA ~ hsGPA, data = gpa1), col = "blue")
```
Topic 3: Simple Linear Regression
For a discrete random variable \(X\) with possible values \(x_i\) and probabilities \(p_i\):
\[\mathbb{E}[X] = \sum_{i=1}^n x_i p_i\]
For a continuous random variable \(X\) defined over \((-\infty, \infty)\) with probability density function \(f(x)\):
\[\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx\]
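A quick R illustration of both definitions (the discrete values and probabilities below are made up for the example; the continuous case uses the standard normal density):

```r
# Discrete case: expectation as a probability-weighted sum
x <- c(1, 2, 3, 4)            # possible values (illustrative)
p <- c(0.1, 0.2, 0.3, 0.4)    # their probabilities (sum to 1)
sum(x * p)                    # E[X] = 3

# Continuous case: numerically integrate x * f(x) for a standard normal density
integrate(function(x) x * dnorm(x), lower = -Inf, upper = Inf)$value  # approximately 0
```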
Note that in this course, for the most part, we will be working with a single cross-sectional dataset at a time.
Simple linear regression model (SLRM): studies how \(y\) changes with changes in \(x\)
Population Model: \[y_i = \beta_0 + \beta_1 x_i + u_i\]
Estimation from a sample gives: \[\hat{y_i} = \widehat{\beta_0} + \widehat{\beta_1} x_i\]
Note
Population Model: \[y_i = \beta_0 + \beta_1 x_i + u_i\]
Example: \[wage_i = \beta_0 + \beta_1 education_i + u_i\]
Population Model: \[y_i = \beta_0 + \beta_1 x_i + u_i\]
Formally,
\[\frac{\partial y_i}{\partial x_i} = \beta_1 \] iff \(\frac{\partial u_i}{\partial x_i} = 0\)
Note
Zero Mean
Why This Works
Mean Independence
Key Implications
Original Model
\(wage_i = \beta_0 + \beta_1educ_i + u_i\)
Where \(u\) includes ability. Suppose ability adds a constant 100 to every worker's wage, so that \(u_i = 100 + \tilde{u}_i\).
This means: \(wage_i = \beta_0 + \beta_1 educ_i + 100 + \tilde{u}_i\)
Rewritten Model
\(wage_i = (\beta_0 + 100) + \beta_1 educ_i + \tilde{u}_i\)
Where: \(\tilde{u}_i = u_i - 100\) now has mean zero.
Key Insight: any nonzero mean in \(u\) is simply absorbed into the intercept, so assuming \(E(u) = 0\) costs us nothing as long as the model includes an intercept.
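A small simulation sketch of this point (all numbers are invented for illustration): the error contains a constant 100, and OLS simply folds it into the estimated intercept while the slope is unaffected.

```r
set.seed(123)
n     <- 1000
educ  <- rnorm(n, mean = 12, sd = 2)   # hypothetical years of education
u_til <- rnorm(n, mean = 0, sd = 5)    # mean-zero part of the error
wage  <- 20 + 3 * educ + 100 + u_til   # true beta0 = 20, beta1 = 3, plus the constant 100

coef(lm(wage ~ educ))
# The estimated intercept is close to 20 + 100 = 120; the slope is still close to 3.
```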
\[ wage_i = \beta_0 + \beta_1 education_i + u_i \]
Zero conditional mean implies:
\(E(y_i|x_i) = \beta_0 + \beta_1x_i\)
Interpretation
Example using the gpa1 data:
\(E(colGPA|hsGPA) = 1.5 + 0.5 \, hsGPA\)
For \(hsGPA = 3.6\): \(E(colGPA \mid hsGPA = 3.6) = 1.5 + 0.5 \times 3.6 = 3.3\)
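A quick check in R (using the actual gpa1 data; the fitted coefficients will be close to, but not exactly, the rounded values 1.5 and 0.5 used above):

```r
library(wooldridge)

gpa_fit <- lm(colGPA ~ hsGPA, data = gpa1)
coef(gpa_fit)   # estimated intercept and slope

# Predicted college GPA for a student with a 3.6 high school GPA
predict(gpa_fit, newdata = data.frame(hsGPA = 3.6))
```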
Systematic Part: \(\beta_0 + \beta_1 x_i\), the part of \(y_i\) explained by \(x_i\)
Unsystematic Part: \(u_i\), the part of \(y_i\) not explained by \(x_i\)
Population Model: \[y_i = \beta_0 + \beta_1 x_i + u_i\]
We start with two key assumptions: \(E(u) = 0\) and \(\operatorname{Cov}(x, u) = 0\), which together imply \(E(xu) = 0\).
This implies the population moment conditions: \[E(y - \beta_0 - \beta_1 x) = 0 \quad \text{and} \quad E[x(y - \beta_0 - \beta_1 x)] = 0\]
Sample counterparts
\[n^{-1} \sum_{i=1}^{n}(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0\]
\[n^{-1} \sum_{i=1}^{n}x_i(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0\]
Now, \(n^{-1} \sum_{i=1}^{n}(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0\) simplifies to
\[\bar{y} = \hat{\beta_0} + \hat{\beta_1}\bar{x}\]
where \(\bar{y} = n^{-1} \sum_{i=1}^{n}y_i\) is the sample average of the \(y_i\), and \(\bar{x}\) is defined analogously.
Thus,
\[\hat{\beta_0} = \bar{y} - \hat{\beta_1}\bar{x}\]
Plugging \(\hat{\beta_0}\) into \(n^{-1} \sum_{i=1}^{n}x_i(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0\) and simplifying gives us the slope coefficient \(\hat{\beta_1}\):
\[\hat{\beta_1} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\]
provided \(\sum_{i=1}^{n}(x_i - \bar{x})^2 > 0\). What is this condition? The sum is proportional to the sample variance of \(x\), so it simply requires that the \(x_i\) are not all the same value.
Note that \(\hat{\beta_1} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\) is nothing but
the sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\),
which can be written as
\[\hat{\beta}_1=\hat{\rho}_{x y} \cdot\left(\frac{\hat{\sigma}_y}{\hat{\sigma}_x}\right)\]
since \(\hat{\rho}_{xy} = \dfrac{\widehat{\operatorname{Cov}}(x, y)}{\hat{\sigma}_x \hat{\sigma}_y}\).
The OLS estimates can also be obtained as the line (characterized by an intercept and a slope) that minimizes: \[\sum_{i=1}^n \hat{u}_i^2= \underbrace{\sum_{i=1}^n\left(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i\right)^2}_{\text{SSR: Sum of Squared Residuals}}\]
The first-order conditions (FOCs) of this minimization problem are exactly the sample counterparts of the moment conditions
that we used above to derive the OLS estimates.
OLS chooses \(\hat{\beta_0}\) and \(\hat{\beta_1}\) to minimize SSR
Let us implement this in R and check the formulas against the built-in lm() function.
Try testing these out with generated data, for example:
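A minimal sketch with made-up data (true values \(\beta_0 = 2\), \(\beta_1 = 0.5\) chosen for illustration): compute \(\hat{\beta}_1\) and \(\hat{\beta}_0\) by hand, check the correlation form of the slope, and compare with lm().

```r
set.seed(42)
n <- 500
x <- rnorm(n, mean = 5, sd = 2)
u <- rnorm(n, mean = 0, sd = 1)
y <- 2 + 0.5 * x + u                      # true beta0 = 2, beta1 = 0.5

# OLS formulas derived above
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# Equivalent forms of the slope
cov(x, y) / var(x)                        # sample covariance over sample variance
cor(x, y) * sd(y) / sd(x)                 # rho_hat * (sigma_y_hat / sigma_x_hat)

# Compare with R's built-in estimator
coef(lm(y ~ x))
c(b0_hat, b1_hat)
```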
Total Sum of Squares (SST): \[ \text{SST} \equiv \sum_{i=1}^{n}(y_i - \bar{y})^2 \]
Explained Sum of Squares (SSE): \[ \text{SSE} \equiv \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 \]
Residual Sum of Squares (SSR): \[ \text{SSR} \equiv \sum_{i=1}^{n}\hat{u}_i^2 \]
Relationship Between Sums of Squares
Total variation in \(y\) can be expressed as:
\[ \text{SST} = \text{SSE} + \text{SSR} \]
Proof Outline. To prove this relationship, write:
\[ \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}\left[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\right]^2 = \sum_{i=1}^{n}\left[\hat{u}_i + (\hat{y}_i - \bar{y})\right]^2 = \text{SSR} + 2\sum_{i=1}^{n}\hat{u}_i(\hat{y}_i - \bar{y}) + \text{SSE} \]
The cross term is zero because the OLS first-order conditions imply \(\sum_{i=1}^{n}\hat{u}_i = 0\) and \(\sum_{i=1}^{n}\hat{u}_i x_i = 0\), hence \(\sum_{i=1}^{n}\hat{u}_i \hat{y}_i = 0\) as well. Therefore SST = SSE + SSR.
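A quick numerical check of the decomposition in R, using the gpa1 regression from earlier (the particular data set is incidental; the identity holds for any OLS fit with an intercept):

```r
library(wooldridge)

fit  <- lm(colGPA ~ hsGPA, data = gpa1)
y    <- gpa1$colGPA
yhat <- fitted(fit)
uhat <- resid(fit)

SST <- sum((y - mean(y))^2)
SSE <- sum((yhat - mean(y))^2)
SSR <- sum(uhat^2)

all.equal(SST, SSE + SSR)   # TRUE, up to numerical precision
SSE / SST                   # the R-squared: matches summary(fit)$r.squared
```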
| Specification | Change in \(x\) | Effect on \(y\) |
|---|---|---|
| Level-level | +1 unit | \(+\beta_1\) units |
| Level-log | +1% | \(+\beta_1 / 100\) units |
| Log-level | +1 unit | \(+(100 \times \beta_1)\%\) |
| Log-log | +1% | \(+\beta_1\%\) |
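As an illustration of the log-level case (a sketch using the wage1 data from the wooldridge package; the specification is assumed for the example):

```r
library(wooldridge)

log_fit <- lm(log(wage) ~ educ, data = wage1)
coef(log_fit)["educ"]
# Interpretation: one more year of education is associated with roughly
# (100 * this coefficient) percent higher wages.
```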
Note
"Linear" in linear regression means linear in the parameters, not in the variables.
An estimator is unbiased if, on average, it produces estimates that are equal to the true value of the parameter being estimated.
Definition
An estimator producing an estimate \(\hat{\theta}\) for a parameter \(\theta\) is considered unbiased if: \[ \mathbb{E}(\widehat{\theta}) = \theta \]
In other words, the sampling distribution of \(\hat{\theta}\) is centered at the true value \(\theta\).
We will now show that the OLS estimator for an SLR model's parameters \((\beta_0, \beta_1)\) is unbiased.
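Before the formal argument, a Monte Carlo sketch of what unbiasedness looks like (the data-generating process, with \(\beta_0 = 1\) and \(\beta_1 = 2\), is invented for the simulation):

```r
set.seed(7)
reps  <- 2000
n     <- 200
beta0 <- 1
beta1 <- 2

b1_draws <- replicate(reps, {
  x <- rnorm(n)
  u <- rnorm(n)            # E(u | x) = 0 by construction
  y <- beta0 + beta1 * x + u
  coef(lm(y ~ x))[2]       # slope estimate from this sample
})

mean(b1_draws)             # close to the true beta1 = 2
hist(b1_draws, main = "Sampling distribution of the OLS slope")
```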
Linear in parameters (SLR.1):
In the population model, the dependent variable, \(y\), is related to the independent variable, \(x\), and the error (or disturbance), \(u\), as: \[ y=\beta_0+\beta_1 x+u \] where \(\beta_0\) and \(\beta_1\) are the population intercept and slope parameters, respectively.
Random Sampling (SLR.2)
We have a random sample of size \(n\), \(\left\{\left(x_i, y_i\right): i=1,2, \ldots, n\right\}\), following the population model.
Sample variation in \(x\) (SLR.3)
The sample values of the explanatory variable, \(x_i\), \(i \in \{1, \cdots, n\}\), are not all equal to the same value: \[\widehat{Var}(x_i) > 0\]
Zero conditional mean (SLR.4)
The error \(u\) has an expected value of zero given any value of the explanatory variable. In other words, \[E(u \mid x)=0.\]
Theorem: Unbiasedness of the OLS estimator
Using Assumptions SLR.1 through SLR.4,
\[ \mathrm{E}\left(\hat{\beta}_0\right)=\beta_0 \text { and } \mathrm{E}\left(\hat{\beta}_1\right)=\beta_1 \]
Homoskedasticity (SLR.5)
The population error \(u_i\) has the same variance given any value of \(x_i\), i.e.,
\[ Var(u \mid x) = \sigma^2 \]
SLR.5 further implies that \(\sigma^2\) is also the unconditional variance of \(u\).
\[\operatorname{Var}(u \mid x)=\mathrm{E}\left(u^2 \mid x\right)-[\mathrm{E}(u \mid x)]^2\]
and since \(\mathrm{E}(u \mid x)=0\), we have \(\sigma^2=\mathrm{E}\left(u^2 \mid x\right)\).
Using Assumptions SLR.4 and SLR.5 we can derive the conditional mean and variance of \(y\):
\[ \begin{gathered} \mathrm{E}(y \mid x)=\beta_0+\beta_1 x \\ \operatorname{Var}(y \mid x)=\sigma^2 \end{gathered} \]
The second line follows because, conditional on \(x\), \(\beta_0 + \beta_1 x\) is a constant, so \(\operatorname{Var}(y \mid x) = \operatorname{Var}(u \mid x) = \sigma^2\).
Sampling variance of OLS estimators
Under Assumptions SLR.1 through SLR.5,
\[ \operatorname{Var}\left(\hat{\beta}_1\right)=\frac{\sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2} = \frac{\sigma^2}{\mathrm{SST}_x}, \]
and
\[ \operatorname{Var}\left(\hat{\beta}_0\right)=\frac{\sigma^2 \frac{1}{n} \sum_{i=1}^n x_i^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2} = \frac{\sigma^2 \frac{1}{n} \sum_{i=1}^n x_i^2}{\mathrm{SST}_x} \]
where these are conditional on the sample values \(\left\{x_1, \ldots, x_n\right\}\).
Sketch of the derivation (writing \(d_i \equiv x_i - \bar{x}\), so that \(\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} d_i u_i}{SST_x}\) and \(\sum_{i=1}^{n} d_i^2 = SST_x\)):
\[\begin{align*} \operatorname{Var}\left(\hat{\beta}_1\right) & = \operatorname{Var}\left(\beta_1 + \frac{\sum_{i=1}^{n} d_iu_i}{SST_x}\right) \\ & = \operatorname{Var}\left(\frac{\sum_{i=1}^{n} d_iu_i}{SST_x}\right) \quad \text{(adding the constant } \beta_1 \text{ does not change the variance)} \\ & = \frac{1}{\left(SST_x\right)^2} \sum_{i=1}^{n} d_i^2 \operatorname{Var}(u_i) \quad \text{(the } u_i \text{ are uncorrelated, with the } x_i \text{ treated as fixed)} \\ & = \frac{\sigma^2}{SST_x} \end{align*}\]
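A simulation sketch of the variance formula (the design, with \(x\) drawn once and held fixed, \(\sigma = 1.5\), and true coefficients 1 and 2, is invented for illustration): the empirical variance of \(\hat{\beta}_1\) across replications should be close to \(\sigma^2 / SST_x\).

```r
set.seed(99)
reps  <- 5000
sigma <- 1.5
x     <- runif(100, 0, 10)            # x values held fixed across replications
SST_x <- sum((x - mean(x))^2)

b1_draws <- replicate(reps, {
  u <- rnorm(length(x), sd = sigma)   # homoskedastic errors
  y <- 1 + 2 * x + u
  coef(lm(y ~ x))[2]
})

var(b1_draws)      # simulated sampling variance of beta1_hat
sigma^2 / SST_x    # theoretical Var(beta1_hat)
```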
Unbiased estimation of \(\sigma^2\)
Under Assumptions SLR.1 through SLR.5,
\[\mathrm{E}\left(\hat{\sigma}^2\right)=\sigma^2\]
where \(\hat{\sigma}^2=\frac{1}{n-2} \sum_{i=1}^n \hat{u}_i^2\), where \(\hat{u}_i=y_i-\hat{\beta}_0-\hat{\beta}_1 x_i\)
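In R, \(\hat{\sigma}^2\) can be computed directly from the residuals and matches the squared residual standard error that lm() reports; here is a sketch with the gpa1 regression:

```r
library(wooldridge)

fit  <- lm(colGPA ~ hsGPA, data = gpa1)
n    <- nobs(fit)
sig2 <- sum(resid(fit)^2) / (n - 2)   # sigma_hat^2 = SSR / (n - 2)

sig2
sigma(fit)^2                          # same value, via the residual standard error
```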
Assuming SLR.1 through SLR.5, and conditional on the sample values \(\left\{x_1, \ldots, x_n\right\}\), we have shown that the OLS estimators are unbiased, \(\mathrm{E}(\hat{\beta}_0)=\beta_0\) and \(\mathrm{E}(\hat{\beta}_1)=\beta_1\); that their sampling variances are \(\operatorname{Var}(\hat{\beta}_1)=\sigma^2/\mathrm{SST}_x\) and \(\operatorname{Var}(\hat{\beta}_0)=\sigma^2 \frac{1}{n}\sum_{i=1}^n x_i^2 / \mathrm{SST}_x\); and that \(\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2\) is an unbiased estimator of \(\sigma^2\).
Next class we will start Chapter 3.