Econ 265: Introduction to Econometrics

Topic 4: Multiple Linear Regression

Moshi Alam

Introduction

  • The idea here is to extend the concepts discussed in simple linear regression
  • Again, it is crucial that you make yourself comfortable with the concepts of simple linear regression before moving on here
  • Consequently, we will move much faster than in the previous topic
  • Once again the big picture will be to extend the assumptions revolving around \(u_i\) and \(x_i\) to multiple variables
    • to understand how the OLS estimator plays out

Multiple Linear Regression

  • The multiple linear regression model is given by: \[y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_k x_{ki} + u_i\]

    • \(i = 1, 2, \ldots, n\) indexes the observations
    • \(y_i\) is the dependent variable,
    • \(x_{1i}, x_{2i}, \ldots, x_{ki}\) are the independent variables,
    • \(\beta_0, \beta_1, \ldots, \beta_k\) are the parameters
    • \(u_i\) is the error term
  • The key assumptions will be generalized versions of those in SLR \[E(u_i|x_{1i}, x_{2i}, \ldots, x_{ki}) = 0\] \[E(u_i) = 0\]

  • These imply: \(E(x_{1i}u_i) = E(x_{2i}u_i) = \ldots = E(x_{ki}u_i) = 0\)

Obtaining the OLS estimates

  • Once again we will minimize the sum of squared residuals to estimate the parameters as \(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k\)
  • Specifically, we will choose \(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k\) to minimize the following: \[\sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i} - \ldots - \hat{\beta}_k x_{ki})^2\]

This minimization problem leads to \(k+1\) linear equations (the first-order conditions, or FOCs) in the \(k+1\) unknowns \(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k\):

\[ \begin{aligned} & \sum_{i=1}^n\left(y_i-\hat{\beta}_0-\hat{\beta}_1 x_{1i}-\cdots-\hat{\beta}_k x_{ki}\right)=0 \\ & \sum_{i=1}^n x_{1i}\left(y_i-\hat{\beta}_0-\hat{\beta}_1 x_{1i}-\cdots-\hat{\beta}_k x_{ki}\right)=0 \\ & \sum_{i=1}^n x_{2i}\left(y_i-\hat{\beta}_0-\hat{\beta}_1 x_{1i}-\cdots-\hat{\beta}_k x_{ki}\right)=0 \\ & \vdots \\ & \sum_{i=1}^n x_{ki}\left(y_i-\hat{\beta}_0-\hat{\beta}_1 x_{1i}-\cdots-\hat{\beta}_k x_{ki}\right)=0 \end{aligned} \]

The FOCs

  • The FOCs give us \(k+1\) equations in \(k+1\) unknowns
  • Note that we arrive at this because we assume that any change in \(x_{1i}\), \(x_{2i}\), \(\ldots\), \(x_{ki}\) will not affect the error term \(u_i\)
  • They are the sample counterparts of \(k+1\) population moment conditions
    • Recall we did something similar in SLR
  • The moment conditions are given by: \[\begin{aligned} E(u_i) &= 0 \\ E(x_{1i}u_i) &= 0 \\ E(x_{2i}u_i) &= 0 \\ \vdots \\ E(x_{ki}u_i) &= 0 \end{aligned}\]
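  • Before turning to R's lm(), here is a minimal sketch (on made-up simulated data; all object names are mine) of what these conditions amount to: stacking the regressors, plus a column of ones, into a matrix \(X\), the \(k+1\) FOCs say \(X'(y - Xb) = 0\), i.e. the normal equations \(X'Xb = X'y\)

set.seed(265)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)            # simulated model with known parameters

X <- cbind(1, x1, x2)                             # design matrix with intercept column
b_manual <- solve(crossprod(X), crossprod(X, y))  # solves the normal equations X'X b = X'y
b_lm     <- coef(lm(y ~ x1 + x2))

cbind(b_manual, b_lm)                             # the two sets of estimates coincide

With these mechanics in hand, the examples below use the wage1 data from the wooldridge package: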

library(wooldridge)
wage_data <- wage1
# run a regression of log(wage) on educ, exper, and tenure
reg1 <- lm(log(wage) ~ educ + exper + tenure, data = wage_data)
summary(reg1)

Call:
lm(formula = log(wage) ~ educ + exper + tenure, data = wage_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.05802 -0.29645 -0.03265  0.28788  1.42809 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.284360   0.104190   2.729  0.00656 ** 
educ        0.092029   0.007330  12.555  < 2e-16 ***
exper       0.004121   0.001723   2.391  0.01714 *  
tenure      0.022067   0.003094   7.133 3.29e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4409 on 522 degrees of freedom
Multiple R-squared:  0.316, Adjusted R-squared:  0.3121 
F-statistic: 80.39 on 3 and 522 DF,  p-value: < 2.2e-16
  • keeping educ fixed, what is the effect of changing exper and tenure by one year simultaneously?

# keeping educ fixed, what is the effect of changing exper and tenure by one year simultaneously?

reg1$coefficients[3] + reg1$coefficients[4]
     exper 
0.02618833 
  • It returns an object of type Named num (note the carried-over name exper in the output above)
  • The code is also not easy to read
  • Let us be explicit and verbose instead
beta_exper <- reg1$coefficients[3] 
beta_tenure <-  reg1$coefficients[4] 
total_effect <- as.numeric(beta_exper + beta_tenure)
total_effect
[1] 0.02618833

Residual and fitted values

  • The fitted/predicted values are given by: \[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \ldots + \hat{\beta}_k x_{ki}\]
  • The residuals are given by: \[\hat{u}_i = y_i - \hat{y}_i\] \[= y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i} - \ldots - \hat{\beta}_k x_{ki}\]
# Fitted values
fitted_values <- reg1$fitted.values
head(fitted_values)
       1        2        3        4        5        6 
1.304921 1.523506 1.304921 1.819802 1.461690 1.970451 
# Residuals
residuals <- reg1$residuals
head(residuals)
          1           2           3           4           5           6 
-0.17351855 -0.34793290 -0.20630834 -0.02804287  0.20601726  0.19860264 
# manual residuals
observed_wage <- log(wage_data$wage)
manual_residuals <- observed_wage - fitted_values
# test if manual residuals are approximately the same as residuals
all.equal(residuals, manual_residuals)
[1] TRUE

Properties of fitted values and residuals

Immediate extension of SLR and implications from the moment conditions:

  • Sample average of residuals is zero: \(\sum_{i=1}^n \hat{u}_i = 0\)
  • Sample covariance between residuals and each of the independent variables is zero: \(\sum_{i=1}^n x_{1i} \hat{u}_i = 0\) \(\sum_{i=1}^n x_{2i} \hat{u}_i = 0\) \(\vdots\) \(\sum_{i=1}^n x_{ki} \hat{u}_i = 0\)
  • The point \((\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k, \bar{y})\) lies on the regression plane \[\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}_1 + \hat{\beta}_2 \bar{x}_2 + \ldots + \hat{\beta}_k \bar{x}_k\]
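  • A quick numerical check of these three properties on the reg1 fit from above (tiny nonzero values are just floating-point noise):

# 1. the residuals sum to (numerically) zero
sum(reg1$residuals)

# 2. zero sample covariance between the residuals and each regressor
with(wage_data, c(cov(educ,   reg1$residuals),
                  cov(exper,  reg1$residuals),
                  cov(tenure, reg1$residuals)))

# 3. the point of sample means lies on the regression plane:
#    mean of log(wage) equals the plane evaluated at the regressor means
c(mean(log(wage_data$wage)),
  sum(coef(reg1) * c(1, mean(wage_data$educ),
                        mean(wage_data$exper),
                        mean(wage_data$tenure))))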

Partialing out interpretation

Consider a case with k=2, i.e., two independent variables.

The population model is given by: \[y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\]

Let us focus on \(\beta_1\)

# run a regression of log(wage) on educ, exper
reg_wage_on_educ_exp <- lm(log(wage) ~ educ + exper, data = wage_data)
summary(reg_wage_on_educ_exp)

Call:
lm(formula = log(wage) ~ educ + exper, data = wage_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.05800 -0.30136 -0.04539  0.30601  1.44425 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.216854   0.108595   1.997   0.0464 *  
educ        0.097936   0.007622  12.848  < 2e-16 ***
exper       0.010347   0.001555   6.653 7.24e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4614 on 523 degrees of freedom
Multiple R-squared:  0.2493,    Adjusted R-squared:  0.2465 
F-statistic: 86.86 on 2 and 523 DF,  p-value: < 2.2e-16

# step 1: run a regression of log(wage) on exper
reg_logwage_on_exp <- lm(log(wage) ~ exper, data = wage_data)
residuals_logwage_on_exp <- reg_logwage_on_exp$residuals
# step 2: run a regression of educ on exper
reg_educ_on_exp <- lm(educ ~ exper, data = wage_data)
residuals_educ_on_exp <- reg_educ_on_exp$residuals
# step 3: run a regression of residuals from step 1 on residuals from step 2
partial_out_reg <- lm(residuals_logwage_on_exp ~ residuals_educ_on_exp)
summary(partial_out_reg)

Call:
lm(formula = residuals_logwage_on_exp ~ residuals_educ_on_exp)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.05800 -0.30136 -0.04539  0.30601  1.44425 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -2.192e-18  2.010e-02    0.00        1    
residuals_educ_on_exp  9.794e-02  7.615e-03   12.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.461 on 524 degrees of freedom
Multiple R-squared:  0.2399,    Adjusted R-squared:  0.2385 
F-statistic: 165.4 on 1 and 524 DF,  p-value: < 2.2e-16

What’s going on?

  • The population model is given by: \[y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\]
  • To get the effect of \(x_{1i}\) on \(y_i\), the MLR estimate \(\hat{\beta}_1\) can be obtained by the following three-step procedure:
    • Step 1: Regress \(y_i\) on \(x_{2i}\) and obtain the residuals \(\hat{e}_i\)
      • \(\hat{e}_i\) has the variation left in \(y_i\) after partialling out variation from \(x_{2i}\)
    • Step 2: Regress \(x_{1i}\) on \(x_{2i}\) and obtain the residuals \(\hat{v}_i\)
      • \(\hat{v}_i\) has the variation left in \(x_{1i}\) after partialling out variation from \(x_{2i}\)
    • Step 3: Regress \(\hat{e}_i\) on \(\hat{v}_i\)
  • The idea is to partial out the effect of \(x_{2i}\) from \(x_{1i}\) and \(y_i\)
    • The slope from step 3 is identical to \(\hat{\beta}_1\) from regressing \(y_i\) on \(x_{1i}\) and \(x_{2i}\)
  • Hence the interpretation of \(\beta_1\) as the effect of \(x_{1i}\) on \(y_i\), holding \(x_{2i}\) fixed
  • There is another way to approach this. See textbook 3-2f and implement it in R
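  • As a quick check (using the objects created above), the slope from step 3 equals the educ coefficient from the regression on both variables:

# the partialled-out slope equals the multiple-regression coefficient on educ
c(partial_out = unname(coef(partial_out_reg)[2]),
  multiple    = unname(coef(reg_wage_on_educ_exp)["educ"]))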

Comparison of Simple and Multiple Regression Estimates

  • The estimated multiple regression can be written as: \[\hat{y_i} = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}\]

  • Suppose we only ran the simple regression of \(y\) on \(x_{1i}\): \[\tilde{y_i} = \tilde{\beta}_0 + \tilde{\beta}_1 x_{1i}\]

  • Simple regression coefficient \(\tilde{\beta}_1\) relates to the multiple regression coefficient \(\hat{\beta}_1\) via: \[\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \delta_1\] where \(\delta_1\) is the slope from the regression of \(x_{2i}\) on \(x_{1i}\) from estimating: \[x_{2i} = \delta_0 + \delta_1 x_{1i} + \epsilon\]
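  • A minimal sketch verifying this identity with the wage data, taking educ as \(x_{1i}\) and exper as \(x_{2i}\) (the object names are mine):

# tilde_beta1: simple regression of log(wage) on educ only
tilde_beta1 <- coef(lm(log(wage) ~ educ, data = wage_data))["educ"]
# hat_beta1 and hat_beta2: multiple regression on educ and exper
hat_betas <- coef(lm(log(wage) ~ educ + exper, data = wage_data))
# delta1: slope from regressing exper (x2) on educ (x1)
delta1 <- coef(lm(exper ~ educ, data = wage_data))["educ"]

# tilde_beta1 equals hat_beta1 + hat_beta2 * delta1
c(tilde_beta1, hat_betas["educ"] + hat_betas["exper"] * delta1)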

Key Insight

  1. Confounding term:
    • \(\hat{\beta}_2 \delta_1\): The partial effect of \(x_{2i}\) on \(\hat{y}\) scaled by the relationship between \(x_{2i}\) and \(x_{1i}\).
  2. Two distinct cases where \(\tilde{\beta}_1 = \hat{\beta}_1\):
    • The partial effect of \(x_{2i}\) on \(\hat{y}\) is zero: \(\hat{\beta}_2 = 0\).
    • \(x_{1i}\) and \(x_{2i}\) are uncorrelated: \(\delta_1 = 0\).
  3. Thus, in an MLR of \(y_i\) on \(x_{1i}, \cdots, x_{ki}\),
    • the OLS coefficient on \(x_{1i}\)
    • will be identical to the one obtained from a simple regression of \(y_i\) on \(x_{1i}\)
    • only if \(x_{1i}\) is uncorrelated (in the sample) with \(x_{2i}, \cdots, x_{ki}\), or the OLS coefficients on \(x_{2i}, \cdots, x_{ki}\) are all zero
  4. Example of ability bias in wage regressions

Goodness of Fit

  • The idea of the \(R^2\) statistic in MLR is the same as in SLR
    • The proportion of the total variation in \(y_i\) that is explained by the independent variables in the model \[R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}\], where:
      • \(SSE = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2\) is the explained sum of squares
      • \(SSR = \sum_{i=1}^n \hat{u}_i^2\) is the sum of squared residuals
      • \(SST = \sum_{i=1}^n (y_i - \bar{y})^2\) is the total sum of squares
  • Always between 0 and 1
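  • As a sanity check, the sketch below recomputes \(R^2\) for reg1 from its fitted values and residuals and compares it to the value reported by summary() (0.316):

logwage <- log(wage_data$wage)
SST <- sum((logwage - mean(logwage))^2)              # total sum of squares
SSE <- sum((reg1$fitted.values - mean(logwage))^2)   # explained sum of squares
SSR <- sum(reg1$residuals^2)                         # sum of squared residuals

c(SSE / SST, 1 - SSR / SST, summary(reg1)$r.squared) # all three agree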

Assumptions of MLR

  1. MLR.1: Linearity: The population model \(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_k x_{ki} + u_i\) is linear in the parameters \(\beta_0, \beta_1, \ldots, \beta_k\)
  2. MLR.2: Random Sampling: The data \((y_i, x_{1i}, x_{2i}, \ldots, x_{ki})\) for \(i = 1, 2, \ldots, n\) are a random sample from the population
  3. MLR.3: No Perfect Collinearity: The X's are not perfectly collinear
    • No independent variable is an exact linear combination of the others
  4. MLR.4: Zero Conditional Mean: \(E(u_i|x_{1i}, x_{2i}, \ldots, x_{ki}) = 0\)
    • This implies that \(E(u_i) = 0\) and \(E(x_{ji}u_i) = 0\) for all \(j = 1, 2, \ldots, k\)
    • If this holds, then the \(x_{ji}\) are called exogenous
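  • To see MLR.3 in action, here is a small illustration: if we add a regressor that is an exact linear combination of the others, lm() cannot separately identify its coefficient and reports NA for it

# I(exper + tenure) is an exact linear combination of exper and tenure,
# so its coefficient is not identified and lm() returns NA for it
coef(lm(log(wage) ~ educ + exper + tenure + I(exper + tenure), data = wage_data))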

Theorem: Unbiasedness of the OLS estimator in MLR

Under Assumptions MLR.1-MLR.4, the OLS estimator is unbiased. That is, \(E(\hat{\beta}_j) = \beta_j\) for \(j = 0, 1, \ldots, k\).
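A small simulation sketch of what this means (the data-generating process and all names here are made up): across repeated samples drawn from a model satisfying MLR.1-MLR.4, the OLS estimates average out to the true parameters.

set.seed(123)
beta <- c(1, 2, -0.5)                          # true (beta0, beta1, beta2)
one_draw <- function(n = 100) {
  x1 <- rnorm(n)
  x2 <- 0.5 * x1 + rnorm(n)                    # regressors may be correlated
  y  <- beta[1] + beta[2] * x1 + beta[3] * x2 + rnorm(n)
  coef(lm(y ~ x1 + x2))                        # OLS estimates from one sample
}
estimates <- replicate(2000, one_draw())       # 3 x 2000 matrix of estimates
rowMeans(estimates)                            # close to the true (1, 2, -0.5)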

Variance of the OLS estimator

Homoskedasticity assumption

  • Having an estimate is of no use without a measure of its precision.
  • Similar to SLR, we will make an assumption on the variance of the error term \(u_i\) conditional on the independent variables \(x_{1i}, x_{2i}, \ldots, x_{ki}\):
  • MLR.5: Homoskedasticity: \(Var(u_i|x_{1i}, x_{2i}, \ldots, x_{ki}) = \sigma^2\)
    • This implies that \(Var(u_i) = \sigma^2\)
  • Using the population model \(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_k x_{ki} + u_i\), we can take the conditional variance of both sides to get: \[Var(y_i|x_{1i}, x_{2i}, \ldots, x_{ki}) = Var(u_i|x_{1i}, x_{2i}, \ldots, x_{ki}) = \sigma^2\]
  • Why do we get this? Conditional on the \(x\)'s, the term \(\beta_0 + \beta_1 x_{1i} + \ldots + \beta_k x_{ki}\) is a constant, and adding a constant does not change a variance (recall the formula for the variance of a sum).
  • Again as before we do not know \(\sigma^2\) but we can estimate it using the residuals similar to SLR.
  • But first, let's look at the sampling variance of the OLS estimates

Sampling variance of OLS estimator

Theorem: Sampling variance of OLS estimator in MLR

Under assumptions MLR.1-MLR.5, the sampling variance of the OLS estimator of the \(j^{th}\) slope coefficient, \(\hat{\beta}_j\), is given by: \[Var(\hat{\beta}_j) = \frac{\sigma^2}{(1 - R_j^2)\sum_{i=1}^n (x_{ji} - \bar{x}_j)^2}\] for \(j = 1, 2, \ldots, k\), where:

  • \(\sigma^2\) is the variance of the error term \(u_i\) conditional on the independent variables
  • \(\bar{x}_j\) is the sample mean of \(x_{ji}\)
  • \(\sum_{i=1}^n (x_{ji} - \bar{x}_j)^2\) is the total sample variation in \(x_{ji}\) (often written \(SST_j\))
  • \(R_j^2\) is the \(R^2\) from regressing \(x_{ji}\) on all other independent variables and an intercept.
  • Observe how each component enters the formula and why each of them is important.
  • Look for connections with MLR.3
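  • A sketch evaluating this formula for the educ coefficient in reg1 (with \(\sigma^2\) replaced by its estimate, so the result matches the variance reported by vcov() exactly; the object names are mine):

sigma2_hat <- summary(reg1)$sigma^2                   # estimated error variance
R2_educ    <- summary(lm(educ ~ exper + tenure, data = wage_data))$r.squared
SST_educ   <- sum((wage_data$educ - mean(wage_data$educ))^2)

var_manual <- sigma2_hat / ((1 - R2_educ) * SST_educ) # formula from the theorem
c(var_manual, vcov(reg1)["educ", "educ"])             # identical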

The role of \(R_j^2\)

  • \(R_j^2\) is the \(R^2\) from regressing \(x_{ji}\) on all other independent variables and an intercept.
  • What happens to \(Var(\hat{\beta}_j)\) as \(R_j^2\) approaches 1?
  • What does it mean when \(R_j^2\) is very close to 1 versus equal to 1?
    • Multicollinearity VS Perfect Collinearity
      • Definition: Multicollinearity arises when independent variables in a regression model are highly correlated, making it difficult to estimate the effect of each variable accurately.
  • Example: Estimating the effect of different school expenditure categories (e.g., teacher salaries, materials, athletics) on student performance.
  • Challenge:
    • Wealthier schools spend more across all categories, leading to high correlations among variables.
    • Difficult to estimate the partial effect of one category due to lack of variation.
  • Solution: Consider combining correlated variables into a single index.

More on multicollinearity

  • Consider a 3 variable population model: \[y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i\]
    • \(x_{2i}\) and \(x_{3i}\) are highly correlated
    • \(x_{1i}\) is uncorrelated with \(x_{2i}\) and \(x_{3i}\)
  • The OLS estimator will still be unbiased because MLR1-MLR4 are satisfied
  • However, the variance of the OLS estimates of \(\beta_2\) and \(\beta_3\) will be very high
  • This is because the \(R^2\) from
    • regressing \(x_{2i}\) on \(x_{3i}\) will be very high
    • regressing \(x_{3i}\) on \(x_{2i}\) will be very high
  • But the variance of \(\hat{\beta}_1\) will be unaffected. Why?
  • Observe that \(Var(\hat{\beta}_1) = \frac{\sigma^2}{(1 - R_1^2)\sum_{i=1}^n (x_{1i} - \bar{x}_1)^2}\)
    • what is \(R_1^2\) based on the information you have?
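  • A simulation sketch of this point (the data-generating process is made up): \(x_{2i}\) and \(x_{3i}\) are almost identical, \(x_{1i}\) is independent of both, and the standard errors on \(\hat{\beta}_2\) and \(\hat{\beta}_3\) blow up while the one on \(\hat{\beta}_1\) does not

set.seed(42)
n  <- 500
x1 <- rnorm(n)                     # uncorrelated with the other regressors
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.05)     # x3 is almost a copy of x2
y  <- 1 + x1 + x2 + x3 + rnorm(n)

# standard errors: very large for x2 and x3, unaffected for x1
summary(lm(y ~ x1 + x2 + x3))$coefficients[, "Std. Error"]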

Example: Final Exam Scores

  • Scenario: Predicting final exam scores using:
    • Key Variable: Number of classes attended.
    • Other control Variables: Cumulative GPA, SAT score, and high school performance.
  • Concern: Other control variables are highly correlated (multicollinear).
  • A useful exercise is to look at the standard errors of the OLS estimates.

Completing the model

  • Now that we have laid out the assumptions for unbiasedness, and added homoskedasticity for precision
    • We are left to estimate the variance of the error term \(\sigma^2\) to complete the model
  • Similar to SLR, we can estimate \(\sigma^2\) using the residuals: \[\hat{\sigma}^2 = \frac{1}{n - k - 1} \sum_{i=1}^n \hat{u}_i^2\]
  • The degrees of freedom is \(n - k - 1\) because we have \(n\) observations and \(k+1\) parameters to estimate
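  • For reg1, the sketch below computes \(\hat{\sigma}^2\) from the residuals and checks that the implied \(\hat{\sigma}\) matches the "Residual standard error" (0.4409) printed by summary():

n <- nobs(reg1)                              # number of observations (526)
k <- length(coef(reg1)) - 1                  # number of slope coefficients (3)
sigma2_hat <- sum(reg1$residuals^2) / (n - k - 1)

c(sqrt(sigma2_hat), summary(reg1)$sigma)     # both equal the reported SER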

Theorem: Unbiased estimation of \(\sigma^2\)

Under assumptions MLR.1-MLR.5, \(E(\hat{\sigma}^2) = \sigma^2\)

  • \(\hat{\sigma}\) is the standard error of the regression (SER)

Gauss-Markov Theorem

  • The Gauss-Markov Theorem is a key result in econometrics that states that under the assumptions MLR.1-MLR.5, the OLS estimator is the best linear unbiased estimator (BLUE) of the population parameters.
  • Best: The OLS estimator has the smallest variance among all linear unbiased estimators.
  • Linear: The OLS estimator is a linear function of the dependent variable.
  • Unbiased: The OLS estimator is unbiased.
  • That is, the OLS estimator is the most efficient among all linear unbiased estimators.

Standard errors of \(\hat{\beta}_j\)

  • Under homoskedasticity, the standard error of the OLS estimate \(\hat{\beta}_j\) is given by: \[SE(\hat{\beta}_j) = \sqrt{Var(\hat{\beta}_j)}\]
  • an estimate of the standard deviation of the sampling distribution of \(\hat{\beta}_j\).
  • a measure of the precision of the OLS estimate \(\hat{\beta}_j\).
  • used to construct confidence intervals and conduct hypothesis tests.
  • estimated using the formula: \[SE(\hat{\beta}_j) = \sqrt{\frac{\hat{\sigma}^2}{(1 - R_j^2)\sum_{i=1}^n (x_{ji} - \bar{x}_j)^2}} = \frac{\hat{\sigma}}{\sqrt{1 - R_j^2}\,\sqrt{n}\,\mathrm{sd}(x_j)}\] where \(\mathrm{sd}(x_j) = \sqrt{n^{-1}\sum_{i=1}^n (x_{ji} - \bar{x}_j)^2}\)
  • Notice the role of \(R_j^2\) and \(n\) in the formula.
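  • Continuing the sampling-variance sketch from above (same objects), the standard error on educ is just the square root of that estimated variance and matches the Std. Error column of summary(reg1):

# square root of the manually computed variance of the educ coefficient
se_manual <- sqrt(sigma2_hat / ((1 - R2_educ) * SST_educ))
c(se_manual, summary(reg1)$coefficients["educ", "Std. Error"])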

Omitted variable bias

Recall

  • Recall the example we did in class with the wage data on educ and exper
  • For the true population model satisfying MLR1-MLR4: \[y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\]
  • Suppose we omitted \(x_{2i}\) and only ran the simple regression of \(y_i\) on \(x_{1i}\): \[\tilde{y_i} = \tilde{\beta}_0 + \tilde{\beta}_1 x_{1i}\]
  • \(\tilde{\beta}_1\) relates to \(\hat{\beta}_1\) via: \[\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \delta_1\] where \(\delta_1\) is the slope from the regression of \(x_{2i}\) on \(x_{1i}\) from estimating: \[x_{2i} = \delta_0 + \delta_1 x_{1i} + \epsilon\]
  • So \(E(\tilde{\beta}_1) = \beta_1 + \beta_2 \delta_1\). Bias = \(E(\tilde{\beta}_1) - \beta_1 = \beta_2 \delta_1\)
  • Let us generalize this to a three variable case

Omitted variable bias

For the true population model satisfying MLR.1-MLR.4: \[y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i\]

  • Suppose \(x_{3i}\) and \(x_{2i}\) are uncorrelated but \(x_{3i}\) and \(x_{1i}\) are correlated
  • Suppose we omitted \(x_{3i}\) and only ran the regression of \(y_i\) on \(x_{1i}\) and \(x_{2i}\): \[\tilde{y_i} = \tilde{\beta}_0 + \tilde{\beta}_1 x_{1i} + \tilde{\beta}_2 x_{2i}\]

  • Might be tempting to think that \(\tilde{\beta}_2\) is unbiased and only \(\tilde{\beta}_1\) is biased
  • But both will be biased because of the induced correlation between \(x_{1i}\) and \(x_{2i}\) when \(x_{3i}\) is omitted
  • \(\tilde{\beta}_2\) is unbiased only when \(x_{1i}\) and \(x_{2i}\) are uncorrelated. In that case we can show that: \[E\left(\tilde{\beta}_1\right)=\beta_1+\beta_3 \frac{\sum_{i=1}^n\left(x_{1i}-\bar{x}_1\right)\left(x_{3i}-\bar{x}_3\right)}{\sum_{i=1}^n\left(x_{1i}-\bar{x}_1\right)^2}\]

Ability bias example

  • The true population model is given by: \[log(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 ability_i + u_i\]
  • Suppose \(exper_i\) and \(ability_i\) are uncorrelated (probably not true)
  • Suppose we omitted the variable \(ability_i\) and only ran the regression of \(log(wage_i)\) on \(educ_i\) and \(exper_i\): \[\widetilde{log(wage_i)} = \tilde{\beta}_0 + \tilde{\beta}_1 educ_i + \tilde{\beta}_2 exper_i\]
  • Both \(\tilde{\beta}_1\) and \(\tilde{\beta}_2\) will be biased
    • Even if we assume \(exper_i\) and \(ability_i\) are uncorrelated
  • The bias in \(\tilde{\beta}_2\) is due to the induced correlation between \(educ_i\) and \(exper_i\) when \(ability_i\) is omitted because \(educ_i\) and \(ability_i\) are correlated
  • Confounding the effect of ability. Direction of the bias?
  • So imagine the further complexity if we assumed all variables were pairwise correlated.

Variances in misspecified models

  • We saw what happens to bias. What happens to precision?
  • True model: \(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\) and \(x_{1i}\) and \(x_{2i}\) are correlated
  • Estimate two models:
    • Model 1: \(\hat{y_i} = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}\)
    • Model 2: \(\tilde{y_i} = \tilde{\beta}_0 + \tilde{\beta}_1 x_{1i}\)
  1. When \(\beta_2 \neq 0, \widetilde{\beta}_1\) is biased, \(\hat{\beta}_1\) is unbiased, and \(\operatorname{Var}\left(\widetilde{\beta}_1\right)<\operatorname{Var}\left(\hat{\beta}_1\right)\)
  2. When \(\beta_2=0, \widetilde{\beta}_1\) and \(\hat{\beta}_1\) are both unbiased, and \(\operatorname{Var}\left(\widetilde{\beta}_1\right)<\operatorname{Var}\left(\hat{\beta}_1\right)\)
  • This is because (recall the formula for the variance of the OLS estimator):
    • \(\operatorname{Var}\left(\hat{\beta}_1\right)=\sigma^2 /\left[\operatorname{SST}_1\left(1-R_1^2\right)\right]\)
    • \(\operatorname{Var}\left(\widetilde{\beta}_1\right)=\sigma^2 /\left[\operatorname{SST}_1\right]\)
  • Takeaway: including an irrelevant variable (\(\beta_2 = 0\)) does not affect the unbiasedness of the OLS estimator, but it does reduce the precision of \(\hat{\beta}_1\) whenever \(x_{1i}\) and \(x_{2i}\) are correlated; omitting a relevant variable lowers the variance at the cost of bias.
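  • A sketch of the two variance formulas with the wage data, taking educ as \(x_{1i}\) and exper as \(x_{2i}\), and plugging the same \(\hat{\sigma}^2\) into both so that only the \((1 - R_1^2)\) term differs:

# same sigma^2 estimate in both formulas isolates the role of (1 - R_1^2)
sigma2_hat <- summary(reg_wage_on_educ_exp)$sigma^2
SST_1 <- sum((wage_data$educ - mean(wage_data$educ))^2)
R2_1  <- summary(lm(educ ~ exper, data = wage_data))$r.squared

c(var_tilde = sigma2_hat / SST_1,                   # simple regression formula
  var_hat   = sigma2_hat / (SST_1 * (1 - R2_1)))    # multiple regression formula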