Econometrics PS 1
Follow the instructions for completing your problem set detailed in the syllabus.
Question 1
Consider the following population regression models of y_1 on x and y_2 on x, where y_{1i} = 2 + 3x_{i} + u_{1i} and y_{2i} = 2 + 3x_{i} + u_{2i}. Observe how the error terms u_{1i} and u_{2i} are generated for 10000 observations.
- Plot the error terms u_{1i} and u_{2i} against x_i. What do you observe?
- Before running the regressions, decide which of the two regressions will yield unbiased OLS estimates. Prove your answer.
- Use the code above to generate the data y_1, y_2 and x. Then run the two regressions and explain the results.
- Now consider a wage equation: wage_i = \beta_0 + \beta_1 education_i + u_i. Suppose unobserved ability is positively correlated with education and affects wages positively. Explain intuitively why E(u | education) \ne 0 in this case. Will OLS overestimate or underestimate the true return to education? Justify your answer.
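To build intuition for the questions above, the following is a minimal simulation sketch. It uses an assumed data-generating process, not the one supplied with the problem set: u1 is drawn independently of x, while u2 is constructed to depend on x (the coefficient 0.5 is an arbitrary illustrative choice). Unobserved ability in the wage equation plays a role similar to the part of u2 that moves with x.

```r
# Hypothetical data-generating process (an assumption for illustration only).
set.seed(123)
n  <- 10000
x  <- rnorm(n)
u1 <- rnorm(n)              # drawn independently of x, so E(u1 | x) = 0
u2 <- 0.5 * x + rnorm(n)    # depends on x, so E(u2 | x) = 0.5 * x

y1 <- 2 + 3 * x + u1
y2 <- 2 + 3 * x + u2

# To inspect the errors visually, one could call plot(x, u1) and plot(x, u2).

fit1 <- lm(y1 ~ x)
fit2 <- lm(y2 ~ x)

b1_exog <- unname(coef(fit1)["x"])  # should be close to the true slope 3
b1_endo <- unname(coef(fit2)["x"])  # absorbs the 0.5 * x part of u2
print(c(exogenous = b1_exog, endogenous = b1_endo))
```

Under this assumed setup the second slope settles near 3.5 rather than 3, because OLS cannot separate the effect of x from the part of the error that moves with x.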
Question 2
Consider two researchers studying the same population and estimating the population model y_i = \beta_0 + \beta_1 x_i + u_i with Var(u_i) = 1. Assume SLR.1-4 hold. The two researchers collect different samples:
- Researcher A collects a random sample of 100 observations in which x_i turns out to be uniformly distributed on [45, 55]
- Researcher B collects a random sample of 100 observations in which x_i turns out to be uniformly distributed on [0, 100]
Without doing exact calculations, explain which researcher will likely obtain a more precise estimate of \beta_1.
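One way to check the intuition is a small Monte Carlo experiment. The sketch below is illustrative only: the true parameters \beta_0 = 1 and \beta_1 = 2, the number of replications, and the helper name one_slope are all assumptions that do not come from the question. It compares the spread of the OLS slope across repeated samples for the two designs.

```r
set.seed(42)
n_obs  <- 100
n_reps <- 2000

# Return the OLS slope from one simulated sample with x ~ Uniform(lo, hi).
# beta0 = 1 and beta1 = 2 are arbitrary illustrative values.
one_slope <- function(lo, hi) {
  x <- runif(n_obs, lo, hi)
  y <- 1 + 2 * x + rnorm(n_obs, sd = 1)  # Var(u) = 1 as in the question
  cov(x, y) / var(x)                     # simple-regression slope formula
}

slopes_a <- replicate(n_reps, one_slope(45, 55))   # Researcher A's design
slopes_b <- replicate(n_reps, one_slope(0, 100))   # Researcher B's design

sd_a <- sd(slopes_a)
sd_b <- sd(slopes_b)
print(c(sd_A = sd_a, sd_B = sd_b, ratio = sd_a / sd_b))
```

Since Var(\hat{\beta}_1 | x) = \sigma^2 / SST_x under SLR.1-5, the design with the wider spread in x should show the smaller standard deviation of the slope.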
Question 3
For this problem, you will use the wage2 dataset from the wooldridge package. This dataset contains information on monthly earnings and various characteristics of workers.
Q 3.1
- Load the data and create a scatter plot of monthly wages (wage) against years of education (educ). What do you observe about the relationship? Are there any concerning patterns?
- Calculate and report the sample correlation between wages and education. Then manually calculate \hat{\beta}_1 using the formula \hat{\beta}_1 = \hat{\rho}_{xy} \cdot \left(\frac{\hat{\sigma}_y}{\hat{\sigma}_x}\right) by computing each component separately. Show that this matches the OLS estimate from lm().
- Estimate the simple linear regression:
wage_i = \beta_0 + \beta_1 \, educ_i + u_i
Report and interpret both coefficients. Is the intercept meaningful in this context?
- Calculate the fitted values and residuals manually (without using fitted() or residuals()). Verify that:
- The mean of residuals equals zero
- The point (\bar{educ}, \bar{wage}) lies on the regression line
- The sample covariance between education and residuals is zero
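The manual calculations in Q 3.1 can be sketched as follows. To keep the example self-contained it runs on simulated stand-in data (the numbers 150, 60 and 300 are arbitrary assumptions), so none of the output is a result for wage2. With the real data one would presumably set x <- wage2$educ and y <- wage2$wage after loading the dataset.

```r
# Simulated stand-in data (hypothetical). With the real data one would use
# data("wage2", package = "wooldridge"); x <- wage2$educ; y <- wage2$wage
set.seed(1)
x <- round(runif(500, 9, 18))
y <- 150 + 60 * x + rnorm(500, sd = 300)

# Slope from the correlation and the two sample standard deviations
rho_hat   <- cor(x, y)
b1_manual <- rho_hat * sd(y) / sd(x)
b0_manual <- mean(y) - b1_manual * mean(x)

fit <- lm(y ~ x)  # reference estimate for comparison

# Fitted values and residuals without fitted() or residuals()
y_hat <- b0_manual + b1_manual * x
u_hat <- y - y_hat

# Each entry should be zero up to floating-point error
checks <- c(
  slope_gap     = b1_manual - unname(coef(fit)["x"]),
  mean_resid    = mean(u_hat),
  line_at_means = (b0_manual + b1_manual * mean(x)) - mean(y),
  cov_x_resid   = cov(x, u_hat)
)
print(signif(checks, 3))
```

The three properties are algebraic consequences of the OLS first-order conditions, so they hold in any sample, not only in this simulated one.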
Q 3.2
- Estimate:
\log(wage_i) = \beta_0 + \beta_1 \, educ_i + u_i
- Using the log-level specification:
- Calculate the exact percentage return to one additional year of education
- Predict the percentage wage difference between workers with 12 and 16 years of education
- Create histograms of wage and log(wage). Which specification would you prefer to estimate: level-level or log-level? Justify your choice.
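For the log-level calculations in Q 3.2, recall that the exact percentage change implied by a slope \beta_1 over \Delta years is 100 \cdot (\exp(\beta_1 \Delta) - 1), while 100 \cdot \beta_1 \Delta is only an approximation. The sketch below wraps this in a helper; the function name exact_pct_change and the coefficient 0.05 are hypothetical and are not estimates from wage2.

```r
# Exact percentage change in wage implied by a log-level slope b1
# when education rises by `years` years.
exact_pct_change <- function(b1, years = 1) {
  100 * (exp(b1 * years) - 1)
}

b1_example <- 0.05  # hypothetical coefficient, NOT an estimate from wage2

approx_1yr <- 100 * b1_example                           # usual approximation
exact_1yr  <- exact_pct_change(b1_example)               # one extra year
exact_4yr  <- exact_pct_change(b1_example, years = 16 - 12)  # 12 vs 16 years
print(c(approx_1yr = approx_1yr, exact_1yr = exact_1yr, exact_4yr = exact_4yr))
```

With an actual fit, b1 could be taken from coef(lm(log(wage) ~ educ, data = wage2))["educ"]. For a positive slope the exact figure always exceeds the approximation, and the gap widens as the change in education grows.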
Q 3.3
Create a residual plot (residuals vs. fitted values) for your log-level specification. Do you see evidence of heteroskedasticity? Explain what pattern you would look for.
Group the data into three education categories: low (educ < 12), medium (12 \leq educ < 16), and high (educ \geq 16).
Calculate the variance of residuals within each group. What do these variances suggest about the homoskedasticity assumption?
- If heteroskedasticity is present, what are the consequences for the unbiasedness of \hat{\beta}_1?
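The grouping step in Q 3.3 can be sketched with cut() and tapply(). The example runs on simulated data in which the error spread is assumed to grow with education (an illustrative choice, not a claim about wage2), so that the pattern to look for is visible by construction.

```r
set.seed(7)
n    <- 900
educ <- sample(8:18, n, replace = TRUE)
# Hypothetical DGP: the error standard deviation grows with education
lwage <- 5.5 + 0.06 * educ + rnorm(n, sd = 0.05 * educ)

fit   <- lm(lwage ~ educ)
u_hat <- lwage - (coef(fit)[1] + coef(fit)[2] * educ)  # manual residuals

# Bins: educ < 12 is low, 12 <= educ < 16 is medium, educ >= 16 is high
educ_group <- cut(educ, breaks = c(-Inf, 12, 16, Inf),
                  labels = c("low", "medium", "high"), right = FALSE)

resid_var <- tapply(u_hat, educ_group, var)
print(round(resid_var, 3))
```

In a residual plot the same feature would appear as a fan shape, with the vertical spread of residuals changing along the fitted values. Roughly equal group variances would instead be consistent with homoskedasticity.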