Stata Lab 5: Testing Coefficients

Robert J. Lemke
Department of Economics and Business
Lake Forest College
Copyright 2013


The data for this problem are in Stata format: wages.dta. The data set contains five variables on 704 individuals. The variables are race (1=hispanic, 2=black, 3=white), age, school (years of schooling), sex (F=female, M=male), and annual labor income.

The lab has 10 questions. You will learn the most by working through all 10 on your own, supplying both the Stata commands and the answers. If you get stuck, all 10 questions are repeated below with Stata commands, followed by a Stata program that executes the commands for all 10 questions.


Lab Instructions Without Stata Commands

  1. Describe and summarize the data to get a better sense of what they contain. Separately tabulate race, school, and sex.

  2. Create dummy variables for race (hispanic, black, and white) and sex (female and male). Also create age2 to equal the square of age, and create lnwage to equal the natural log of annual earnings. Keep (and order) wage, lnwage, age, age2, school, hispanic, black, white, female, and male. Provide the summary statistics for this data set. Save this dataset as wages_edited.dta.

  3. Regress wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression. Plot the residuals from the regression against age. Given what you know about wages, do the results generally make sense? Explain why the residuals should make one question the model specification.

  4. Regress logged wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression. Plot the residuals from the regression against age. Call this the Base Model. Explain why the residuals might give one more confidence in this model over the model in step 3. Describe the predicted relationship between age and ln(wages) as completely as possible.

  5. Re-estimate the Base Model, and then do the following.

  6. Using the Base Model, again test the claim that the return to each additional year of schooling is nine percent, but this time do not use Stata's test command. Instead, estimate an unrestricted model and a restricted model to obtain each model's sum of squared residuals. Then calculate the F-statistic and its p-value (either approximate the p-value or calculate it precisely).

  7. Using the Base Model, conduct the F-test of the hypothesis that none of the explanatory variables has any effect. Do not rely on the statistic in Stata's regression output; instead, compute it by estimating a restricted and an unrestricted model. Then calculate the F-statistic and its p-value (either approximate the p-value or calculate it precisely).

  8. Estimate a model wherein ln(wages) depends on age, age-squared, schooling, and sex. Test the claim (using a Chow test) that the coefficients of this model are the same regardless of race.

  9. Estimate a model that includes age, age squared, and sex and allows for a different intercept and a different return for each year of schooling by race, then (1) test the claim that there is no difference in the return to each year of schooling for the three races, and (2) test the claim that there is no difference in the return to each year of schooling for blacks and hispanics.

  10. Estimate a model that allows all coefficients on age, age-squared, schooling, and sex to vary by race. Then test (using Stata's test command) whether the gender differential is statistically different between whites and blacks; between whites and hispanics; between blacks and hispanics; and jointly across all three races.



Lab Instructions With Stata Commands and Answers

  1. Describe and summarize the data to get a better sense of what they contain. Separately tabulate race, school, and sex.
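
    A minimal sketch of the commands for step 1, assuming wages.dta sits in Stata's current working directory:

    ```stata
    * Load the lab data (assumes wages.dta is in the working directory)
    use wages.dta, clear

    * Variable names, storage types, and labels
    describe

    * Means, standard deviations, minimums, and maximums
    summarize

    * One-way frequency tables, one variable at a time
    tabulate race
    tabulate school
    tabulate sex
    ```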

  2. Create dummy variables for race (hispanic, black, and white) and sex (female and male). Also create age2 to equal the square of age, and create lnwage to equal the natural log of annual earnings. Keep (and order) wage, lnwage, age, age2, school, hispanic, black, white, female, and male. Provide the summary statistics for this data set. Save this dataset as wages_edited.dta.
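
    One way step 2 might be carried out is sketched below. Two things are assumed rather than known: that sex is stored as a string holding "F" or "M", and that the annual labor income variable is already named wage. Adjust either if describe shows otherwise.

    ```stata
    use wages.dta, clear

    * Race dummies (race is coded 1=hispanic, 2=black, 3=white)
    generate hispanic = (race == 1)
    generate black    = (race == 2)
    generate white    = (race == 3)

    * Sex dummies (assumes sex is a string variable holding "F" or "M")
    generate female = (sex == "F")
    generate male   = (sex == "M")

    * Age squared and logged wages (assumes the income variable is named wage)
    generate age2   = age^2
    generate lnwage = ln(wage)

    * Keep only the listed variables and put them in the listed order
    keep  wage lnwage age age2 school hispanic black white female male
    order wage lnwage age age2 school hispanic black white female male

    summarize
    save wages_edited.dta, replace
    ```

    Note that ln(wage) is missing for anyone with zero or negative income, so the summary statistics are worth checking for a drop in the number of observations on lnwage.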

  3. Regress wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression. Plot the residuals from the regression against age.

    Given what you know about wages, do the results generally make sense? Explain why the residuals should make one question the model specification.
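
    A minimal sketch of the commands for step 3, assuming wages_edited.dta from step 2 is in the working directory. The residual variable name resid_wage is illustrative and does not come from the lab.

    ```stata
    use wages_edited.dta, clear

    * Wages in levels; white and male are the omitted categories
    regress wage age age2 school hispanic black female

    * Residuals from the regression just estimated
    predict resid_wage, residuals
    summarize resid_wage

    * Residuals against age, with a reference line at zero
    scatter resid_wage age, yline(0)
    ```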

  4. Regress logged wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression. Plot the residuals from the regression against age. Call this the Base Model.

    Explain why the residuals might give one more confidence in this model than in the model from step 3.

    Describe the predicted relationship between age and ln(wages) as completely as possible.
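
    The Base Model and its residual checks can be sketched as follows, again assuming wages_edited.dta from step 2. The name resid_ln is illustrative. The last command is one way to summarize the age profile: with a quadratic in age, predicted ln(wages) reaches its turning point where the derivative with respect to age is zero, that is, at age = -b_age/(2*b_age2).

    ```stata
    use wages_edited.dta, clear

    * Base Model: logged wages; white and male are the omitted categories
    regress lnwage age age2 school hispanic black female

    * Residuals from the Base Model
    predict resid_ln, residuals
    summarize resid_ln
    scatter resid_ln age, yline(0)

    * Age at which predicted ln(wages) turns (a peak if the age2 coefficient is negative)
    display "Turning point age = " -_b[age]/(2*_b[age2])
    ```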

  5. Re-estimate the Base Model, and then do the following.
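
    Step 6 refers back to a test, made with Stata's test command, of the claim that the return to each additional year of schooling is nine percent. A sketch of that one test, assuming wages_edited.dta from step 2:

    ```stata
    use wages_edited.dta, clear

    * Re-estimate the Base Model
    regress lnwage age age2 school hispanic black female

    * H0: the coefficient on school equals 0.09
    test school = 0.09
    ```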

  6. Using the Base Model, again test the claim that the return to each additional year of schooling is nine percent, but this time do not use Stata's test command. Instead, estimate an unrestricted model and a restricted model to obtain each model's sum of squared residuals. Then calculate the F-statistic and its p-value (either approximate the p-value or calculate it precisely).
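
    A sketch of the restricted/unrestricted approach, assuming wages_edited.dta from step 2. The restriction is imposed by moving 0.09*school to the left-hand side, and the statistic is F = [(SSR_r - SSR_u)/q] / [SSR_u/df_u] with q = 1 restriction. The scalar names and lnwage_r are illustrative.

    ```stata
    use wages_edited.dta, clear

    * Unrestricted model: the Base Model
    regress lnwage age age2 school hispanic black female
    scalar ssr_u = e(rss)
    scalar df_u  = e(df_r)

    * Restricted model: force the school coefficient to equal 0.09
    generate lnwage_r = lnwage - 0.09*school
    regress lnwage_r age age2 hispanic black female
    scalar ssr_r = e(rss)

    * F-statistic for one restriction and its exact p-value
    scalar nrestr = 1
    scalar Fstat  = ((ssr_r - ssr_u)/nrestr) / (ssr_u/df_u)
    display "F = " Fstat "   p-value = " Ftail(nrestr, df_u, Fstat)
    ```

    The result should match the F-statistic reported by test school = 0.09 after the Base Model.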

  7. Using the Base Model, conduct the F-test of the hypothesis that none of the explanatory variables has any effect. Do not rely on the statistic in Stata's regression output; instead, compute it by estimating a restricted and an unrestricted model. Then calculate the F-statistic and its p-value (either approximate the p-value or calculate it precisely).
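
    A sketch of the same calculation for the overall F-test, assuming wages_edited.dta from step 2. Here the restricted model contains only a constant, so the number of restrictions equals the number of slope coefficients in the Base Model. The scalar names are illustrative.

    ```stata
    use wages_edited.dta, clear

    * Unrestricted model: the Base Model
    regress lnwage age age2 school hispanic black female
    scalar ssr_u  = e(rss)
    scalar df_u   = e(df_r)
    scalar nrestr = e(df_m)

    * Restricted model: constant only, on the same estimation sample
    regress lnwage if e(sample)
    scalar ssr_r = e(rss)

    * F-statistic and its exact p-value
    scalar Fstat = ((ssr_r - ssr_u)/nrestr) / (ssr_u/df_u)
    display "F = " Fstat "   p-value = " Ftail(nrestr, df_u, Fstat)
    ```

    The computed value can be compared with the F-statistic printed at the top of the Base Model's regression output.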

  8. Estimate a model wherein ln(wages) depends on age, age-squared, schooling, and sex. Test the claim (using a Chow test) that the coefficients of this model are the same regardless of race.
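
    One way to sketch the Chow test, assuming wages_edited.dta from step 2 and that every individual belongs to exactly one of the three race groups. The pooled regression is the restricted model; the three race-specific regressions together form the unrestricted model. With k coefficients per regression (including the constant), there are 2k restrictions and n - 3k residual degrees of freedom. The scalar names are illustrative.

    ```stata
    use wages_edited.dta, clear

    * Restricted model: one set of coefficients for all races
    regress lnwage age age2 school female
    scalar ssr_r = e(rss)
    scalar nobs  = e(N)
    scalar kpar  = e(df_m) + 1

    * Unrestricted model: a separate regression for each race
    regress lnwage age age2 school female if hispanic == 1
    scalar ssr_h = e(rss)
    regress lnwage age age2 school female if black == 1
    scalar ssr_b = e(rss)
    regress lnwage age age2 school female if white == 1
    scalar ssr_w = e(rss)

    * Chow F-statistic and its exact p-value
    scalar ssr_u  = ssr_h + ssr_b + ssr_w
    scalar nrestr = 2*kpar
    scalar df_u   = nobs - 3*kpar
    scalar Fstat  = ((ssr_r - ssr_u)/nrestr) / (ssr_u/df_u)
    display "F = " Fstat "   p-value = " Ftail(nrestr, df_u, Fstat)
    ```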

  9. Estimate a model that includes age, age squared, and sex and allows for a different intercept and a different return for each year of schooling by race, then (1) test the claim that there is no difference in the return to each year of schooling for the three races, and (2) test the claim that there is no difference in the return to each year of schooling for blacks and hispanics.
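
    A sketch of step 9 using hand-built interaction terms, assuming wages_edited.dta from step 2. The names school_h and school_b are illustrative. White is the omitted race, so the coefficient on school is the return for whites and each interaction measures the difference from whites.

    ```stata
    use wages_edited.dta, clear

    * Schooling interacted with the race dummies
    generate school_h = school*hispanic
    generate school_b = school*black

    * Race dummies shift the intercept; interactions shift the return to schooling
    regress lnwage age age2 female hispanic black school school_h school_b

    * (1) H0: the return to schooling is the same for all three races
    test school_h school_b

    * (2) H0: the return to schooling is the same for blacks and hispanics
    test school_h = school_b
    ```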

  10. Estimate a model that allows all coefficients on age, age-squared, schooling, and sex to vary by race. Then test (using Stata's test command) whether the gender differential is statistically different between whites and blacks; between whites and hispanics; between blacks and hispanics; and jointly across all three races.
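
    A sketch of a fully interacted model, assuming wages_edited.dta from step 2 and that the commands are run from a do-file (the /// line continuation does not work in the Command window). The _h and _b suffixes are illustrative names for the hispanic and black interactions. White is the omitted race, so female is the gender differential for whites and female_h and female_b are the differences from whites.

    ```stata
    use wages_edited.dta, clear

    * Interact every regressor with the hispanic and black dummies
    foreach v of varlist age age2 school female {
        generate `v'_h = `v'*hispanic
        generate `v'_b = `v'*black
    }

    regress lnwage age age2 school female hispanic black ///
        age_h age2_h school_h female_h                   ///
        age_b age2_b school_b female_b

    * Whites versus blacks
    test female_b

    * Whites versus hispanics
    test female_h

    * Blacks versus hispanics
    test female_b = female_h

    * All three races jointly
    test female_b female_h
    ```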




Stata Program to Execute All Commands