Stata Lab 5: Testing Coefficients

Robert J. Lemke
Department of Economics and Business
Lake Forest College
Copyright 2013

The data for this problem are in Stata format: wages.dta. The data set contains five variables on 704 individuals: race (1=hispanic, 2=black, 3=white), age, school (years of schooling), sex (F=female, M=male), and wage (annual labor income).

The lab has 10 questions. To get the most out of it, try to work through all 10 by writing your own Stata commands and answers. If you get stuck, all 10 questions are repeated below with Stata commands, followed by a Stata program that executes the commands for all 10 questions.

Lab Instructions Without Stata Commands

Question 1. Describe and summarize the data set, and separately tabulate race, school, and sex.
Question 2. Create dummy variables for race (hispanic, black, and white) and sex (female and male). Also create age2 equal to the square of age, and lnwage equal to the natural log of annual earnings. Keep (and order) wage, lnwage, age, age2, school, hispanic, black, white, female, and male. Report the summary statistics for this data set, and save it as wages_edited.dta.
Question 3. Regress wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression, and plot them against age. Given what you know about wages, do the results generally make sense? Explain why the residuals should lead one to question the model specification.
Question 4. Regress logged wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression, and plot them against age. Call this the Base Model. Explain why the residuals might give one more confidence in this model than in the model from Question 3. Describe the predicted relationship between age and ln(wages) as completely as possible.
Question 5. Re-estimate the Base Model, and then do the following.
- Test the claim that being black and being hispanic have the same effect on wages.
- Test the claim that race has no effect on wages.
- Test the claim that each additional year of schooling increases expected wages by nine percent.
- Test the claim that the gender differential is ten percent (women earn ten percent less than men).
- Test the claim that age does not belong in the model (the coefficients on age and age squared are both zero).
Question 6. Using the Base Model, again test the claim that the return to each additional year of schooling is nine percent, but this time do not use Stata's test command. Instead, estimate an unrestricted model and a restricted model to obtain their sums of squared residuals. Then calculate the F-statistic and the p-value (ball-park it or calculate it precisely).
Question 7. Using the Base Model, conduct the F-test that none of the explanatory variables has any effect (all slope coefficients are zero). Do not rely on the F-statistic Stata reports; instead, compute it by estimating a restricted and an unrestricted model. Then calculate the F-statistic and the p-value (ball-park it or calculate it precisely).
Question 8. Estimate a model in which ln(wages) depends on age, age squared, schooling, and sex. Test the claim (using a Chow test) that the coefficients of this model are the same regardless of race.
Question 9. Estimate a model that includes age, age squared, and sex and allows each race to have its own intercept and its own return to each year of schooling. Then (1) test the claim that the return to each year of schooling is the same for all three races, and (2) test the claim that the return to each year of schooling is the same for blacks and hispanics.
Question 10. Estimate a model that allows all coefficients on age, age squared, schooling, and sex to vary by race. Then test (using Stata's test command) whether the gender differential differs significantly between whites and blacks; between whites and hispanics; between blacks and hispanics; and across all three races.

Lab Instructions With Stata Commands and Answers

Question 1. Describe and summarize the data set, and separately tabulate race, school, and sex.
Question 2. Create dummy variables for race (hispanic, black, and white) and sex (female and male). Also create age2 equal to the square of age, and lnwage equal to the natural log of annual earnings. Keep (and order) wage, lnwage, age, age2, school, hispanic, black, white, female, and male. Report the summary statistics for this data set, and save it as wages_edited.dta.
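As a cross-check on the race dummies in Question 2, Stata's tabulate command can also create one indicator per category through its generate() option. The sketch below uses a tiny made-up data set (not wages.dta) and the arbitrary stub name r; it simply confirms that the hand-built dummies match the ones tabulate creates.

```stata
* Sketch on made-up data: six observations with race coded 1, 2, 3 as in wages.dta
clear
set obs 6
generate race = mod(_n - 1, 3) + 1

* Dummies built by hand, as in Question 2
generate hispanic = (race==1)
generate black    = (race==2)
generate white    = (race==3)

* The same dummies from -tabulate-; r1, r2, r3 follow the sorted values of race
tabulate race, generate(r)
list race hispanic black white r1 r2 r3
```

One caution with the hand-built version: if race were missing for some observation, (race==1) would evaluate to 0 rather than missing, so it is worth checking tab race, missing first.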
Question 3. Regress wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression, and plot them against age.
Given what you know about wages, do the results generally make sense? Explain why the residuals should lead one to question the model specification.
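One way to make the residual comparison in Questions 3 and 4 concrete is to look at skewness, which summarize, detail reports as r(skewness). The sketch below uses simulated data (not wages.dta) in which, by assumption, log earnings are linear in schooling with normal errors, so earnings themselves are log-normal; the variable names and parameter values are made up for illustration, and runiformint() requires Stata 14 or later. The residuals from the level regression inherit the long right tail of earnings, while those from the log regression are roughly symmetric.

```stata
* Simulated data (assumption: ln(earnings) = 9 + 0.08*school + normal error)
clear
set seed 2013
set obs 704
generate school = runiformint(8, 20)
generate lnearn = 9 + 0.08*school + rnormal(0, 0.6)
generate earn   = exp(lnearn)

* Level regression, as in Question 3
quietly regress earn school
predict e_level, resid
quietly summarize e_level, detail
scalar skew_level = r(skewness)

* Log regression, as in Question 4
quietly regress lnearn school
predict e_log, resid
quietly summarize e_log, detail
scalar skew_log = r(skewness)

display "skewness of residuals: levels = " skew_level "   logs = " skew_log
```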
Question 4. Regress logged wages on age, age squared, years of schooling, race (omit white), and sex (omit male). Summarize the residuals of the regression, and plot them against age. Call this the Base Model.
Explain why the residuals might give one more confidence in this model than in the model from Question 3.
Describe the predicted relationship between age and ln(wages) as completely as possible.
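For reference, the Base Model can be written as below (the beta labels are introduced here only for exposition). Because age enters as a quadratic, the effect of age on ln(wage) depends on age itself: each extra year changes predicted wages by roughly 100(beta_1 + 2 beta_2 age) percent, and if the coefficient on age squared is negative the profile is concave and reaches its maximum at age*.

```latex
\[
\ln(\mathit{wage}_i) = \beta_0 + \beta_1\,\mathit{age}_i + \beta_2\,\mathit{age}_i^2 + \beta_3\,\mathit{school}_i
  + \beta_4\,\mathit{hispanic}_i + \beta_5\,\mathit{black}_i + \beta_6\,\mathit{female}_i + u_i
\]
\[
\frac{\partial \ln(\mathit{wage}_i)}{\partial\,\mathit{age}_i} = \beta_1 + 2\beta_2\,\mathit{age}_i ,
\qquad
\mathit{age}^{*} = -\frac{\beta_1}{2\beta_2}
\]
```

After estimating the Base Model, Stata's nlcom command gives a point estimate and a delta-method standard error for the turning point, for example nlcom -_b[age]/(2*_b[age2]).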
Question 5. Re-estimate the Base Model, and then do the following.
- Test the claim that being black and being hispanic have the same effect on wages.
- Test the claim that race has no effect on wages.
- Test the claim that each additional year of schooling increases expected wages by nine percent.
- Test the claim that the gender differential is ten percent (women earn ten percent less than men).
- Test the claim that age does not belong in the model (the coefficients on age and age squared are both zero).
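In terms of the Base Model coefficients (written here as beta subscripted by variable name, a notation introduced only for exposition), the five claims correspond to the null hypotheses below, where q is the number of restrictions. These match the test commands under Question 5 in the program.

```latex
\[
\begin{aligned}
&\text{black vs. hispanic:}   && H_0:\ \beta_{\mathit{hispanic}} = \beta_{\mathit{black}}     && (q = 1)\\
&\text{no race effect:}       && H_0:\ \beta_{\mathit{hispanic}} = \beta_{\mathit{black}} = 0 && (q = 2)\\
&\text{nine percent return:}  && H_0:\ \beta_{\mathit{school}} = 0.09                          && (q = 1)\\
&\text{ten percent gap:}      && H_0:\ \beta_{\mathit{female}} = -0.10                         && (q = 1)\\
&\text{age not in model:}     && H_0:\ \beta_{\mathit{age}} = \beta_{\mathit{age2}} = 0        && (q = 2)
\end{aligned}
\]
```

Because the dependent variable is in logs, these coefficients are approximate proportional effects. For example, a female coefficient of -0.10 corresponds to an exact gap of exp(-0.10) - 1, about -0.095, or roughly 9.5 percent.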
Question 6. Using the Base Model, again test the claim that the return to each additional year of schooling is nine percent, but this time do not use Stata's test command. Instead, estimate an unrestricted model and a restricted model to obtain their sums of squared residuals. Then calculate the F-statistic and the p-value (ball-park it or calculate it precisely).
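The restricted-model trick in Question 6 works because imposing a schooling coefficient of 0.09 is the same as subtracting 0.09 times school from ln(wage) and dropping school from the right-hand side. The sketch below checks, on simulated data rather than wages.dta (the variables y, x1, x2 and the seed are arbitrary, with x1 playing the role of school), that the F-statistic built from the two sums of squared residuals equals the one Stata's test command reports. It reads e(rss) and e(df_r), which regress stores, and r(F), which test stores.

```stata
* Sketch on simulated data (not wages.dta); x1 plays the role of school
clear
set seed 5
set obs 704
generate x1 = rnormal()
generate x2 = rnormal()
generate y  = 1 + 0.09*x1 + 0.5*x2 + rnormal()

* Unrestricted model, and Stata's own test of H0: b_x1 = .09
quietly regress y x1 x2
scalar rss_u = e(rss)
scalar df_u  = e(df_r)
test x1 = .09
scalar F_test = r(F)

* Restricted model: impose b_x1 = .09 by moving .09*x1 to the left-hand side
generate y_r = y - .09*x1
quietly regress y_r x2
scalar rss_r = e(rss)

* One restriction, so the numerator degrees of freedom equal 1
scalar F_manual = ((rss_r - rss_u)/1) / (rss_u/df_u)
scalar p_manual = Ftail(1, df_u, F_manual)
display "manual F = " F_manual "   test F = " F_test "   p-value = " p_manual
```

The two F-statistics agree up to rounding because, for a linear restriction in OLS with the default variance estimate, the Wald F from test is algebraically identical to the sum-of-squared-residuals F.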
Question 7. Using the Base Model, conduct the F-test that none of the explanatory variables has any effect (all slope coefficients are zero). Do not rely on the F-statistic Stata reports; instead, compute it by estimating a restricted and an unrestricted model. Then calculate the F-statistic and the p-value (ball-park it or calculate it precisely).
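When the restricted model contains only a constant (reg lnwage), its sum of squared residuals is the total sum of squares, and the unrestricted sum of squared residuals equals (1 - R^2) times that total. The F-statistic in Question 7 can therefore also be written in terms of the unrestricted R^2. The Base Model has k = 6 slope coefficients; the denominator degrees of freedom of 697 assume all 704 observations are used.

```latex
\[
F = \frac{(\mathit{RSS}_r - \mathit{RSS}_u)/k}{\mathit{RSS}_u/(n-k-1)}
  = \frac{R^2/k}{(1-R^2)/(n-k-1)},
\qquad k = 6,\quad n-k-1 = 704 - 7 = 697
\]
```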
Question 8. Estimate a model in which ln(wages) depends on age, age squared, schooling, and sex. Test the claim (using a Chow test) that the coefficients of this model are the same regardless of race.
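The Chow test in Question 8 compares the pooled (restricted) regression with separate regressions for whites (w), hispanics (h), and blacks (b). With G = 3 groups and k = 4 slope coefficients (age, age2, school, female), each group has k + 1 = 5 parameters, which gives the degrees of freedom used under Question 8 in the program; the numeric values below assume all 704 observations are used.

```latex
\[
F = \frac{\bigl[\mathit{RSS}_{\mathit{pooled}} - (\mathit{RSS}_w + \mathit{RSS}_h + \mathit{RSS}_b)\bigr] / \bigl[(G-1)(k+1)\bigr]}
         {(\mathit{RSS}_w + \mathit{RSS}_h + \mathit{RSS}_b) / \bigl[n - G(k+1)\bigr]},
\qquad (G-1)(k+1) = 10,\quad n - G(k+1) = 689
\]
```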
Question 9. Estimate a model that includes age, age squared, and sex and allows each race to have its own intercept and its own return to each year of schooling. Then (1) test the claim that the return to each year of schooling is the same for all three races, and (2) test the claim that the return to each year of schooling is the same for blacks and hispanics.
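Alternatively, replace school with three race-specific schooling variables (schoolw, schoolb, and schoolh); see the alternative under Question 9 in the program below.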
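In the first specification of Question 9 (school, schoolb, schoolh, hispanic, black), whites are the base group, so the return to schooling for each race and the two null hypotheses are as follows. The beta, delta, and gamma labels are introduced here only for exposition.

```latex
\[
\ln(\mathit{wage}_i) = \beta_0 + \beta_1\,\mathit{age}_i + \beta_2\,\mathit{age}_i^2 + \beta_3\,\mathit{school}_i
  + \delta_b\,\mathit{schoolb}_i + \delta_h\,\mathit{schoolh}_i
  + \gamma_h\,\mathit{hispanic}_i + \gamma_b\,\mathit{black}_i + \beta_4\,\mathit{female}_i + u_i
\]
\[
\text{return to schooling:}\quad \text{white } \beta_3,\qquad \text{black } \beta_3 + \delta_b,\qquad \text{hispanic } \beta_3 + \delta_h
\]
\[
\text{(1) } H_0:\ \delta_b = \delta_h = 0 \qquad\qquad \text{(2) } H_0:\ \delta_b = \delta_h
\]
```

In the alternative parameterization (schoolw, schoolb, and schoolh, with school dropped), each race's return is its own coefficient, so (1) becomes test schoolw=schoolh=schoolb and (2) remains test schoolh=schoolb.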
Question 10. Estimate a model that allows all coefficients on age, age squared, schooling, and sex to vary by race. Then test (using Stata's test command) whether the gender differential differs significantly between whites and blacks; between whites and hispanics; between blacks and hispanics; and across all three races.

* schoolw, schoolb, and schoolh are created in Question 9
gen agew=age*white
gen ageb=age*black
gen ageh=age*hispanic
gen age2w=age2*white
gen age2b=age2*black
gen age2h=age2*hispanic
gen femalew=female*white
gen femaleb=female*black
gen femaleh=female*hispanic
reg lnwage age ageb ageh age2 age2b age2h school schoolb schoolh female femaleb femaleh hispanic black
test femaleb=0
test femaleh=0
test femaleb=femaleh
test femaleb=femaleh=0
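In this first specification whites are the omitted group, so the coefficient on female is the gender differential for whites, while femaleb and femaleh measure how the differential for blacks and hispanics departs from it. Writing theta_f, theta_fb, and theta_fh for the coefficients on female, femaleb, and femaleh (labels introduced here only for exposition), the four comparisons map to these null hypotheses:

```latex
\[
\text{gender differential:}\quad
\text{white } \theta_f,\qquad
\text{black } \theta_f + \theta_{fb},\qquad
\text{hispanic } \theta_f + \theta_{fh}
\]
\[
\begin{aligned}
\text{white vs. black:}    &\quad H_0:\ \theta_{fb} = 0\\
\text{white vs. hispanic:} &\quad H_0:\ \theta_{fh} = 0\\
\text{black vs. hispanic:} &\quad H_0:\ \theta_{fb} = \theta_{fh}\\
\text{all three races:}    &\quad H_0:\ \theta_{fb} = \theta_{fh} = 0
\end{aligned}
\]
```

Stata's lincom command can report a race-specific differential with its standard error, for example lincom female + femaleb for blacks.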

Alternatively:

reg lnwage agew ageb ageh age2w age2b age2h schoolw schoolb schoolh femalew femaleb femaleh hispanic black
test femalew=femaleb
test femalew=femaleh
test femaleb=femaleh
test femalew=femaleb=femaleh
save wages_edited, replace
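The two Question 10 specifications are reparameterizations of the same model, so equivalent hypotheses yield identical F-statistics. The sketch below checks this for the white-versus-black comparison on simulated data (not wages.dta), keeping only sex and race for brevity; the outcome y, the coefficient values, and the seed are made up for illustration, and runiformint() requires Stata 14 or later.

```stata
* Sketch on simulated data (not wages.dta); only sex and race are included
clear
set seed 2013
set obs 704
generate race     = runiformint(1, 3)
generate female   = (runiform() < 0.5)
generate hispanic = (race==1)
generate black    = (race==2)
generate white    = (race==3)
generate femalew  = female*white
generate femaleb  = female*black
generate femaleh  = female*hispanic
generate y = 10 - 0.20*female + 0.05*femaleb - 0.10*black + rnormal()

* Specification 1: female plus race interactions (whites are the base group)
quietly regress y female femaleb femaleh hispanic black
test femaleb = 0
scalar F1 = r(F)

* Specification 2: a separate female coefficient for each race
quietly regress y femalew femaleb femaleh hispanic black
test femalew = femaleb
scalar F2 = r(F)

display "white vs. black:  F (spec. 1) = " F1 "   F (spec. 2) = " F2
```

Both regressions span the same columns (female equals femalew + femaleb + femaleh), so they have the same fitted values, and the two null hypotheses impose the same restriction on them.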

Stata Program to Execute All Commands

#delimit ;
set more off;
capture log close;
log using lab5.log, replace;

* STATA LAB FIVE;

use wages, clear;

* Question 1;

desc;
sum;
tab race;
tab school;
tab sex;

* Question 2;

gen hispanic=(race==1);
gen black=(race==2);
gen white=(race==3);
gen female=(sex=="F");
gen male=(sex=="M");
gen age2=age*age;
gen lnwage=ln(wage);
keep wage lnwage age age2 school hispanic black white female male;
order wage lnwage age age2 school hispanic black white female male;
save wages_edited, replace;

* Question 3;

reg wage age age2 school hispanic black female;
predict errors1, resid;
sum errors1;
scatter errors1 age, ti(Wage Regression) saving(e1_age, replace);

* Question 4;

reg lnwage age age2 school hispanic black female;
predict errors2, resid;
sum errors2;
scatter errors2 age, ti(Ln(Wage) Regression) saving(e2_age, replace);

* Question 5;

reg lnwage age age2 school hispanic black female;
test hispanic=black;
test hispanic=black=0;
test school=.09;
test female=-0.10;
test age=age2=0;

* Question 6;

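* Unrestricted model; store results in double precision to avoid rounding in rssr-rssu;
reg lnwage age age2 school hispanic black female;
gen double rssu=e(rss);
gen dendf=e(df_r);
gen numdf=1;
* Restricted model: impose b_school=.09 by moving .09*school to the left-hand side;
gen new_y=lnwage-.09*school;
reg new_y age age2 hispanic black female;
gen double rssr=e(rss);
gen double fstat=((rssr-rssu)/numdf)/(rssu/dendf);
gen double pval=Ftail(numdf,dendf,fstat);
list fstat pval if _n==1;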

* Question 7;

reg lnwage age age2 school hispanic black female;
replace rssu=e(rss);
replace numdf=e(df_m);
replace dendf=e(df_r);
reg lnwage;
replace rssr=e(rss);
replace fstat=((rssr-rssu)/numdf)/(rssu/dendf);
replace pval=Ftail(numdf,dendf,fstat);
list fstat pval if _n==1;

* Question 8;

reg lnwage age age2 school female;
replace rssr=e(rss);
* Chow test degrees of freedom: (3-1)*(k+1) and N-3*(k+1), where k = number of slopes;
replace numdf=(3-1)*(e(df_m)+1);
replace dendf=e(N)-3*(e(df_m)+1);
reg lnwage age age2 school female if white==1;
gen double rssw=e(rss);
reg lnwage age age2 school female if hispanic==1;
gen double rssh=e(rss);
reg lnwage age age2 school female if black==1;
gen double rssb=e(rss);
replace fstat=( (rssr-rssw-rssh-rssb)/numdf)/((rssw+rssh+rssb)/dendf);
replace pval=Ftail(numdf,dendf,fstat);
list fstat pval if _n==1;

* Question 9;

gen schoolb=school*black;
gen schoolh=school*hispanic;
gen schoolw=school*white;
reg lnwage age age2 school schoolb schoolh hispanic black female;
test schoolh=schoolb=0;
test schoolh=schoolb;

* Alternatively;

reg lnwage age age2 schoolw schoolb schoolh hispanic black female;
test schoolw=schoolh=schoolb;
test schoolh=schoolb;

* Question 10;

gen agew=age*white;
gen ageb=age*black;
gen ageh=age*hispanic;
gen age2w=age2*white;
gen age2b=age2*black;
gen age2h=age2*hispanic;
gen femalew=female*white;
gen femaleb=female*black;
gen femaleh=female*hispanic;
reg lnwage age ageb ageh age2 age2b age2h school schoolb schoolh female femaleb femaleh hispanic black ;
test femaleb=0;
test femaleh=0;
test femaleb=femaleh;
test femaleb=femaleh=0;

* Alternatively;

reg lnwage agew ageb ageh age2w age2b age2h schoolw schoolb schoolh femalew femaleb femaleh hispanic black;
test femalew=femaleb;
test femalew=femaleh;
test femaleb=femaleh;
test femalew=femaleb=femaleh;
save wages_edited, replace;

log close;
clear;