Robert J. Lemke
Department of Economics and Business
Lake Forest College
Copyright 2012
The DataFerrett is a data portal sponsored by the Bureau of Labor Statistics that allows people to download a vast amount of data. Many of you may find one of the data sets available through the DataFerrett to be a good source of data for your econometrics project. This page walks you through the DataFerrett, getting the data into Stata, and manipulating the data a bit once you have them there.
You may want to print this webpage before beginning. By the end of this lab, you will have created and saved a dataset called cps_may_2011_workers.dta, which you will use in your first Stata project.
log using cps_may_2011.log, replace
insheet using cps_may_2011.txt
desc
sum
save cps_may_2011_original.dta, replace
keep if occurnum==1
Notice the double equal sign: == tests whether occurnum equals 1, whereas a single = assigns a value, as when generating a new variable. Of the original 134,703 observations, only 53,512 are kept, while 81,191 are deleted because they are not associated with the first occurrence (occurnum==1). Now enter:
drop hrhhid hrhhid2 occurnum yyyymm
Describe the data again, and you will see that we have 6 variables and 53,512 observations.
save cps_may_2011.dta, replace
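A quick way to check a keep command is to count the matching observations first and then confirm the new sample size with assert. The sketch below uses a made-up five-observation data set in place of the DataFerrett extract (the id variable and all values are hypothetical); with the real file, the count should match the 53,512 observations reported above. It begins with clear, which replaces the data in memory, so run it only after saving (as was just done) or in a second Stata session.

```stata
* Made-up data standing in for the DataFerrett extract (hypothetical values)
* clear replaces the data in memory, so save your lab data first
clear
input id occurnum
1 1
2 2
3 1
4 3
5 1
end

* Count the observations the keep will retain; count stores the result in r(N)
count if occurnum==1
local nkeep = r(N)

* == tests equality; writing occurnum=1 here would be rejected by Stata
keep if occurnum==1

* Stop with an error if the sample size is not what the count predicted
assert _N == `nkeep'
```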
tab pemaritl prmarsta
Using this tabulation along with the codebook, we see that the only difference is that the recoded variable, prmarsta, distinguishes a person with a civilian spouse from a person with an armed forces spouse. Suppose we want to distinguish among three groups: people who married and stayed married, people who married and then separated or divorced, and people who never married. To do this, we could use either marital status variable. Enter the following commands (and try to predict what each will do to the data set):
drop if pemaritl==-1
gen status=1*(pemaritl<=3)+2*(pemaritl==4|pemaritl==5)+3*(pemaritl==6)
label define marstats 1 Married 2 Divorced 3 Single
label values status marstats
tab status
gen married=(status==1)
gen divorced=(status==2)
gen single=(status==3)
drop pemaritl prmarsta
Notice that we have generated four variables: status takes one of three values to indicate married, divorced, or single, and married, divorced, and single are dummy variables for those three categories. With these variables defined, we then dropped the original variables pemaritl and prmarsta. For the record, notice that the first drop command (drop if pemaritl==-1) drops observations with "strange" data on marital status, while the second drop command (drop pemaritl prmarsta) drops variables. You always need to understand whether you are dropping variables or dropping observations.
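The status formula works because each parenthesized condition evaluates to 1 when true and 0 when false, so exactly one term is nonzero for each code from 1 to 6. It also shows why drop if pemaritl==-1 must come first: -1 satisfies pemaritl<=3, so those observations would otherwise be coded as Married. The sketch below runs the lab's commands on one made-up observation per code and then shows tabulate's generate() option, which builds one dummy per category automatically (the stem stat is a hypothetical name). It begins with clear, so save your work first or use a second Stata session.

```stata
* Made-up data: the -1 code plus one observation for each code 1-6
* clear replaces the data in memory, so save your lab data first
clear
input pemaritl
-1
1
2
3
4
5
6
end

drop if pemaritl==-1

* Each (condition) is 1 if true and 0 if false, so one term is nonzero per row
gen status=1*(pemaritl<=3)+2*(pemaritl==4|pemaritl==5)+3*(pemaritl==6)
list pemaritl status

* Alternative: let tabulate create one dummy per category (stat1, stat2, stat3)
tabulate status, generate(stat)

* The dummies partition the sample: each row has exactly one dummy equal to 1
assert stat1 + stat2 + stat3 == 1
```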
tab pesex
gen male=(pesex==1)
gen female=(pesex==2)
tab male female
gen sex=male
label define sexes 1 Male 0 Female
label values sex sexes
drop pesex
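Notice that the dummy variables male and female represent identical information: female always equals 1 minus male, so in the tabulation of male against female every observation falls in one of the two cells where exactly one of the dummies equals 1.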
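Because female is always 1 minus male, a regression that includes a constant cannot estimate separate coefficients on both dummies, which is why only female appears in the regressions at the end of the lab. The sketch below illustrates this with six made-up observations and a hypothetical outcome y; Stata reports that one of the two dummies is omitted because of collinearity, leaving a single estimated slope. It begins with clear, so save your work first or use a second Stata session.

```stata
* Made-up data: pesex coded 1 = male, 2 = female; y is a hypothetical outcome
* clear replaces the data in memory, so save your lab data first
clear
input pesex y
1 10
1 12
2 9
2 11
1 14
2 8
end

gen male=(pesex==1)
gen female=(pesex==2)

* The two dummies always sum to one, so they carry identical information
assert male + female == 1

* With a constant in the model, Stata omits one dummy because of collinearity
regress y male female

* Model degrees of freedom: only one slope is actually estimated
display e(df_m)
```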
gen educ=preduca5
recode educ 4=3
recode educ 5=4
label define edclass 1 "No HSD" 2 "HSD" 3 "Some Col" 4 "Col Grad"
label values educ edclass
tab educ
gen nohsd=(educ==1)
gen hsd=(educ==2)
gen somecol=(educ==3)
gen college=(educ==4)
drop preduca5
sum
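The order of the two separate recode commands matters: running recode educ 5=4 first and then recode educ 4=3 would send both the original 4s and the original 5s to 3. A single recode with several rules avoids the issue, because each observation is matched against the rules using its original value. The sketch below, on one made-up observation per preduca5 code and with the hypothetical comparison variable educ2, checks that the two approaches agree. It begins with clear, so save your work first or use a second Stata session.

```stata
* Made-up data: one observation for each preduca5 code
* clear replaces the data in memory, so save your lab data first
clear
input preduca5
1
2
3
4
5
end

* The lab's approach: two recodes, run in this order
gen educ=preduca5
recode educ 4=3
recode educ 5=4

* One recode, several rules; each rule is checked against the original value
gen educ2=preduca5
recode educ2 (4=3) (5=4)

assert educ == educ2
list preduca5 educ educ2
```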
gen race=1*(ptdtrace==1)+2*(ptdtrace==2)+3*(ptdtrace==3)+4*(ptdtrace==4)+5*(ptdtrace>=5)
tab race
gen white=(race==1)
gen black=(race==2)
gen natam=(race==3)
gen asian=(race==4)
gen othrc=(race==5)
label define races 1 White 2 Black 3 "Nat Am" 4 Asian 5 Other
label values race races
drop ptdtrace
tab race
sum
Notice that you only need quotes around the actual label when there is a space in the label.
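One way to confirm that a multi-word label such as "Nat Am" was stored whole is to convert the labeled values back to text with decode. The sketch below does this on three made-up observations; racename is a hypothetical variable name. It begins with clear, so save your work first or use a second Stata session.

```stata
* Made-up race codes
* clear replaces the data in memory, so save your lab data first
clear
input race
1
3
5
end

* Quotes are needed only for "Nat Am", the one label containing a space
label define races 1 White 2 Black 3 "Nat Am" 4 Asian 5 Other
label values race races

* decode creates a string variable holding each observation's label text
decode race, generate(racename)
list race racename
```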
sum pternh1o, detail
One immediately sees that many negative numbers are reported. Checking with the codebook suggests that these are simply people who did not report an hourly wage (for various reasons), so there are a lot of non-respondents to this question. Enter the following commands:
gen wage=pternh1o
sum wage, d
keep if wage>0
gen lnwage=ln(wage)
drop pternh1o
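Dropping the non-positive wages before taking logs matters because ln() is undefined for zero and negative numbers; Stata returns a missing value (.) in those cases rather than an error, so a careless ln(wage) would silently add missing values. The sketch below uses five made-up values of pternh1o (the negative codes are illustrative, not taken from the codebook) and the hypothetical variable lnwage_all to show this. It begins with clear, so save your work first or use a second Stata session.

```stata
* Made-up hourly wages; the negative values mimic non-reported wages
* clear replaces the data in memory, so save your lab data first
clear
input pternh1o
-1
0
12.5
20
-4
end

gen wage=pternh1o
count if wage<=0

* ln() of zero or a negative number is missing; Stata notes the
* missing values generated instead of stopping with an error
gen lnwage_all=ln(wage)
count if missing(lnwage_all)

* Keeping only positive wages first, as in the lab, avoids the problem
keep if wage>0
gen lnwage=ln(wage)
```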
order wage lnwage status married single divorced sex male female educ nohsd hsd somecol college race white black natam asian othrc
describe
summarize
save cps_may_2011_workers, replace
tab educ sex, col
tab educ race, col
tab educ status, col
tab race status, row
tab race sex
reg wage divorced single female nohsd somecol college black natam asian othrc
reg lnwage divorced single female nohsd somecol college black natam asian othrc
log close
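In the regressions above, the omitted categories are married, male, hsd, and white, so each coefficient measures a difference relative to those groups, holding the other regressors fixed. With lnwage as the dependent variable, a dummy coefficient b is approximately a 100*b percent difference, an approximation that is good for small b. In the simplest case of a single dummy, the constant equals the mean of the omitted group and the dummy's coefficient equals the difference in group means. The sketch below checks this on five made-up wage observations; mean_m, mean_f, and gap are hypothetical local macro names. It begins with clear, so make sure your data are saved or use a second Stata session.

```stata
* Made-up data: three men and two women
* clear replaces the data in memory, so save your lab data first
clear
input wage female
10 0
14 0
12 0
8 1
11 1
end

regress wage female

* Mean wage of the omitted group (men) and of women
summarize wage if female==0
local mean_m = r(mean)
summarize wage if female==1
local mean_f = r(mean)
local gap = `mean_f' - `mean_m'

* The constant should equal the men's mean, and the coefficient the gap
display "constant = " _b[_cons] ", mean for men = " `mean_m'
display "coefficient = " _b[female] ", gap in means = " `gap'
```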