Introduction to the Data Ferrett

Professor Robert J. Lemke
Department of Economics and Business
Lake Forest College
Fall 2008


The Data Ferrett is a data portal hosted by the U.S. Census Bureau that lets you download a vast amount of individual-level data. Many of you may find one of the data sets available through the Data Ferrett to be a good source of data for your econometrics project. This page walks you through using the Data Ferrett, getting the data into STATA, and manipulating it a bit once it is there.

You may want to print this webpage before beginning.

  1. To begin, the Data Ferrett must be installed on your machine. Go to http://dataferrett.census.gov.

  2. You should now see the Data Ferrett icon on your desktop. Double-click on it.

  3. You could download all 62 variables, but this is unnecessary and would result in an enormous data set. Instead:

  4. You are now ready to start downloading your data.

  5. The window should change so that you are now given a link to your data set. Right-click on the link.

  6. At this point, the data could be read into Excel.

  7. To read your data directly into STATA, open STATA.
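
     As a rough sketch of this step, the commands below assume the download was saved as a comma-delimited text file whose first row holds the variable names. The file name cps_extract.csv is hypothetical; substitute the name and folder of your own download.

     ```stata
     * Hypothetical file name; assumes a comma-delimited file with variable names in row 1
     import delimited using "cps_extract.csv", clear

     * Confirm that the variables and observations arrived as expected
     describe
     summarize
     ```

     In older versions of STATA that lack import delimited, the equivalent command is: insheet using "cps_extract.csv", comma clear.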

  8. What is the difference between marital status (PEMARITL) and the marital status recode (PRMARSTA)? There are a couple of ways to look for the difference, but the first step is always to look at the codebook. Open your codebook in a word-processing program (I prefer WordPad), and you will see that the difference concerns some extra categories for spouses. To determine how big a difference there is between the variables, tabulate both: tab pemaritl prmarsta. Using this tabulation along with the codebook, we see that the recoded variable distinguishes between a person with a civilian spouse and a person with a non-civilian (armed forces) spouse.

     Suppose we are interested in distinguishing among people who marry and stay married, people who marry and then separate, and people who never marry. To do this, we could use either marital status variable. Enter the following commands (and try to predict what each will do to the data set):

     Notice that we have generated four variables: status takes on one of three values to indicate married, divorced, or single. We also created three dummy variables for married, divorced, and single. With these variables defined, we then dropped the original variables pemaritl and prmarsta. For the record, notice that the first drop command (i.e., drop if pemaritl==.) drops observations with missing data on marital status, while the second drop command (i.e., drop pemaritl prmarsta) drops variables. You always need to understand whether you are dropping variables or dropping observations.
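
     The commands described in this step can be sketched as follows. The numeric codes are assumptions to verify against your own codebook (1-2 = married, 3 = widowed, 4 = divorced, 5 = separated, 6 = never married). Grouping widowed people with the divorced and separated is a judgment call made only for this illustration.

     ```stata
     * Drop OBSERVATIONS with missing marital status
     drop if pemaritl==.

     * status: 1 = married, 2 = divorced, 3 = single (codes assumed; check the codebook)
     gen status = .
     replace status = 1 if pemaritl==1 | pemaritl==2
     replace status = 2 if pemaritl==3 | pemaritl==4 | pemaritl==5
     replace status = 3 if pemaritl==6

     * Dummy variables; left missing wherever status is missing
     gen married  = (status==1) if !missing(status)
     gen divorced = (status==2) if !missing(status)
     gen single   = (status==3) if !missing(status)

     * Drop the original VARIABLES
     drop pemaritl prmarsta
     ```

     Running tab status before the final drop is a quick way to confirm that every code landed in the intended category.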

  9. We now want to recode our variable for sex. Notice that if you tabulate pesex, you can't tell which observations are male and which are female. This is why you must have a codebook. The codebook tells us that males are classified with a 1 while females are classified with a 2. Enter the following commands: Notice that the dummy variables for male and female represent identical information, since each one equals one minus the other.
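
     A minimal sketch of commands for this step, using the codebook values given above (1 = male, 2 = female). The variable names male and female follow the text; dropping observations with a missing pesex is an assumption.

     ```stata
     * Drop observations with missing sex so the dummies are well defined
     drop if missing(pesex)

     * 1 = male, 2 = female according to the codebook
     gen male   = (pesex==1)
     gen female = (pesex==2)

     * Every observation should fall in exactly one of the two off-diagonal cells
     tab male female

     drop pesex
     ```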

  10. Now consider the education variable. According to the codebook, there are five classifications: less than a high school diploma, high school graduates with no college, high school graduates with some college, associate degree holders, and bachelor's degree holders. Enter the following commands:
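
     One way to build the five education groups is sketched below. The variable name peeduca and the code ranges are assumptions, not taken from this page, so check both against your codebook. Treating codes above the bachelor's level as part of the top group is also an assumption.

     ```stata
     * Assumed codes: 31-38 = no diploma, 39 = HS graduate, 40 = some college,
     * 41-42 = associate degree, 43 and above = bachelor's degree or more
     gen educ = .
     replace educ = 1 if inrange(peeduca, 31, 38)
     replace educ = 2 if peeduca==39
     replace educ = 3 if peeduca==40
     replace educ = 4 if inrange(peeduca, 41, 42)
     replace educ = 5 if peeduca>=43 & !missing(peeduca)

     * Create one dummy per category: ed1, ed2, ed3, ed4, ed5
     tab educ, gen(ed)
     ```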

  11. If you tab ptdtrace, you will notice that there are many different categories for race. We will keep things simple and classify people as white, black, Native American, Asian, or other. To do this, however, we must look to the codebook for guidance. Enter the following commands:
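
     A sketch of the race recode, assuming the codebook assigns 1 = white only, 2 = black only, 3 = American Indian or Alaskan Native only, and 4 = Asian only, with all higher codes treated as other. Confirm these values in your codebook before relying on them.

     ```stata
     * Drop observations with missing race so each dummy is a clean 0/1
     drop if missing(ptdtrace)

     * Codes assumed from the codebook; verify before use
     gen white  = (ptdtrace==1)
     gen black  = (ptdtrace==2)
     gen native = (ptdtrace==3)
     gen asian  = (ptdtrace==4)
     gen other  = (ptdtrace>=5)
     ```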

  12. The final variable we want to create is the hourly wage of the respondent. Summarizing the wage variable with detail (sum pternh1o, detail) immediately shows that a lot of negative numbers are reported. Checking with the codebook suggests that these are simply people who, for various reasons, did not report an hourly wage. Enter the following commands:
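
     A minimal sketch for this step, assuming that every negative value of pternh1o marks a non-response. The new variable name wage is hypothetical.

     ```stata
     * Copy the reported wage, then set the negative (non-response) codes to missing
     gen wage = pternh1o
     replace wage = . if pternh1o<0

     * The negative values should no longer appear
     sum wage, detail
     ```

     Setting the non-responses to missing, rather than dropping those observations, keeps them available for analyses that do not involve the wage.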

  13. At this point, save your current data set, but change its name so that you don't overwrite your original data.
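
     For example, with a hypothetical new file name:

     ```stata
     * Without the replace option, STATA refuses to write over an existing file
     save "cps_extract_clean.dta"
     ```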

  14. Lastly, we will do some data analysis. Consider the following:
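
     As an illustration of the kind of analysis that is now possible, the sketch below assumes the variables from the earlier steps are named status, married, divorced, and male, and that the cleaned hourly wage is stored in a variable called wage (a hypothetical name). It is one possible starting point, not a prescribed specification.

     ```stata
     * Distribution of marital status
     tab status

     * Hourly wage, summarized separately for women (male==0) and men (male==1)
     bysort male: sum wage

     * A simple wage regression; single people and women are the omitted groups
     reg wage male married divorced
     ```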