Stata Project 1: Getting Started

Robert J. Lemke
Department of Economics and Business
Lake Forest College
Copyright 2013


Due: Start of class on Wednesday, February 12

Directions: Everyone must write their own Stata program that produces answers to the following two questions. You can talk to Professor Lemke if you have questions, but do not talk to your classmates! Consider this a take-home exam.

The lab write-up is due on Moodle by the start of class on February 12. When you submit your program on Moodle, also send Professor Lemke an email with three attachments: (1) your program (the same .do file you uploaded to Moodle), (2) a log file from running the program, and (3) your answers in a Word file. Your answers to the questions need to be written extremely clearly without relying on Stata commands or including unnecessary Stata output. A log file will not suffice as your answers. You should use your log file to copy and paste the results you need into the Word file that is your answers.
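
As a minimal sketch of how one .do file can produce its own log file, the skeleton below may help. The names project1.do and project1.log are hypothetical, and the sketch assumes both datasets sit in Stata's current working directory; it shows the file structure only, not any answers.

```stata
* project1.do (hypothetical file name; adjust names and paths to your setup)
clear all
capture log close                        // close any log left open by an earlier run
log using "project1.log", replace text   // write a plain-text log (hypothetical name)

* Question 1 (assumes the file is in the current working directory)
use "welfare_edited.dta", clear
* ... your commands for question 1 go here ...

* Question 2
use "cps_may_2011_workers.dta", clear
* ... your commands for question 2 go here ...

log close
```

Running the whole file from start to finish (for example, with `do project1.do`) regenerates the log each time because of the `replace` option, so the log always matches the program you submit.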

  1. For the first question, use welfare_edited.dta. We created welfare_edited.dta as a class during the first Stata lab. To guarantee your data is correct, however, you should probably download this version of the dataset.

    1. Create a cross-tab of whether the mother is working against the number of children she has. Consider four groups for the number of children: 1, 2, 3, and 4 or more. Be sure to label any variables you create, and add value labels wherever they make sense, so that Stata's output clearly labels everything. Do not include cell, row, or column percentages.
    2. Create a cross-tab of whether the mother is working against the age of the mother's youngest child. Consider three groups for the age of the youngest child (in years): 0 to 2, 3 to 5, and 6 or older. Again, be sure to label any variables you create, along with their values, so that Stata's output clearly labels everything. Include in the table the percent of mothers who work (and who do not work) for each of the three age groups. For example, the table should report, for all mothers whose youngest child is between 0 and 2 years old, the percent of those mothers who are working as well as the percent of those mothers who are not working.
    3. What percent of mothers are working in each standard metropolitan statistical area?
    4. By education level, what are the average wage and average weekly hours worked of those who are working?

  2. For the second question, use cps_may_2011_workers.dta, which we created together in class while working through the Introduction to the Data Ferrett laboratory exercise. Again, to guarantee your data is correct, you should probably download this version of the dataset.

    1. Create variable labels for each of the variables. Describe and summarize the variables.
    2. What does it mean that the average value for Sex is 0.511? Using a Stata command or two plus some writing, convince someone who doesn't know the dataset that your interpretation is correct.
    3. How many observations in the dataset have an hourly wage of at least $5 and at most $15?
    4. Using a single Stata command, what is the average hourly wage earned by individuals in the different education classes?
    5. Using the scatter command, plot wages (y-axis) against sex (x-axis). Include a copy of the graph in your results write-up. Explain why your graph is fairly meaningless.
    6. Suppose what is really called for is a bar chart that reports the average wage by sex. Find a graphing command in Stata that produces this graph. (Hint: the command is graph bar, but you will need to use Stata's online help to learn how to fully implement the command.) Include the graph in your results write-up. According to your graph, what is the average hourly wage for women? For men?
    7. Suppose that, to really impress your boss, you decide to produce graphs of the entire distribution of hourly wages by sex. (You may recall that such a graph is typically called a histogram.) Find a graphing command in Stata that produces this graph. (Hint: the command is hist, but you will need to use Stata's online help to learn how to fully implement the command.) Include the graph in your results write-up. What stands out in your graph(s) when considering the distribution of wages across sexes?

    In parts 5-7, you are asked to produce graphs. The better your graphs look, the more points you will receive. In particular, graphs should have titles, appropriately labeled axes and legends, and appropriately numbered axes. Even more detail would be good, such as choosing a good bin size for a histogram. Extra credit goes to whoever has the best graphs.
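
To illustrate what a title, labeled axes, and numbered axes look like in practice, here is a minimal sketch using the auto dataset that ships with Stata rather than the project data. The title text, tick ranges, and export file name are illustrative choices, not requirements, and the `///` line continuations work only inside a .do file.

```stata
* Illustration only: auto.dta ships with Stata and is not the project data
sysuse auto, clear

scatter mpg weight, ///
    title("Fuel Economy and Vehicle Weight") ///
    xtitle("Weight (pounds)") ///
    ytitle("Miles per gallon") ///
    xlabel(2000(1000)5000) ///
    ylabel(10(10)40)

* Save the graph so it can be pasted into a Word file.
* The file name is hypothetical; PNG export assumes a graphical Stata session.
graph export "example_scatter.png", replace
```

Most of these options (`title()`, `xtitle()`, `ytitle()`, `xlabel()`, `ylabel()`) are shared across Stata's graphing commands, so the same pattern should carry over once you have worked out the command each part calls for.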