EXCEL PROJECT #2

Created by Robert J. Lemke
Department of Economics and Business
Lake Forest College
Lake Forest, IL 60045
Copywrite, 2009

Due: Friday, September 18

Answer all of the questions in italics. You can click here and print up all of the questions. Then fill-in your answers as you go along.

  1. A data set has been provided to you as a text file: highschools.txt. If you left click on the link, your web browser may automatically open the file. This isn't ideal, so instead:
    1. Right click on the link.
    2. Save Link As...
    3. Save highschools.txt somewhere on your harddrive or a disk. Don't forget where you have saved it.

  2. Start up Excel. Then open your saved file.

  3. When opening the file, Excel will run its Text Import Wizard. When this box opens up, look at the text in the box and see how your data is listed. If it would be in nice columns, then you would click on Fixed Width. However, this isn't the case here. With this file, items are separated by a common character, specifically a comma, so click: Delimited box => Next => Comma => Next => Finish.

  4. In order to make sure you do not need to repeat this process, save your file as an Excel file called highschools.xls. You will need to change the Save Type to allow for a .xls extension. You should save your work frequently, either by going to File => Save or by typing Cntrl-S.

  5. Excel keeps data in worksheets. The bottom toolbar shows that Excel has named the worksheet highschools. Right click on the worksheet name => Rename => rename the worksheet Original Data.

  6. Check that your new Excel data matches the original data by comparing it to the codebook (codebook_highschools). Specifically, your file should contain 14 variables and 588 observations.

  7. Insert a blank row at the top of your spreadsheet by rightclicking on Row 1 => Insert. Then type the variable names in the cells in row 1 (i.e., cell A1 = schlname, cell B1 = schltype, etc.). Variable names should now be in cells A1:N1, while the data are in cells A2:N589.

  8. Click and drag on the A column, and highlight columns A - N, then change the font to Times New Roman 10 point (on the toolbar).

  9. Right click on column A => Column Width => 35. Repeat for Column C. Similarly, format the width of columns B and D - N to 8.

  10. Left click on column E and drag to highlight E:N. Change the alignment of these 10 columns to be aligned right (on the toolbar). Keep columns A:D aligned to the left.

  11. Highligh cell B2 => View => Freeze Panes.
    You will see a dark horizontal line between rows 1 and 2 and a dark vertical line between columns A and B. Freeze panes will keep the data outside of these lines always in view. Thus, regardless of where you scroll, you will always see the variable names on top (unless you go too wide) and the district names on the left (unless you go too low).

  12. Right click on column E (white) => Format Cells => Number => 1 (decimal) => OK. Every value for white now has 1 value after the decimal point. Repeat the process for psae_s and psae_d. Finally, repeat the process for fundppd, eavppd, texpppd, and pnotkr, but give each value 3 places after the decimal.

  13. How many high schools are in Peoria School District #150?

  14. The previous question was easy to answer, because the data are sorted by district name so you could scroll down and find Peoria easily. A question about a particular high school, however, would be more difficult to answer, because the high schools are not in alphabetical order.
    1. Left click on A1 and drag to N589 (or simply highlight the columns A - N.)
    2. Data => Sort => Sort By (first menu box) => schlname.
      The sort menu allows you to choose up to three variables by which to sort. You can also choose to sort in ascending or descending order.

  15. What school district is Willowbrook High School in?

  16. What high school has the greatest percentage of students meeting or exceeding standards on the PSAE? (Hint: sort the data by psae_s.)

  17. What school district has the greatest percentage of students meeting or exceeding standards on the PSAE? (Hint: sort the data by psae_d.)

  18. What unit school district has the greatest percentage of students meeting or exceeding standards on the PSAE? What high school district has the greatest percentage of students meeting or exceeding standards on the PSAE? (Hint: sort by disttype and then by psae_d.)

  19. In order to manipulate the data further, create a second worksheet in the same Excel file by clicking on the worksheet symbol immediately to the right of your "Original Data" tab at the bottom of your Excel window. Name this new worksheet School Data.

  20. Copy the data from Original Data to School Data.
    1. Click on the Original Data worksheet tab.
    2. Left click on the column heading A and drag to highlight columns A-N.
    3. Press Cntrl-C to copy the highlighted cells.
    4. Click on the School Data worksheet tab.
    5. Click on column A or in cell A1.
    6. Press Cntrl-V to paste the previously copied cells into the new worksheet.
    7. In this worksheet, you only want to keep four variables: white, enroll, psae_s, and psae_d. Highlight the columns of the other variables by left-clicking on the letter(s) at the top. Then right click and select Delete.

  21. In the School Data worksheet, you should now have four variables: white in column A, enroll in column B, psae_s in column C, and psae_d in column D. In cells G1 - J1, type white, enroll, psae_s, and psae_d respectively. In cells F2:F9 type Sum, Num of Obs, Mean, Median, Minimum, Maximum, Sample Variance, and Sample Stnd. Dev.

  22. Widen column F by left clicking on the right side of column F (above row 1) and dragging to the right so the writing does not run into column G. (You can also double click on the right border of the F box at the top of the column.) Be sure that everything is in Times New Roman 10 point, all columns are aligned to the right, and there is just 1 digit after the decimal in columns A - G. (The enroll variable can have 0 or 1 decimal points.)

  23. Put the sum of all of the white values in cell G2 by typing: =SUM(A2:A589). [ G2 Value = 46,318.6 ]
    Excel lets you manipulate data and calculate descriptive statistics by entering formulas and by using functions (such as SUM) that Excel has defined. To indicate that you are doing this, however, Excel requires that all formulas and functions begin with an = sign.

  24. Put the number of non-missing observations for white in G3 (which we know to be 588, but instead of entering it directly, let Excel do it for you) by typing: =COUNT(A2:A589). [ G3 Value = 588 ]

  25. In G2 is the sum of the observations and in G3 is the number of observations. Thus, we can calculate the mean value of white by dividing the value in G2 by the value in G3. We could do this by entering =G2/G3 in cell G4. Rather than entering the formula by hand, however, we can also let Excel do it. In cell G4 enter: =AVERAGE(A2:A589). [ G4 Value = 78.8 ]

  26. Next is the sample median. We know that there are 588 observations in our data set. As this is an even number, the median is the average between the 294th and 295th observations if we would sort our data. Given that the variable name is in row 1, however, the median is the average between the numbers in cells A295 and A296. We could compute this ourselves, but it is better to let Excel do it. In G5 enter: =MEDIAN(A2:A589). [ F6 Value = 95.2 ]

  27. In G6 and G7, put the minimum and maximum values for white. In particular, in G6 enter: =MIN(A2:A589). In G7 enter: =MAX(A2:A589). [ G6 Value = 0, G7 Value = 100 ]

  28. In G8 enter the variance of white: =VAR(A2:A589). [ G8 Value = 976.9 ]

  29. In G9 put the sample standard deviation: =STDEV(A2:A589). [ G9 Value = 31.3 ]

  30. Fill in these 8 descriptive statistics for enroll, psae_s, and psae_d in cells H2 - J9. Instead of typing in all of the formulas again, you can copy and paste what you have already done.
    1. Click on cell G2 and drag through G9.
    2. Press Cntrl-C to copy the highlighted cells. (Actually, you are copying the formulas in these cells.)
    3. Click on H2 - J9.
    4. Press Cntrl-V to paste your previously copied formulas into these cells.
    It is important to note that when you copy and paste a formula, Excel automatically "rolls" the column and row identifiers. For example, if you click on cell J8, you will see in the formula bar that what is entered in Excel is =VAR(D2:D589). In order to prevent this automatic rolling, you need to place a dollar sign, $, in front of the column or row identifier that you do not want to have rolled. We will have an example soon.

  31. What is the mean enrollment for the data? What is the standard deviation of school pass rates for the data set?

  32. Finally, instead of specifying each descriptive statistic separately, we could have used Excel's built-in Descriptive Statistics feature of its Data Analysis Toolpak to produce all of these statistics (and many more) in just a few clicks.
    1. Data => Data Analysis => scroll to Descriptive Statistics => OK.
      If Data Analysis is not in the menu, click on the Microsoft sign in the top-left of your Excel window, then on Excel Options, then on Add-Ins, and then on Data Analysis Toolpak. This will load the toolpak for you.
    2. A Descriptive Statistics menu will appear.
      1. Input Range: A1:D589. (Note: enter A1 and not A2.)
      2. Select Columns.
      3. Select Labels in First Row.
      4. Select New Worksheet.
      5. Select Summary Statistics.
      6. Click on OK.
    3. Upon clicking OK, a new worksheet is created called Sheet 2. Rename the worksheet Descriptive Stats, and move the worksheet to the right of School Data.
    4. Compare the results from Excel's Descriptive Statistics tool to the results in G2 - J9. They should match perfectly.

  33. Back in the School Data worksheet, in cell F11 write WHITE. Make the word WHITE bold and underline using the toolbar. In H12 - J12 enter enroll, psae_s, and psae_d respectively. In cell G13 write Covariance. In cell G14 write Correlation. If necessary, Widen column G so both words are completely visible.

  34. In cell H13, enter the covariance between white and enroll: =COVAR(A2:A589,B2:B589). [ H13 Value = -10,927.1 ].

  35. Are the percent of students who are white and school enrollment positively or negatively related? How do you know?

  36. In cell H14, enter the correlation between white and enroll: =CORREL(A2:A589,B2:B589). [ H13 Value = -0.418 ]. If necessary, change the formating so that 3 decimal points appear on the correlation.

  37. Now we want to fill in the covariance and correlation between white and the two test score variables. We cannot do this simply by cutting and pasting, because Excel will automatically roll the column (and row) identifiers. If we did this, for example, the numbers in cells I13 and I14 would be the covariance and correlation of enroll and psae_s, not the covariance and correlation of white and psae_s. To fix this, return to cell H13. Enter: =COVAR($A2:$A589,B2:B589). In cell H14 enter: =CORREL($A2:$A589,B2:B589). By including the $ before the A in both places in both equations, we are telling Excel to not roll the column identifier for that part of the formula. Do not enter a $ before the B's as we want the B column to roll to C and D when we paste. Now cut and paste cells H13 & H14 into cells I13 - J14. Click on I13 and notice in the formula bar that the formula is now =COVAR(A2:A589,C2:C589). [ I13 Value = 361.6 ].

  38. What is the covariance and correlation between white and psae_d?