This lesson gives students an opportunity to identify an outlier within a set of real-life data.
The context for this investigation involves data for the Los Angeles Lakers and Detroit Pistons for the 2004‑05 NBA season. However, you may wish to choose other data sets for use in your classroom. For geographical reasons, you might want to use teams closer to your school. To match your students' interests, you may wish to use data for players from a different sport. Or, if your students are generally uninterested in sports, you may wish to use a completely different set of data.
Using the Illuminations Line of Best Fit Activity, students plot sets of data. Within each of these sets of data, there is an outlier that students can detect in either of two ways:
- By visual inspection. A scatterplot of the data will show that some points are not aligned with the others.
- By correlation comparison. By removing one player's data at a time, students can determine which player's data has the greatest impact on the correlation coefficient.
Students can do a visual inspection using paper-and-pencil techniques, but repeatedly determining the correlation coefficient on a set of data when points are removed can be quite cumbersome. The use of technology, however, allows for the situation to be explored quickly and removes the tedium of the calculations. Consequently, it is the second method that will be covered more thoroughly in this lesson.
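For teachers who want to show what the technology is doing behind the scenes, the correlation comparison can be sketched in a few lines of Python. This is only an illustration, not part of the Line of Best Fit activity: the function names `pearson_r` and `leave_one_out` are made up for this sketch, and the five data points are hypothetical values chosen so that one point clearly breaks the trend. They are not the Lakers or Pistons data from the activity sheet.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def leave_one_out(data):
    """Return {label: r computed with that label's point removed}."""
    results = {}
    for label in data:
        rest = [pt for name, pt in data.items() if name != label]
        xs = [p[0] for p in rest]
        ys = [p[1] for p in rest]
        results[label] = pearson_r(xs, ys)
    return results

# Hypothetical data (NOT from the activity sheet): four points on a line
# and one point, "Player E", that does not follow the trend.
data = {
    "Player A": (1, 2),
    "Player B": (2, 4),
    "Player C": (3, 6),
    "Player D": (4, 8),
    "Player E": (5, 1),
}

# r-value with every point included.
r_all = pearson_r([p[0] for p in data.values()],
                  [p[1] for p in data.values()])

# r-value after removing each point in turn.
r_without = leave_one_out(data)

# The candidate outlier is the point whose removal changes r the most.
candidate = max(r_without, key=lambda name: abs(r_without[name] - r_all))

print(f"All points: r = {r_all:.2f}")
for name, r in r_without.items():
    print(f"Without {name}: r = {r:.2f}")
print("Candidate outlier:", candidate)
```

To try the sketch with real data, replace the entries in `data` with the player names and the two columns of values from the activity sheet.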
Begin the lesson by distributing the Impact of a Superstar activity sheet. The first page of the activity sheet provides some background information about the teams. The data to be entered into the Line of Best Fit activity can be found on the second page of the activity sheet. (All of the data also appears in the Team Data Spreadsheet (Excel). Two columns of data can be pasted from an Excel spreadsheet into the text box in the Line of Best Fit activity; when the Update Plot button is pressed, the points will appear in the scatterplot.)
Following the directions on the activity sheet, students will first work with the data for the Los Angeles Lakers. They will plot the data and then determine the line of best fit for this data. In particular, they will want to take note of the correlation coefficient (r‑value) for the regression line. Then, one at a time, students will remove one player's data from the set and determine what effect, if any, the removal of that player's data has on the line of best fit and correlation coefficient. [For the Lakers, students will notice that the correlation coefficient is 0.75 when the data for all players is considered. However, when the data for Kobe Bryant is removed, the r‑value increases to 0.95; when the data for any other player is removed, the correlation coefficient either stays the same or decreases. This indicates that the data for Kobe Bryant might be an outlier.]
Questions 1‑6 on the activity sheet take students step-by-step through the process for removing one player's data and considering the effect on the correlation coefficient. In Question 7, however, students are left to conduct a similar experiment on their own using data for the Detroit Pistons. [In this investigation, students will likely notice that Ben Wallace represents something of an outlier. When the data for any other player is removed, the r‑value does not change significantly; but when the data for Ben Wallace is removed, the correlation coefficient increases from r = 0.85 to r = 0.97.]