This lesson gives students an opportunity to identify an outlier within a set of real-life data.
The context for this investigation involves data for the Los Angeles
Lakers and Detroit Pistons for the 2004‑05 NBA season. However, you may
wish to choose other data sets for use in your classroom. For
geographical reasons, you might want to use teams closer to your
school. To match the interest of students, you may wish to use data
regarding players from a different sport. Or, if your students are
generally uninterested in sports, you may wish to use a completely
different set of data.
Using the Illuminations Line of Best Fit Interactive,
students plot sets of data. Within each of these sets of data, there is
an outlier that students can detect in either of two ways:
- By visual inspection. A scatterplot of the data will show that some points are not aligned with the others.
- By correlation comparison. By removing one player's
data at a time, students can determine which player's data has the
greatest impact on the correlation coefficient.
Students can do a visual inspection using paper-and-pencil
techniques, but repeatedly recalculating the correlation coefficient
as points are removed from a data set is cumbersome by hand. With
technology (e.g., Excel or graphing calculators), however, the situation can be explored quickly
and the tedium of the calculations is removed. Consequently, it is the
second method that will be covered more thoroughly in this lesson.
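For teachers who would rather script the correlation comparison than use a spreadsheet, the idea can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pearson_r` and `leave_one_out` are made up for this example, and the (minutes, points) pairs are hypothetical values chosen so that one point sits off the trend. They are not the team data from the activity sheet.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def leave_one_out(data):
    """Map each label to the r-value computed with that point removed."""
    results = {}
    for name in data:
        rest = [pt for label, pt in data.items() if label != name]
        xs = [p[0] for p in rest]
        ys = [p[1] for p in rest]
        results[name] = pearson_r(xs, ys)
    return results

# Hypothetical (minutes, points) pairs, NOT the actual team data.
players = {
    "Player A": (10, 4),
    "Player B": (15, 6),
    "Player C": (20, 8),
    "Player D": (25, 10),
    "Player E": (30, 12),
    "Player F": (35, 30),  # deliberately placed off the trend
}

xs = [p[0] for p in players.values()]
ys = [p[1] for p in players.values()]
print(f"all players: r = {pearson_r(xs, ys):.2f}")

r_without = leave_one_out(players)
for name, r in r_without.items():
    print(f"without {name}: r = {r:.2f}")

# The point whose removal yields the highest r is the outlier candidate.
candidate = max(r_without, key=r_without.get)
print("largest r after removing:", candidate)
```

The point whose removal raises the r-value the most is the one to examine more closely, which mirrors what students do by hand in the interactive.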
Line of Best Fit Interactive
Impact of a Superstar Activity Sheet
Team Data Spreadsheet (Excel)
Begin the lesson by distributing the Impact of a Superstar Activity Sheet. The first page of the activity sheet provides some
background information about the teams. The data to be entered into the
Line of Best Fit Interactive can be found on the second page of the activity sheet. (In addition, all of the data appears in the Team Data Spreadsheet (Excel).
Two columns of data can be pasted from an Excel spreadsheet into
the text box in the Line of Best Fit activity; when the Update Plot
button is pressed, the points will appear in the scatterplot.)
Following the directions on the activity sheet, students will
first work with the data for the Los Angeles Lakers. They will plot the
data and then determine the line of best fit for this data. In
particular, they will want to take note of the correlation coefficient (r‑value)
for the regression line. Then, one at a time, students will remove one
player's data from the set and determine what effect, if any, the
removal of that player's data has on the line of best fit and
correlation coefficient. [For the Lakers, students will notice that the
correlation coefficient is 0.75 when the data for all players is
considered. However, when the data for Kobe Bryant is removed, the r‑value
increases to 0.95; when the data for any other player is removed, the
correlation coefficient either stays the same or decreases. This
indicates that the data for Kobe Bryant might be an outlier.]
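For reference, the r‑value attached to a line of best fit is conventionally the Pearson correlation coefficient. Assuming that is what the interactive reports (the lesson does not say so explicitly), it can be written as follows, where x_i and y_i are the two plotted values for player i and the bars denote the means over all n players currently in the set:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Removing a player changes both means and drops one term from each sum, which is why a single point far from the trend can shift r noticeably.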
Questions 1‑6 on the activity sheet take students step-by-step
through the process for removing one player's data and considering the
effect on the correlation coefficient. In Question 7, however, students
are left to conduct a similar experiment on their own using data for
the Detroit Pistons. [In this investigation, students will likely
notice that Ben Wallace represents something of an outlier. When the
data of any other player is removed, the r‑value does not change significantly; but when the data for Ben Wallace is removed, the correlation coefficient increases from r = 0.85 to r = 0.97.]
- Collect student work on the Impact of a Superstar Activity Sheet.
- Allow students to analyze another set of data using the Illuminations Line of Best Fit Interactive. Students should be able to identify any outliers and explain how they know.
1. Allow students to combine the data for the Lakers and the Pistons and
consider the complete set. Is either Kobe Bryant or Ben Wallace
an outlier in this set? Are both of them still outliers? How do you know?
[The correlation coefficient is 0.77 when all players are included.
When data for both players are removed, the correlation coefficient
increases to 0.96. If only Kobe or only Ben is removed, it increases
to 0.88 or 0.83, respectively. It could therefore be said, perhaps,
that the data for Kobe is more of an outlier than the data for Ben.]
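The same comparison on a combined set can be sketched in Python by removing each suspected outlier alone and then both together. The helper name `r_without` and every data value below are hypothetical, chosen only to show the mechanics. They are not the Lakers or Pistons figures.

```python
from math import sqrt

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

def r_without(data, removed):
    """r-value after dropping every player whose name is in `removed`."""
    return pearson_r([pt for name, pt in data.items() if name not in removed])

# Hypothetical combined roster of (minutes, points) pairs, NOT real data.
combined = {
    "Team 1 A": (10, 4),
    "Team 1 B": (20, 8),
    "Team 1 C": (30, 12),
    "Team 1 scorer": (35, 30),      # well above the trend
    "Team 2 A": (15, 6),
    "Team 2 B": (25, 10),
    "Team 2 C": (40, 16),
    "Team 2 specialist": (36, 5),   # well below the trend
}
suspects = ("Team 1 scorer", "Team 2 specialist")

r_all = r_without(combined, ())
r_single = {name: r_without(combined, (name,)) for name in suspects}
r_both = r_without(combined, suspects)

print(f"all players: r = {r_all:.2f}")
for name, r in r_single.items():
    print(f"without {name}: r = {r:.2f}")
print(f"without both: r = {r_both:.2f}")
```

Comparing the single-removal values against each other is what supports a statement such as one player being "more of an outlier" than the other.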
Questions for Students
1. Does it appear that the data for any player from either team represents an outlier?
[It appears that the data for Kobe Bryant represents an outlier for the Lakers, and the data for Ben Wallace represents an outlier for the Pistons.]
2. Some sportswriters have accused Kobe Bryant of being a selfish basketball player; that is, they say he tries to score more than he tries to help his team. Do the results of this investigation seem to support that accusation? Given that Ben Wallace is also an outlier, could he be accused of being selfish, too?
[The data suggests that Kobe Bryant scores more points per minute than his teammates. However, it is difficult to determine if that is a result of selfishness. Perhaps he is just a better basketball player. On the other hand, Ben Wallace scores fewer points per minute than his teammates, which would suggest that he is not selfish. The reason he scores fewer points is that he concentrates on rebounding and blocking shots more than scoring.]
- How did technology help students as they attempted to identify
outliers? What things were possible with technology that would have
taken longer (or perhaps been impossible) without technology?
- Were students actively engaged in this lesson? If not, are
there other data sets that could be used that would be more interesting
to your students?