
The Effects of Outliers

9-12
1
Data Analysis and Probability

Interactive computer-based tools let students easily investigate the relationship between a set of data points and a curve used to fit them. As students work with bivariate data in grades 9-12, they will be able to investigate relationships between the variables using linear, exponential, power, logarithmic, and other functions for curve fitting. Using interactive tools like the one below, students can investigate the properties of regression lines and correlation.

It is important for students to become adept at recognizing outliers and to appreciate their effect on regression curves and residuals. Using interactive tools, students can investigate the effect of outliers on a regression line and easily see their significance.

In this section, you will see that one point in a data set that is far away from all the others can change the line dramatically.  
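For readers who want to check this claim away from the applet, the following is a minimal sketch in Python. It assumes the applet draws the ordinary least-squares line; the helper `fit_line` and the data values are invented for illustration and are not taken from the applet.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    return slope, mean_y - slope * mean_x


# Hypothetical data: eight points lying close to the line y = x.
trend = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8),
         (5, 5.2), (6, 5.9), (7, 7.1), (8, 7.8)]
# Hypothetical outlier: far to the right and far below the trend.
outlier = (9, 0.0)

slope_before, intercept_before = fit_line(trend)
slope_after, intercept_after = fit_line(trend + [outlier])

print(f"without outlier: y = {slope_before:.3f}x + {intercept_before:.3f}")
print(f"with outlier:    y = {slope_after:.3f}x + {intercept_after:.3f}")
```

With these made-up values, a single added point cuts the slope to less than half of its original value, which is the kind of dramatic change the activity below asks students to explore.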

Linear Regression I

Instructions: 

  • To add a data point, click on the white area.
  • Hold down shift, and click on a point to drag that point to a new location.
  • To remove a point, hold down control and click on the point.
  • Be sure that the circle around the point is showing before you click or drag a point that is already on the graph.
  • The origin is at the center of the grid, but will move if you change the scale.

Investigating the Effects of Outliers on the Regression Line

1. CLEAR the graph. Plot about 8 points that seem to be approximately on a line that has positive slope (slanted up as you move left to right). Click on SHOW GRAPH to see the regression line.
• Does the line fit the points well?
• Does the equation show a positive slope?
2. Add an outlier point, that is, a point that clearly does not follow the trend established by the other points.
• Describe what happens to the regression line. Explain.
• Grab this outlier and drag it around. Observe how the regression line changes. Describe any patterns that you see.
3. As you have seen, an outlier can significantly affect the regression line. CLEAR the graph and begin again with about 8 points that seem to be approximately on a line that has positive slope.
• Experiment with dragging an outlier point to find locations of an outlier that cause the regression line to drastically change slope.
• Find other locations of an outlier that cause the regression line to shift without changing slope.
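The two behaviors in step 3 can also be checked numerically. The sketch below again assumes an ordinary least-squares line, with a hypothetical helper `fit_line` and invented data. It places one outlier directly above the mean of the x-values and another far to the right of the data. Under least squares, a point added at the mean x-value leaves the slope unchanged and only moves the intercept, while a point far from the mean x-value has much more pull on the slope.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: eight points lying close to the line y = x.
trend = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8),
         (5, 5.2), (6, 5.9), (7, 7.1), (8, 7.8)]
mean_x = sum(x for x, _ in trend) / len(trend)

base_slope, base_intercept = fit_line(trend)

# Outlier directly above the mean x-value: the line shifts up, same slope.
mid_slope, mid_intercept = fit_line(trend + [(mean_x, 15.0)])

# Outlier far to the right of the data: the slope changes drastically.
far_slope, far_intercept = fit_line(trend + [(12.0, 0.0)])

print(f"original:           slope {base_slope:.3f}, intercept {base_intercept:.3f}")
print(f"outlier at mean x:  slope {mid_slope:.3f}, intercept {mid_intercept:.3f}")
print(f"outlier far right:  slope {far_slope:.3f}, intercept {far_intercept:.3f}")
```

In the applet the mean x-value can only be estimated by eye, so students should expect the slope to stay nearly, rather than exactly, the same when they drag an outlier straight up or down near the middle of the data.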

Reflection Questions

  • In the previous part you were asked to think of a real-world example where you would find a line of best fit. In that real situation, what would be an outlier?
  • How would you summarize the effect of outliers on the regression line?
  • Think back to the example in part 1 of using the regression equation to predict a person's weight when you know their height. What would be an outlier in this case? Could you justify leaving out that point and using just the remaining points to calculate the regression equation?
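To make the last question concrete, here is a minimal sketch that compares predictions with and without a suspected outlier. The height and weight values are invented for illustration (inches and pounds are assumed units), and `fit_line` is a hypothetical helper that computes the ordinary least-squares line. The point with the largest residual is treated as the candidate outlier. Whether leaving it out is justified is a judgment about the data, not something the code decides.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented (height in inches, weight in pounds) data; one point breaks the trend.
people = [(60, 115), (62, 120), (64, 130), (66, 140),
          (68, 150), (70, 160), (72, 170), (65, 250)]

slope_all, intercept_all = fit_line(people)

# Residual = observed weight minus the weight predicted by the line.
residuals = [(abs(w - (slope_all * h + intercept_all)), (h, w)) for h, w in people]
_, suspect = max(residuals)

# Refit using only the remaining points.
remaining = [p for p in people if p != suspect]
slope_rest, intercept_rest = fit_line(remaining)

height = 66
predict_all = slope_all * height + intercept_all
predict_rest = slope_rest * height + intercept_rest

print(f"largest residual at {suspect}")
print(f"predicted weight at {height} in, all points: {predict_all:.1f} lb")
print(f"predicted weight at {height} in, without it: {predict_rest:.1f} lb")
```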

Answers (PDF)

References

Copyright Notice: Applet generously provided by: L. O. Cannon, James Dorward, E. Robert Heal, Richard Wellman (Utah State University, www.matti.usu.edu). The USU MATTI project is supported by the National Science Foundation (Award #9819107). Copyright 1999.

  • Computers with internet connection

Extension

Move on to the next lesson, Correlation and the Regression Line.

Questions for Students

Refer to the Instructional Plan.

Data Analysis and Probability

The Regression Line and Correlation

9-12
Investigate the relationship between a set of data points and a curve used to fit the data points.
Data Analysis and Probability

The Regression Line

9-12
Interactive computer-based tools provide students with the opportunity to easily investigate the relationship between a set of data points and a curve used to fit the data points. As students work with bivariate data in grades 9-12, they will be able to investigate relationships between the variables using linear, exponential, power, logarithmic, and other functions for curve fitting. Using interactive tools like the one below, students can investigate the properties of regression lines and correlation.
Data Analysis and Probability

The Centroid and the Regression Line

9-12
Data Analysis and Probability

Correlation and the Regression Line

9-12

Learning Objectives

Students will:

  • Investigate the effect of outliers on a regression line and easily see their significance.
 

NCTM Standards and Expectations

  • For bivariate measurement data, be able to display a scatterplot, describe its shape, and determine regression coefficients, regression equations, and correlation coefficients using technological tools.
  • Display and discuss bivariate data where at least one variable is categorical.
  • Recognize how linear transformations of univariate data affect shape, center, and spread.