## Correlation and the Regression Line

Interactive computer-based tools provide students with the opportunity to easily investigate the relationship between a set of data points and a curve used to fit the data points. As students work with bivariate data in grades 9-12, they will be able to investigate relationships between the variables using linear, exponential, power, logarithmic, and other functions for curve fitting. Using interactive tools like the one below, students can investigate the properties of regression lines and correlation.

An important question that comes up in determining a curve to fit our data points is: How scattered can the points be and still have a shape that can be represented by a curve? The idea of correlation helps to measure this.

Have students go to the Linear Regression I Activity and click on the Instructions tab. Allow students time to read the instructions on using the applet.

Point out that we will be focusing on the *r*-value, which is a measure of the linear association between the horizontal variable and the vertical variable. It gives information about how tightly packed the data points are about the regression line. It thereby also gives information about how well the regression line fits the data. The *r*-values can range from -1 (strong negative linear association) to 0 (no linear association) to +1 (strong positive linear association). But beware! You will see below that the correlation coefficient, *r*, is sometimes misleading. You should always look at the scatterplot and combine that knowledge with the *r*-value in order to draw valid conclusions about the strength of the linear association.
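For teachers who want to show where the *r*-value comes from, the following is a minimal sketch of the standard formula for Pearson's correlation coefficient. It is not the applet's own code. The function name `pearson_r` and the sample data are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data.

    Hypothetical helper for illustration; not taken from the applet.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    return sxy / math.sqrt(sxx * syy)

# Made-up data: points exactly on a rising line and on a falling line
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]
falling = [10 - 3 * x for x in xs]

print(pearson_r(xs, rising))   # r for an exact upward-sloping line
print(pearson_r(xs, falling))  # r for an exact downward-sloping line
```

Points that lie exactly on an upward-sloping line give an *r*-value of 1, and points exactly on a downward-sloping line give -1, which matches the range described above.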

**Exploring the Relationship Between Correlation and Linear Association**

Use the interactive math applet in the Linear Regression I Activity to help you answer these questions:

1. Compare the *r*-values for the following three situations.

- Create a scatterplot that you think shows a strong positive linear association between the two variables. What is the *r*-value?

[The points should lie roughly on a straight line that slopes upward to the right. The value of *r* should be 1 or close to 1.]

- Create a scatterplot that you think shows a strong negative linear association between the two variables. What is the *r*-value?

[The points should lie roughly on a straight line that slopes downward to the right. The value of *r* should be -1 or close to -1.]

- Create a scatterplot that you think shows no linear association between the two variables. What is the *r*-value?

[The points could be scattered all over the grid with no pattern at all, in which case the value of *r* should be zero or close to zero. Or the points could form a pattern with a strong shape other than linear, such as a circle or a strong curve, which could also have an *r*-value close to zero.]

2. For each *r*-value below, create a scatterplot that has that exact *r*-value.

- *r* = 1

[Points exactly on a straight line sloping upward to the right.]

- *r* = -1

[Points exactly on a straight line sloping downward to the right.]

- *r* = 0

[Answers will vary.]

3. Plot several points that exhibit a strong positive linear trend, and then plot one outlier.

- Overall, is this scatterplot roughly linear?

[If the outlier is close to the other points, the scatterplot may still look roughly linear. If the outlier is far away, the line will be pulled away from the original points.]

- Is the *r*-value close to 1?

[The *r*-value will be close to 1 if the outlier is close to the original points, but will be farther from 1 if the outlier is farther away.]

4. In the lower left corner of the coordinate plane, plot 10 points that exhibit no trend (this is sometimes called a "cloud" of points). Then plot one point in the upper right corner.

- Overall, is this scatterplot linear?

[If the 10 points are closely packed in the very bottom left corner and the outlier is in the far top right corner, then the scatterplot is not linear.]

- Is the *r*-value close to 1?

[Yes.]

5. Does a high *r*-value necessarily mean that the data are generally linear? Does an *r*-value close to zero always mean that the data are not linear?

[The result of question (4) shows that a high *r*-value does not necessarily go with a linear trend.]

The moral is that the correlation coefficient, *r*, is a valuable tool for studying the linear association between two variables, but it does not fully explain the association (in fact, no statistic does).
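The effects explored in questions 3 and 4 can also be checked numerically. The sketch below uses made-up data sets and a hypothetical helper, `pearson_r`, that applies the standard Pearson formula. It is an illustration under those assumptions, not the applet's implementation.

```python
import math

def pearson_r(points):
    """Pearson's r for a list of (x, y) pairs (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Question 3: a strong positive linear trend plus one outlier (made-up data)
trend = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8),
         (5, 5.1), (6, 6.0), (7, 6.9), (8, 8.2)]
r_near = pearson_r(trend + [(9, 8.5)])  # outlier close to the trend
r_far = pearson_r(trend + [(9, 0.0)])   # outlier far from the trend

# Question 4: a cloud of 10 points in the lower left, then one far point
cloud = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 1),
         (2, 3), (1, 2), (3, 2), (1.5, 2.5), (2.5, 1.5)]
r_cloud = pearson_r(cloud)
r_cloud_plus_one = pearson_r(cloud + [(20, 20)])

# A strong non-linear pattern: 8 points evenly spaced around a circle
circle = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
          for k in range(8)]
r_circle = pearson_r(circle)

print(f"trend + near outlier: r = {r_near:.3f}")
print(f"trend + far outlier:  r = {r_far:.3f}")
print(f"cloud alone:          r = {r_cloud:.3f}")
print(f"cloud + far point:    r = {r_cloud_plus_one:.3f}")
print(f"circle:               r = {r_circle:.3f}")
```

With these particular data, the far outlier pulls *r* well below 1, the single distant point pushes the cloud's *r* close to 1 even though the scatterplot is not linear, and the circle gives an *r*-value of essentially zero despite its strong shape.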

### Reference

Copyright Notice: Applet generously provided by: L. O. Cannon, James Dorward, E. Robert Heal, Richard Wellman (Utah State University, www.matti.usu.edu). The USU MATTI project is supported by the National Science Foundation (Award #9819107). Copyright 1999.

- Computers with internet connection
- Answers

**Assessment Options**

- Use the check points to facilitate ongoing assessment. Later, determine some number of points for each section.
- Assign a journal writing task in which students describe the differences among positive correlation, negative correlation, and no correlation.

**Extensions**

- Show students a set of points and have them identify the type of correlation and roughly what the *r*-value would be.
- Have students create a scatterplot they feel would produce the same *r*-value they chose for the above question. Then return to your graph and have them compare. Are they similar? Are they different? How?
- Move on to the last lesson, *The Centroid and the Regression Line*.

**Questions for Students**

1. What must a scatterplot look like to produce an *r*-value equal to 1?

[Points must be in an exact straight line sloping upward to the right.]

2. What must a scatterplot look like to produce an *r*-value equal to -1?

[Points must be in an exact straight line sloping downward to the right.]

3. What must a scatterplot look like to produce an *r*-value equal to 0?

[Answers will vary.]

**Teacher Reflection**

- Did you find it necessary to make adjustments while teaching the lesson? If so, what adjustments, and were those adjustments effective?
- What, if any, issues arose with classroom management? How did you correct them? If you use this lesson in the future, what could you do to prevent these problems?
- What were some of the ways that the students illustrated that they were actively engaged in the learning process?

### Learning Objectives

Students will:

- Learn about Pearson's correlation coefficient: the measure of the linear association between the horizontal variable and the vertical variable.

### NCTM Standards and Expectations

- For bivariate measurement data, be able to display a scatterplot, describe its shape, and determine regression coefficients, regression equations, and correlation coefficients using technological tools.

- Display and discuss bivariate data where at least one variable is categorical.

- Recognize how linear transformations of univariate data affect shape, center, and spread.