## The Regression Line

Interactive computer-based tools provide students with the opportunity to easily investigate the relationship between a set of data points and a curve used to fit the data points. As students work with bivariate data in grades 9-12, they will be able to investigate relationships between the variables using linear, exponential, power, logarithmic, and other functions for curve fitting. Using interactive tools like the one below, students can investigate the properties of regression lines and correlation.

In analyzing the relationship between two variables in an experiment, one may try to fit a straight line or another simple curve to a plot of the data points. For example, a person's weight often depends on their height. Both weight and height are variables. We would like to find a general formula for weight as a function of height, one that we can use to predict any person's weight given only their height. To find such a formula, we take a sample of, say, 40 people and measure the height and weight of each. For each person, we end up with a pair of numbers (*x, y*), where *x* is the height and *y* is the weight. We plot the 40 height-weight pairs as points in the *xy*-plane to make what is called a scatterplot. The "input" (independent variable) is height, which goes on the horizontal axis, and the "output" (dependent variable) is weight, which goes on the vertical axis.

We then try to fit a curve to these points that somehow represents the overall shape of the scatterplot and find the equation of that curve. The equation is then used to represent the relationship between height and weight in general and therefore to predict any person's weight if we know only their height.
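As a rough illustration of the idea above, the following Python sketch fits a straight line to a small sample and uses it to predict a weight from a height. The data values, the units (inches and pounds), and the names `fit_line` and `predict_weight` are assumptions made up for this example; they are not real measurements and do not come from the regression tool.

```python
# Hypothetical sample: heights in inches (x) and weights in pounds (y).
# These numbers are invented for illustration; they are not real measurements.
heights = [60.0, 62.0, 64.0, 66.0, 68.0, 70.0, 72.0, 74.0]
weights = [115.0, 120.0, 131.0, 138.0, 152.0, 160.0, 171.0, 185.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(heights, weights)


def predict_weight(height):
    """Predict a weight from a height using the fitted line."""
    return slope * height + intercept


print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"Predicted weight at 67 inches: {predict_weight(67.0):.1f}")
```

A prediction like this is most trustworthy for heights inside the range of the sample; the line says little about heights far outside it.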

There are many different kinds of curves one could fit to data. The graphs of linear, exponential, logarithmic, and power functions are all useful curves. In this i-Math, you will investigate the simplest one, the straight line, which is the graph of a linear function.

Plot points using the regression tool below. The tool will automatically find a straight line that "fits" the points. This line is called the "least-squares regression line" of *y* on *x*. The tool will also calculate the equation of the line and the Pearson correlation coefficient *r*, which you will study in part 3. The equation and the correlation coefficient are displayed in the top left corner of the tool; *n* is the number of points.
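The three quantities the tool reports (the equation of the line, *r*, and *n*) can be computed by hand from the plotted points. The sketch below uses the standard textbook formulas; how the applet itself is implemented is not stated here, so treat this only as one plausible way to get the same numbers. The function name `regression_summary` and the sample points are assumptions for illustration.

```python
import math


def regression_summary(points):
    """Return (slope, intercept, r, n) for a list of (x, y) points.

    Uses the standard least-squares and Pearson correlation formulas.
    """
    n = len(points)
    if n < 2:
        raise ValueError("At least two points are needed to fit a line.")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("All x-values are equal; the line of y on x is undefined.")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is undefined when all y-values are equal (a perfectly horizontal cloud).
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r, n


# Hypothetical points, chosen only to keep the arithmetic simple.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
slope, intercept, r, n = regression_summary(points)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.2f}, n = {n}")
```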

**Instructions:**

- To add a data point, click in the white area.
- Hold down shift, and click on a point to drag that point to a new location.
- To remove a point, hold down control and click on the point.
- Be sure that the circle around the point is showing before you click or drag a point that is already on the graph.
- The origin is at the center of the grid but will move if you change the scale.

### Getting to Know the Regression Line

1. Plot one point and then click SHOW LINE. Why do you think a line is not graphed?

2. CLEAR the graph and plot two points that have whole number coordinates.

• On your own paper, find an equation for the line through these two points.

• Click SHOW LINE. Compare the equation for the line drawn to the equation you calculated. Explain and resolve any differences.

• Is it possible for a single straight line to contain all three of the points you plotted?

• On your own paper, sketch a line that you think best fits the three points.

• Click SHOW LINE. Do you think that the line graphed fits the points well? How does it compare to the line you drew?

• Click SHOW LINE to see the "least-squares regression line" that fits these points.

• What do you think will happen to the regression line if you plot a new point? Try it and find out.

(NOTE: When you plot a new point without clearing the graph, the new regression line is drawn automatically.)

• Plot some more points and see what happens. Describe any patterns or trends that you see.

5. The line that the computer draws is called the least-squares regression line. It "fits" the data points according to criteria that you will learn about later. Roughly, the least-squares regression line is the line that minimizes the sum of the squared "errors," where each error is the vertical distance between an actual point and the point on the line directly above or below it. In this sense, it is the line that fits the points best. To get a better feel for the regression line, try the following tasks.

a) Plot 4 points so that the regression line is horizontal. Do this in several different ways.

b) Plot 3 points (not all on a line) so that the regression line is horizontal.
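The least-squares criterion described in step 5 can be checked numerically. The sketch below computes the sum of squared vertical errors for the least-squares line and for several nearby lines, and shows that none of the nearby lines does better. The sample points, the function names, and the particular nudges to slope and intercept are assumptions chosen for illustration.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_errors(points, slope, intercept):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


# Hypothetical points, chosen only to keep the arithmetic simple.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
best_slope, best_intercept = least_squares_line(points)
best_sse = sum_squared_errors(points, best_slope, best_intercept)
print(f"Least-squares line: SSE = {best_sse:.3f}")

# Nudge the slope and intercept; every other line has a larger SSE.
nearby_sses = []
for d_slope in (-0.2, 0.0, 0.2):
    for d_intercept in (-0.5, 0.0, 0.5):
        sse = sum_squared_errors(
            points, best_slope + d_slope, best_intercept + d_intercept
        )
        nearby_sses.append(sse)
        print(f"slope {best_slope + d_slope:.2f}, "
              f"intercept {best_intercept + d_intercept:.2f}: SSE = {sse:.3f}")
```

Trying only a handful of nearby lines does not prove the line is optimal, but it gives a feel for what "minimizes the squared errors" means.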

### Reflection Questions

- Can you think of an example of a real situation where finding a line that "fits" the data will be a useful thing to do?
- How do you think the computer calculates the regression line?

### Reference

Copyright Notice: Applet generously provided by: L. O. Cannon, James Dorward, E. Robert Heal, Richard Wellman (Utah State University, www.matti.usu.edu). The USU MATTI project is supported by the National Science Foundation (Award #9819107). Copyright 1999.

- Computers with internet connection

**Extension**

Move on to the next lesson, The Effects of Outliers.


### Learning Objectives

Students will:

- Investigate the straight line, which is the graph of a linear function.

### NCTM Standards and Expectations

- Understand the meaning of measurement data and categorical data, of univariate and bivariate data, and of the term variable.

- Understand histograms, parallel box plots, and scatterplots and use them to display data.

- Display and discuss bivariate data where at least one variable is categorical.