In analyzing the relationship between two variables in an experiment, one may try to fit a straight line or another simple curve to a plot of the data points. For example, a person's weight often depends on their height. Both weight and height are variables. We would like to find a formula for weight as a function of height, one that we can use to predict a person's weight given only their height. To find such a formula, we take a sample of, say, 40 people and measure both the height and weight of each. For each person, we end up with a pair of numbers (x, y), where x is the height and y is the weight. We plot the 40 height-weight pairs as points in the xy-plane to make what is called a scatterplot. The "input" (independent variable) is height, which goes on the horizontal axis, and the "output" (dependent variable) is weight, which goes on the vertical axis.
We then try to fit a curve to these points that somehow represents the overall shape of the scatterplot and find the equation of that curve. The equation is then used to represent the relationship between height and weight in general and therefore to predict any person's weight if we know only their height.
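As a rough illustration of fitting a line and then using its equation to predict, here is a minimal Python sketch. The sample values are made up for illustration (8 people rather than 40, heights in centimeters, weights in kilograms), and the function names `fit_line` and `predict_weight` are hypothetical, not part of the regression tool.

```python
# Hypothetical sample of (height in cm, weight in kg) pairs.
# These values are invented for illustration only.
sample = [(150, 52), (160, 58), (165, 63), (170, 66),
          (175, 72), (180, 77), (185, 83), (190, 88)]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_weight(height, slope, intercept):
    """Use the fitted equation to predict weight from height alone."""
    return slope * height + intercept


slope, intercept = fit_line(sample)
print(f"weight = {slope:.3f} * height + {intercept:.3f}")
print(f"predicted weight at 172 cm: {predict_weight(172, slope, intercept):.1f} kg")
```

The prediction is only as good as the fit: the equation summarizes the overall trend in the sample, not any one person's actual weight.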
There are many different kinds of curves one could fit to data. The graphs of linear, exponential, logarithmic, and power functions are all useful curves. In this i-Math, you will investigate the simplest one, the
straight line, which is the graph of a linear function.
Plot points using the regression tool below. The tool will automatically find a straight line that "fits" the points. This line is called the "least-squares regression line" of y on x. The tool will also calculate the equation of the line and the Pearson correlation coefficient r of the plotted points, which you will study in part 3. The equation and the correlation coefficient are displayed in the top left corner of the tool; n is the number of points.
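For readers curious about what the tool reports, the sketch below computes the same three quantities (the equation of the line, r, and n) using the standard textbook formulas. It is an assumption that the tool works this way internally; the function name `regression_summary` and the sample points are hypothetical.

```python
import math


def regression_summary(points):
    """Return slope, intercept, Pearson r, and n for a list of (x, y) points.

    Assumes the points do not all share the same x-value or the same y-value.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return {"slope": slope, "intercept": intercept, "r": r, "n": n}


# Hypothetical plotted points.
summary = regression_summary([(1, 2), (2, 3), (4, 4), (5, 7)])
print(f"y = {summary['slope']:.3f}x + {summary['intercept']:.3f}")
print(f"r = {summary['r']:.3f}, n = {summary['n']}")
```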
Getting to Know the Regression Line
1. Plot one point and then click SHOW LINE. Why do you think a line is not graphed?
2. CLEAR the graph and plot two points that have whole-number coordinates.
• On your own paper, find an equation for the line through these two points.
• Click SHOW LINE. Compare the equation for the line drawn to the equation you calculated. Explain and resolve any differences.
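If you want to check your paper calculation in question 2, the following sketch finds the line through two points with whole-number coordinates using exact fractions. The function name `line_through` and the two example points are hypothetical; the tool itself may display decimals, so the two forms can look different while describing the same line.

```python
from fractions import Fraction


def line_through(p, q):
    """Return (slope, intercept) of the line through points p and q.

    Uses exact fractions so whole-number coordinates give exact answers.
    """
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: no slope-intercept form")
    slope = Fraction(y2 - y1, x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Hypothetical example points.
slope, intercept = line_through((1, 3), (4, 9))
print(f"y = {slope}x + {intercept}")
```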
3. CLEAR the graph and plot 3 points. Think about a line that "fits" these three points as closely as possible.
• Is it possible for a single straight line to contain all three of the points you plotted?
• On your own paper, sketch a line that you think best fits the three points.
• Click SHOW LINE. Do you think that the line graphed fits the points well? How does it compare to the line you drew?
4. CLEAR the graph and plot several points. Think about a line that best fits these points.
• Click SHOW LINE to see the "least-squares regression line" that fits these points.
• What do you think will happen to the regression line if you plot a new point? Try it and find out. (NOTE: When you plot a new point without clearing the graph, the new regression line is drawn automatically.)
• Plot some more points and see what happens. Describe any patterns or trends that you see.
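The way the line in question 4 changes with each new point can be imitated with a short sketch that keeps running sums and recomputes the line after every added point. This is only one plausible approach, not a description of how the applet is actually written; the class name `RunningRegression` and the points are hypothetical.

```python
class RunningRegression:
    """Keep running sums so the line can be recomputed after each new point."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def add_point(self, x, y):
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def line(self):
        """Return (slope, intercept), or None if no unique line exists yet."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return None
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


# Hypothetical points, added one at a time as if clicked on the graph.
reg = RunningRegression()
for x, y in [(1, 1), (2, 3), (3, 2), (6, 8)]:
    reg.add_point(x, y)
    print(f"after ({x}, {y}): {reg.line()}")
```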
5. The line that the computer draws is called the least-squares regression line. It "fits" the data points according to criteria that you will learn about later. Roughly, the least-squares regression line is the line that minimizes the sum of the squared "errors", where each error is the vertical distance between an actual point and the point on the line with the same x-value. In this sense, it is the line that fits the points as closely as possible. To get a better feel for the regression line, try the following tasks.
a) Plot 4 points so that the regression line is horizontal. Do this in several different ways.
b) Plot 3 points (not all on a line) so that the regression line is horizontal.
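The least-squares idea described in question 5 can be checked numerically. The sketch below fits a line to a few hypothetical points, computes its sum of squared errors, and compares that with the sum for several nearby lines. The function names and data are illustrative assumptions, not part of the applet.

```python
def sum_squared_errors(points, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def fit_line(points):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data points.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
best_slope, best_intercept = fit_line(points)
best_sse = sum_squared_errors(points, best_slope, best_intercept)
print(f"regression line: y = {best_slope:.2f}x + {best_intercept:.2f}, SSE = {best_sse:.3f}")

# Nudge the slope and intercept; every other line should have a larger SSE.
other_sses = []
for d_slope, d_intercept in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5), (0.2, -0.3)]:
    sse = sum_squared_errors(points, best_slope + d_slope, best_intercept + d_intercept)
    other_sses.append(sse)
    print(f"slope {best_slope + d_slope:.2f}, intercept {best_intercept + d_intercept:.2f}: SSE = {sse:.3f}")
```

Trying a handful of nearby lines does not prove that the regression line is the minimizer, but it shows the criterion at work.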
Notice: use the link below to go to the applet, rather than scrolling down.
Go to the Regression Line Applet