An important question in fitting a curve to data points is: how scattered can the points be and still have a shape that a curve can represent? Correlation helps to measure this. When you click "Show Line" in the interactive applet, the value r that appears in the top left section of the applet is Pearson's correlation coefficient. It measures the linear association between the horizontal variable and the vertical variable: it indicates how tightly the data points are packed about the regression line, and therefore how well the regression line fits the data. The r-values range from -1 (perfect negative linear association) through 0 (no linear association) to +1 (perfect positive linear association); values close to -1 or +1 indicate a strong linear association. But beware! You will see below that the correlation coefficient, r, is sometimes misleading. You should always look at the scatterplot and combine that knowledge with the r-value in order to draw valid conclusions about the strength of the linear association.
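For readers who want to see what the applet's r-value is measuring, the following is a minimal sketch of Pearson's correlation coefficient computed from its standard definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the sample points are assumptions made for illustration; the applet's own implementation is not shown in this lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired values xs and ys.

    Hypothetical helper for illustration; not the applet's code.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on a rising line
falling = [10 - 3 * x for x in xs]  # points exactly on a falling line

print(pearson_r(xs, rising))   # r at the positive extreme
print(pearson_r(xs, falling))  # r at the negative extreme
```

Because every sample point lies exactly on a line, the two printed values sit at the ends of the range described above. Note that r is undefined when all the x-values or all the y-values are equal, which the sketch reports as an error.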
Explore the Relationship Between Correlation and Linear Association
Use the interactive math applet below to help you answer these questions:
1. Compare the r-values for the following three situations.
• Create a scatterplot that you think shows a strong positive linear association between the two variables. What is the r-value?
• Create a scatterplot that you think shows a strong negative linear association between the two variables. What is the r-value?
• Create a scatterplot that you think shows no linear association between the two variables. What is the r-value?
2. For each r-value below, create a scatterplot that has that exact r-value.
• r = 1
• r = -1
• r = 0
3. Plot several points that exhibit a strong positive linear trend, and then plot one outlier.
• Overall, is this scatterplot roughly linear?
• Is the r-value close to 1?
4. In the lower left corner of the coordinate plane, plot 10 points that exhibit no trend (this is sometimes called a "cloud" of points). Then plot one point in the upper right corner.
• Overall, is this scatterplot linear?
• Is the r-value close to 1?
5. Does a high r-value necessarily mean that the data are generally linear? Does an r-value close to zero always mean that the data are not linear?
The moral is that the correlation coefficient, r, is a valuable tool for studying the linear association between two variables, but it does not fully explain the association (in fact, no statistic does).
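The situations in questions 3 to 5 can also be checked numerically. The sketch below is one possible illustration, not the applet's code: the helper `pearson_r` and all of the point sets are made up for this example. It computes r for a straight line with one outlier, for a trendless cloud with one distant point, and for points on a parabola.

```python
import math

def pearson_r(points):
    """Pearson's r for a list of (x, y) pairs (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / math.sqrt(sxx * syy)

# Question 3: eight points exactly on the line y = x, plus one outlier.
line_with_outlier = [(x, x) for x in range(1, 9)] + [(9, 0)]

# Question 4: ten trendless points in the lower left (a 3-by-3 grid plus a
# repeated center point), then one point far away in the upper right.
cloud = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)] + [(2, 2)]
cloud_with_outlier = cloud + [(20, 20)]

# Question 5: points on the curve y = x^2, a strong but non-linear pattern.
parabola = [(x, x * x) for x in range(-3, 4)]

r_line_outlier = pearson_r(line_with_outlier)
r_cloud_alone = pearson_r(cloud)
r_cloud_outlier = pearson_r(cloud_with_outlier)
r_parabola = pearson_r(parabola)

print(f"line plus outlier:  r = {r_line_outlier:.2f}")   # about 0.40
print(f"cloud alone:        r = {r_cloud_alone:.2f}")
print(f"cloud plus outlier: r = {r_cloud_outlier:.2f}")  # about 0.98
print(f"parabola:           r = {r_parabola:.2f}")
```

With these particular points, a single outlier pulls r for an otherwise perfectly linear plot down to about 0.40, while a single distant point pushes r for a trendless cloud up to about 0.98. The parabola has an r-value of zero even though the points follow a clear curve, which is why the scatterplot should always be examined alongside r.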
Note: use the link below to go to the applet, rather than scrolling down.
Go to the Regression Line Applet